SUMSEM2022-23 BMT5113 TH VL2022230700419 2023-06-24 Reference-Material-I

Data Analysis for Managers
BMT5113
Institute: VIT Business School
Module 4: Test of Hypothesis & Non Parametric Test
Faculty Name: Dr. Bijay Kushwaha, VIT BS, Vellore Institute of
Technology, Vellore, India
STUDENT LEARNING OUTCOMES (SLO)
➢ Reach a statistical conclusion in hypothesis testing problems about a
population mean with a known population standard deviation using
the z statistic.
➢ Reach a statistical conclusion in hypothesis testing problems about a
population mean with an unknown population standard deviation
using the t statistic.
Test of Hypothesis
• Hypotheses are tentative explanations of a principle
operating in nature.
• A research hypothesis is a statement of what the researcher
believes will be the outcome of an experiment or a study.
Before studies are undertaken, business researchers often
have some idea or theory, based on experience or previous
work, as to how the study will turn out.
• Hypothesis testing is a technique to help determine whether
a specific treatment has an effect on the individuals in a
population.
• The purpose of the hypothesis test is to decide between two
explanations:
1. The difference between the sample and the population can
be explained by sampling error (there does not appear to be a
treatment effect)
2. The difference between the sample and the population is too
large to be explained by sampling error (there does appear to be a
treatment effect).
• The null hypothesis states that the “null” condition exists; that is,
there is nothing new happening, the old theory is still true, the
old standard is correct, and the system is in control.
• The alternative hypothesis, on the other hand, states that the
new theory is true, there are new standards, the system is out of
control, and/or something is happening.
As an example, suppose flour packaged by a manufacturer is sold by weight, and a
particular size of package is supposed to average 40 ounces. Suppose the
manufacturer wants to test to determine whether their packaging process is out
of control as determined by the weight of the flour packages. The null hypothesis
for this experiment is that the average weight of the flour packages is 40 ounces
(no problem). The alternative hypothesis is that the average is not 40 ounces
(process is out of control).
State the hypotheses and select an α level. The null hypothesis, H0,
always states that the treatment has no effect (no change, no
difference). According to the null hypothesis, the population mean after
treatment is the same as it was before treatment. The α level
establishes a criterion, or "cut-off", for making a decision about the null
hypothesis. The α level also determines the risk of a Type I error.
Locate the critical region. The critical region consists of outcomes that are
very unlikely to occur if the null hypothesis is true. That is, the critical
region is defined by sample means that are almost impossible to obtain if
the treatment has no effect. The phrase “almost impossible” means that
these samples have a probability (p) that is less than the alpha level.
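The boundary of the critical region can be sketched numerically. The snippet below is a minimal illustration, assuming the test statistic follows a standard normal distribution; the helper name z_critical is hypothetical and not part of the slides.

```python
from statistics import NormalDist


def z_critical(alpha: float, tail: str = "two") -> float:
    """Return the positive critical z value for a given alpha level.

    tail="two" splits alpha evenly between both tails;
    any other value puts all of alpha in a single tail.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.inv_cdf(1 - alpha)


print(round(z_critical(0.05, "two"), 3))  # 1.96
print(round(z_critical(0.05, "one"), 3))  # 1.645
```

Sample outcomes whose z value lies beyond these boundaries have a probability smaller than α when the null hypothesis is true.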
Compute the test statistic. The test statistic forms a ratio comparing the
obtained difference between the sample mean and the hypothesized
population mean versus the amount of difference we would expect without
any treatment effect (the standard error).
A large value for the test statistic shows that the obtained mean difference
is more than would be expected if there is no treatment effect. If it is large
enough to be in the critical region, we conclude that the difference is
significant or that the treatment has a significant effect. In this case we
reject the null hypothesis. If the mean difference is relatively small, then
the test statistic will have a low value. In this case, we conclude that the
evidence from the sample is not sufficient, and the decision is fail to reject
the null hypothesis.
Errors in Hypothesis Tests
• Just because the sample mean (following treatment) is different
from the original population mean does not necessarily indicate
that the treatment has caused a change.
• You should recall that there usually is some discrepancy between a
sample mean and the population mean simply as a result of
sampling error.
• Because the hypothesis test relies on sample data, and because
sample data are not completely reliable, there is always the risk
that misleading data will cause the hypothesis test to reach a
wrong conclusion.
• Two types of error are possible.
• A Type I error occurs when the sample data appear to show a treatment
effect when, in fact, there is none.
• In this case the researcher will reject the null hypothesis and falsely
conclude that the treatment has an effect.
• Type I errors are caused by unusual, unrepresentative samples. Just by
chance the researcher selects an extreme sample with the result that the
sample falls in the critical region even though the treatment has no
effect.
• The hypothesis test is structured so that Type I errors are very unlikely;
specifically, the probability of a Type I error is equal to the alpha level.
• A Type II error occurs when the sample does not appear to have been
affected by the treatment when, in fact, the treatment does have an
effect.
• In this case, the researcher will fail to reject the null hypothesis and
falsely conclude that the treatment does not have an effect.
• Type II errors are commonly the result of a very small treatment effect.
Although the treatment does have an effect, it is not large enough to
show up in the research study.
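The claim that the probability of a Type I error equals the alpha level can be checked with a small simulation. This is only a sketch under stated assumptions: the population is normal, the null hypothesis is true, the mean of 40 echoes the flour example, and the standard deviation of 2, the sample size of 25, and the function name are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def type_i_error_rate(mu, sigma, n, alpha, trials, seed=0):
    """Fraction of samples that wrongly reject H0 when H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cut-off
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from the untreated population (no treatment effect).
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # sample landed in the critical region by chance
    return rejections / trials


rate = type_i_error_rate(mu=40.0, sigma=2.0, n=25, alpha=0.05, trials=20000)
print(rate)  # should come out close to 0.05
```

Every rejection counted here is a Type I error, because the samples were generated with no treatment effect at all.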
Test of Hypothesis: z Statistic (σ Known)
One of the most basic hypothesis tests is a test about a population mean. A
business researcher might be interested in testing to determine whether an
established or accepted mean value for an industry is still true or in testing a
hypothesized mean value for a new theory or product.
As an example, a computer products company sets up a telephone service to
assist customers by providing technical support. The average wait time during
weekday hours is 37 minutes. However, a recent hiring effort added technical
consultants to the system, and management believes that the average wait
time decreased, and they want to prove it.
z = (x̄ - μ) / (σ / √n)
μ = population mean
σ = population standard deviation
x̄ = sample mean
n = sample size
A survey of CPAs across the United States found that the average net income for
sole proprietor CPAs is $74,914. Because this survey is now more than ten
years old, an accounting researcher wants to test this figure by taking a random
sample of 112 sole proprietor accountants in the United States to determine
whether the net income figure changed. Assume the population standard
deviation of net incomes for sole proprietor CPAs is $14,530. Suppose the 112
CPAs who respond produce a sample mean of $78,695.
H0: μ = $74,914 (the mean still equals $74,914)
H1: μ ≠ $74,914 (the mean no longer equals $74,914)
μ = $74,914, σ = $14,530, x̄ = $78,695, n = 112, α = 0.05
z = (x̄ - μ) / (σ / √n) = (78,695 - 74,914) / (14,530 / √112) = 2.75
Because this test statistic, z = 2.75, is greater than the critical value of z in the
upper tail of the distribution, z = +1.96, the statistical conclusion reached is to
reject the null hypothesis. The calculated test statistic is often referred to as the
observed value. Thus, the observed value of z for this problem is 2.75 and the
critical value of z for this problem is 1.96.
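The CPA calculation can be reproduced in a few lines. This is a minimal sketch using only the figures quoted in the example; the function name z_statistic is illustrative.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, mu, sigma, n):
    """Observed z for a test about a population mean with sigma known."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


# Figures from the CPA net income example.
z = z_statistic(sample_mean=78_695, mu=74_914, sigma=14_530, n=112)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-tailed, alpha = 0.05
reject_null = abs(z) > z_crit

print(round(z, 2), round(z_crit, 2), reject_null)  # 2.75 1.96 True
```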
Test of Hypothesis: z Statistic (σ Known) and Finite Population
z = (x̄ - μ) / ((σ / √n) × √((N - n) / (N - 1)))
μ = $74,914, σ = $14,530, x̄ = $78,695, n = 112, N = 600, α = 0.05
Use of the finite correction factor increased the observed z value from 2.75 to 3.05.
The decision to reject the null hypothesis does not change with this new
information.
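The effect of the finite correction factor can be checked the same way. The sketch below assumes the usual correction √((N - n) / (N - 1)) applied to the standard error; the function name is illustrative.

```python
import math


def z_statistic_finite(sample_mean, mu, sigma, n, population_size):
    """Observed z with the finite population correction on the standard error."""
    correction = math.sqrt((population_size - n) / (population_size - 1))
    standard_error = (sigma / math.sqrt(n)) * correction
    return (sample_mean - mu) / standard_error


# CPA example with a finite population of N = 600.
z_fpc = z_statistic_finite(78_695, 74_914, 14_530, 112, 600)
print(round(z_fpc, 2))  # 3.05
```

The correction shrinks the standard error, which is why the observed z rises from 2.75 to 3.05.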
In an attempt to determine why customer service is important to managers in
the United Kingdom, researchers surveyed managing directors of
manufacturing plants in Scotland. One of the reasons proposed was that
customer service is a means of retaining customers. On a scale from 1 to 5,
with 1 being low and 5 being high, the survey respondents rated this reason
more highly than any of the others, with a mean response of 4.30. Suppose
U.S. researchers believe American manufacturing managers would not rate this
reason as highly and conduct a hypothesis test to prove their theory.
Alpha is set at .05. Data are gathered and the following results are obtained.
Use these data and the eight steps of hypothesis testing to determine whether
U.S. managers rate this reason significantly lower than the 4.30 mean
ascertained in the United Kingdom. Assume from previous studies that the
population standard deviation is 0.574.
The alternative hypothesis is that the population mean is lower than 4.30.
The null hypothesis states the equality case.
H0: μ = 4.30
H1: μ < 4.30
μ = 4.30, σ = 0.574, x̄ = 4.156, n = 32, α = 0.05
State the decision rule. Because this test is a one-tailed test, the critical value of
the test statistic is z.05 = -1.645. An observed test statistic must be less than
-1.645 to reject the null hypothesis.
The observed test statistic is z = (4.156 - 4.30) / (0.574 / √32) = -1.42. Because
it is not less than the critical value and is not in the rejection region, the statistical
conclusion is that the null hypothesis cannot be rejected.
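The one-tailed decision can be verified numerically. This is a small sketch using the figures given for the U.S. managers example; variable names are illustrative.

```python
import math
from statistics import NormalDist

# Figures from the customer service rating example.
mu, sigma, sample_mean, n, alpha = 4.30, 0.574, 4.156, 32, 0.05

z_observed = (sample_mean - mu) / (sigma / math.sqrt(n))
z_critical_lower = NormalDist().inv_cdf(alpha)  # lower-tail cut-off
reject_null = z_observed < z_critical_lower

print(round(z_observed, 2), round(z_critical_lower, 3), reject_null)
# -1.42 -1.645 False
```

Because -1.42 lies above -1.645, the observed value falls outside the lower-tail rejection region.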
Testing Hypotheses: A Population Mean
Using the t Statistic (σ Unknown)
When a business researcher is gathering data to test hypotheses about a
single population mean, the value of the population standard deviation is
often unknown, and the researcher must use the sample standard deviation
as an estimate of it. In such cases, the z test cannot be used.
The t distribution can be used to analyze hypotheses about a single
population mean when σ is unknown, if the population is normally distributed
for the measurement being studied.
t = (x̄ - μ) / (s / √n)
df = degrees of freedom = n - 1
μ = population mean
σ = population standard deviation (unknown, estimated by the sample standard deviation s)
x̄ = sample mean
The U.S. Farmers’ Production Company builds large harvesters. For a harvester to be
properly balanced when operating, a 25-pound plate is installed on its side. The
machine that produces these plates is set to yield plates that average 25 pounds. The
distribution of plates produced from the machine is normal. However, the shop
supervisor is worried that the machine is out of adjustment and is producing plates that
do not average 25 pounds. To test this concern, he randomly selects 20 of the plates
produced the day before and weighs them. Table 9.1 shows the weights obtained,
along with the computed sample mean and sample standard deviation.
The test is to determine whether the machine is out of control, and the shop supervisor
has not specified whether he believes the machine is producing plates that are too
heavy or too light. Thus a two-tailed test is appropriate. The following hypotheses are
tested.
H0: μ = 25 pounds
H1: μ ≠ 25 pounds
The figure depicts the t distribution for this example, along with the critical
values, the observed t value, and the rejection regions. In this case, the decision
rule is to reject the null hypothesis if the observed value of t is less than -2.093
or greater than +2.093 (in the tails of the distribution).
Because the observed t value is +1.04, the null hypothesis is not rejected. Not
enough evidence is found in this sample to reject the hypothesis that the
population mean is 25 pounds.
The researcher’s hypothesis is that the average size of a U.S. farm is more than 471 acres.
Because this theory is unproven, it is the alternative hypothesis. The null hypothesis is that
the mean is still 471 acres.
H0: μ = 471
H1: μ > 471
With 23 data points, df = n - 1 = 23 - 1 = 22. This test is one tailed, and
the critical table t value is t.05,22 = 1.717. The decision rule is to reject the
null hypothesis if the observed test statistic is greater than 1.717.
The sample mean is 498.78 and the sample standard deviation is
46.94. The observed t value is
t = (x̄ - μ) / (s / √n) = (498.78 - 471) / (46.94 / √23) = 2.84
The observed t value of 2.84 is greater than the table t value of 1.717,
so the business researcher rejects the null hypothesis. She accepts the
alternative hypothesis and concludes that the average size of a U.S.
farm is now more than 471 acres.
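The farm size calculation can be reproduced as follows. This is a minimal sketch; the critical value 1.717 is taken from the t table as quoted above rather than computed, and the function name is illustrative.

```python
import math


def t_statistic(sample_mean, mu, sample_sd, n):
    """Observed t for a test about a population mean with sigma unknown."""
    return (sample_mean - mu) / (sample_sd / math.sqrt(n))


# Figures from the U.S. farm size example.
n = 23
t_observed = t_statistic(sample_mean=498.78, mu=471, sample_sd=46.94, n=n)
degrees_of_freedom = n - 1
t_critical = 1.717  # table value t.05,22 quoted in the example
reject_null = t_observed > t_critical  # one-tailed, upper tail

print(round(t_observed, 2), degrees_of_freedom, reject_null)  # 2.84 22 True
```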
Agribusiness researchers can speculate about what it means to have larger farms.
If the average size of a farm has increased from 471 acres to almost 500 acres, it
may represent a substantive increase.
It could mean that small farms are not financially viable. It might mean that
corporations are buying out small farms and that large company farms are on the
increase. Such a trend might spark legislative movements to protect the small
farm. Larger farm sizes might also affect commodity trading.
Analysis of variance, or ANOVA
One-Way Analysis of Variance (ANOVA)

One-Way ANOVA compares the means of two or more independent
groups in order to determine whether there is statistical evidence that the
associated population means are significantly different. One-Way ANOVA
is a parametric test. This test is also known as: One-Factor ANOVA.
The null hypothesis states that the population means for all treatment levels
are equal. Because of the way the alternative hypothesis is stated, if even one
of the population means is different from the others, the null hypothesis is
rejected.
SST = SSC + SSE
F = MSC / MSE, where MSC = SSC / dfC and MSE = SSE / dfE
SST = total sum of squares
SSC = sum of squares column (treatment)
SSE = sum of squares error
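The sums of squares can be sketched in code. The example below is an illustration only: the three groups of measurements are hypothetical (the tables from the slides are not reproduced here), and the function name one_way_anova is an assumption.

```python
def one_way_anova(groups):
    """Compute SSC, SSE, SST, and the observed F for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    n_total = len(all_values)
    n_groups = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Between-treatment variation: group means around the grand mean.
    ssc = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-treatment variation: values around their own group mean.
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total variation: values around the grand mean.
    sst = sum((x - grand_mean) ** 2 for x in all_values)

    df_c = n_groups - 1        # treatment (column) degrees of freedom, C - 1
    df_e = n_total - n_groups  # error degrees of freedom, N - C
    f_observed = (ssc / df_c) / (sse / df_e)
    return {"SSC": ssc, "SSE": sse, "SST": sst, "dfC": df_c, "dfE": df_e, "F": f_observed}


# Hypothetical measurements for three treatment levels.
result = one_way_anova([
    [6.3, 6.2, 6.4, 6.3],
    [6.5, 6.6, 6.4, 6.7],
    [6.1, 6.0, 6.2, 6.1],
])
print(result)
```

The observed F returned here would then be compared with the critical F value for dfC and dfE at the chosen α.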
As an example of a completely randomized design, suppose a researcher decides to
analyze the effects of the machine operator on the valve opening measurements of
valves produced in a manufacturing plant, like those shown in Table 11.1. The
independent variable in this design is machine operator. Suppose further that four
different operators operate the machines. These four machine operators are the
levels of treatment, or classification, of the independent variable. The dependent
variable is the opening measurement of the valve. Figure 11.2 shows the structure
of this completely randomized design. Is there a significant difference in the mean
valve openings of 24 valves produced by the four operators? Table 11.2 contains the
valve opening measurements for valves produced under each operator.
H0: 1 = 2 = 3 = 4
H1: At least one of the means is
different from the others.
The observed F value is 10.18. It is compared to a critical value from the F table
to determine whether there is a significant difference in treatment or classification.
In the one-way ANOVA, the dfC values are the treatment (column) degrees of
freedom, C - 1. The dfE values are the error degrees of freedom, N - C. Table 11.4
contains an abbreviated F distribution table for α = .05. For the machine operator
example, dfC = 3 and dfE = 20, F.05,3,20 from Table 11.4 is 3.10. This value is the critical
value of the F test. Analysis of variance tests are always one-tailed tests with the
rejection region in the upper tail. The decision rule is to reject the null hypothesis if
the observed F value is greater than the critical F value (F.05,3,20 = 3.10).
For the machine operator problem, the observed F value of 10.18 is larger than the
table F value of 3.10. The null hypothesis is rejected. Not all means are equal, so
there is a significant difference in the mean valve openings by machine operator.
A company has three manufacturing plants, and company officials want to
determine whether there is a difference in the average age of workers at the three
locations. The following data are the ages of five randomly selected workers at
each plant. Perform a one-way ANOVA to determine whether there is a significant
difference in the mean ages of the workers at the three plants. Use α= .01 and note
that the sample sizes are equal.
H0: 1 = 2 = 3
H1: At least one of the means is different from
the others.
64.87
1.63
The decision is to reject the null hypothesis because the observed F value of
39.80 is greater than the critical table F value of 6.93. There is a significant
difference in the mean ages of workers at the three plants.
CHI-SQUARE TEST
The chi-square techniques presented here for analyzing categorical data, the
chi-square goodness-of-fit test and the chi-square test of independence, are
an outgrowth of the binomial distribution and the inferential techniques for
analyzing population proportions.
Categorical data are nonnumerical data that are frequency counts of
categories from one or more variables.
For example, it is determined that of the 790 people attending a convention,
240 are engineers, 160 are managers, 310 are sales reps, and 80 are
information technologists. The variable is “position in company” with four
categories: engineers, managers, sales reps, and information technologists.
Chi-square Goodness-of-fit Test
The chi-square goodness-of-fit test is used to analyze probabilities of multinomial
distribution trials along a single dimension. For example, if the variable being
studied is economic class with three possible outcomes of lower income class,
middle income class, and upper income class, the single dimension is economic
class and the three possible outcomes are the three classes. On each trial, one and
only one of the outcomes can occur. In other words, a family unit must be
classified either as lower income class, middle income class, or upper income class
and cannot be in more than one class.
The chi-square goodness-of-fit test compares the expected, or theoretical,
frequencies of categories from a population distribution to the observed, or actual,
frequencies from a distribution to determine whether there is a difference
between what was expected and what was observed. For example, airline industry
officials might theorize that the ages of airline ticket purchasers are distributed in
a particular way. To validate or reject this expected distribution, an actual sample
of ticket purchaser ages can be gathered randomly, and the observed results can
be compared to the expected results with the chi-square goodness-of-fit test.
One survey of U.S. consumers conducted by The Wall Street Journal and NBC News
asked the question: “In general, how would you rate the level of service that
American businesses provide?” The distribution of responses to this question was as
follows:
Suppose a store manager wants to find out whether the results of this consumer
survey apply to customers of supermarkets in her city. To do so, she interviews 207
randomly selected consumers as they leave supermarkets in various parts of the city.
She asks the customers how they would rate the level of service at the supermarket
from which they had just exited. The response categories are excellent, pretty good,
only fair, and poor. The observed responses from this study are given in Table 16.1.
Now the manager can use a chi-square goodness-of-fit test to determine whether
the observed frequencies of responses from this survey are the same as the
frequencies that would be expected on the basis of the national survey.
H0: The observed distribution is the same as the expected distribution.
Ha: The observed distribution is not the same as the expected distribution.
Chi-square goodness-of-fit tests are one tailed because a chi-square of zero
indicates perfect agreement between distributions. Any deviation from zero
difference occurs in the positive direction only because chi-square is determined by
a sum of squared values and can never be negative.
With four categories in this example (excellent, pretty good, only fair, and poor), k =
4. The degrees of freedom are k - 1 because the expected distribution is given: k - 1
= 4 - 1 = 3. For α= .05 and df = 3, the critical chi-square value is 7.8147.
After the data are analyzed, an observed chi-square greater than 7.8147 must be
computed in order to reject the null hypothesis.
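Here n = 207. The expected proportions are given, but the expected frequencies
must be calculated by multiplying the expected proportions by the sample total of
the observed frequencies. The observed chi-square is the sum, over all categories,
of (observed frequency - expected frequency)² / expected frequency.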
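The calculation can be sketched in code. The proportions and counts below are hypothetical stand-ins, since the survey distribution and Table 16.1 are not reproduced in this text; only the critical value 7.8147 comes from the example. The function name is an assumption.

```python
def chi_square_goodness_of_fit(observed, expected_proportions):
    """Return (chi-square, degrees of freedom, expected frequencies)."""
    n = sum(observed)
    # Expected frequency = expected proportion x sample total.
    expected = [p * n for p in expected_proportions]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # k - 1 because the expected distribution is given
    return chi_square, df, expected


# Hypothetical data for four categories (not the survey figures).
observed_counts = [25, 90, 65, 20]
expected_props = [0.10, 0.40, 0.35, 0.15]

chi_square, df, expected = chi_square_goodness_of_fit(observed_counts, expected_props)
critical_value = 7.8147  # table value for alpha = .05, df = 3
print(round(chi_square, 2), df, chi_square > critical_value)  # 6.19 3 False
```

With these made-up counts the observed chi-square stays below the critical value, so the null hypothesis would not be rejected.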
Calculation of Chi-Square for Service Satisfaction
Because the observed value of chi-square of 6.25 is not greater than the critical
table value of 7.8147, the store manager will not reject the null hypothesis.
Thus the data gathered in the sample of 207 supermarket shoppers indicate that
the distribution of responses of supermarket shoppers in the manager’s city is
not significantly different from the distribution of responses to the national
survey.
The store manager may conclude that her customers do not appear to have
attitudes different from those of the people who took the survey. The figure
depicts the chi-square distribution produced by using Minitab for this example,
along with the observed and critical values.
Chi-square: Test Of Independence
The chi-square test of independence can be used to analyze the frequencies of two
variables with multiple categories to determine whether the two variables are
independent.
For example, a market researcher might want to determine whether the type of
soft drink preferred by a consumer is independent of the consumer’s age.
The chi-square test of independence can be used to analyze any level of data
measurement, but it is particularly useful in analyzing nominal data. Suppose a
business researcher is interested in determining whether geographic region is
independent of type of financial investment.
On a questionnaire, the following two questions might be used to measure
geographic region and type of financial investment.
Q.1: In which region of the country do you reside?
A. Northeast B. Midwest C. South D. West
Q.2: Which type of financial investment are you most likely to make today?
E. Stocks F. Bonds G. Treasury Bills
The business researcher would tally the frequencies of responses to these two
questions into a two-way table called a contingency table. Because the chi-square
test of independence uses a contingency table, this test is sometimes referred to as
contingency analysis.
The null hypothesis for a chi-square test of independence is that the two variables
are independent. The alternative hypothesis is that the variables are not
independent. This test is one-tailed. The degrees of freedom are (r – 1)(c – 1).
Suppose a business researcher wants to determine whether type of gasoline
preferred is independent of a person’s income. She takes a random survey of gasoline
purchasers, asking them one question about gasoline preference and a second
question about income. The respondent is to check whether he or she prefers (1)
regular gasoline, (2) premium gasoline, or (3) extra premium gasoline. The
respondent also is to check his or her income bracket as being (1) less than $30,000,
(2) $30,000 to $49,999, (3) $50,000 to $99,999, or (4) more than $100,000. The
business researcher tallies the responses and obtains the results in the contingency
table. Using α = .01, she can use the chi-square test of independence to determine
whether type of gasoline preferred is independent of income level.
The hypotheses follow.
H0: Type of gasoline is independent of income.
Ha: Type of gasoline is not independent of income.
Contingency Table for the Gasoline Consumer
To determine the observed value of chi-square, the researcher must compute the
expected frequencies. The expected values for this example are calculated as
follows, with the first term in the subscript (and numerator) representing the row
and the second term in the subscript (and numerator) representing the column.
e_ij = (total of row i × total of column j) / N, where N is the total of all frequencies
Contingency Table of Observed and Expected Frequencies for Gasoline Consumer

The observed value of chi-square, 70.78, is greater than the critical value of
chi-square, 16.8119, obtained from Table A.8. The business researcher’s decision is
to reject the null hypothesis; that is, type of gasoline preferred is not independent
of income.
Having established that conclusion, the business researcher can then examine the
outcome to determine which people, by income brackets, tend to purchase which
type of gasoline and use this information in market decisions.
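The expected frequencies and the observed chi-square for a contingency table can be sketched as follows. The 4 × 3 table of counts is hypothetical (the gasoline survey table is not reproduced in this text), so the result will not match 70.78; only the critical value 16.8119 is taken from the example. The function name is an assumption.

```python
def chi_square_independence(table):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency = (row total x column total) / grand total.
            expected = row_totals[i] * column_totals[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)  # (r - 1)(c - 1)
    return chi_square, df


# Hypothetical counts: rows are four income brackets, columns are three gasoline types.
hypothetical_table = [
    [60, 20, 10],
    [50, 30, 15],
    [30, 40, 25],
    [15, 25, 35],
]

chi_square, df = chi_square_independence(hypothetical_table)
critical_value = 16.8119  # table value for alpha = .01, df = 6
print(round(chi_square, 2), df, chi_square > critical_value)
```

A table whose rows are exact multiples of one another, such as [[10, 20], [20, 40]], yields a chi-square of zero, which corresponds to perfect independence.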
THANK YOU
