
DSC 2008 Business Analytics: Data and Decisions, Tutorial 3

This covers some topics missed by Assignment 1 last week. Q2, Q4 and Q6 are for
tutorial discussion; answers for the rest are already included.
(1) 4e: P 505, 5e: P 451, 6e: P406, problem 52, using Tut3-Q1-DVD Movies.xlsx.
(Fictitious DVD Movies data) (PivotTable) (The chi-square part is deemphasized, since not in syllabus this semester. Use this to practise
PivotTable.)
Use PivotTable to count the number in each cell. See Tut3-Q1-DVD
Movies(answers).xlsx for one way to do it.
a. Column Labels (FirstChoice), Row Labels (State), Values (Count of
Purchases). P-value = 0, so definite dependence. From graph, it can be
seen that Washington (WA) & Indiana (IN) really like Comedy, for instance,
when compared to the other states. On the other hand, Indiana really
doesn't like Action and Washington dislikes SciFi.
b. Reuse the PivotTable by selecting Show Field List and change ROWS from
State to City. P-value = 0, so dependence again. From graph, Tampa really
likes comedy, while SciFi is the overwhelming favourite of Memphis.
c. P-value is 8.9E-22, which is basically 0. From graph, it was no surprise that
Male likes Action, but was it expected that Female likes SciFi? In fact, Male
likes drama more than Female!
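The PivotTable step (count the purchases in each State by FirstChoice cell, then compare rows) can be sketched outside Excel as follows. This is only an illustration using the Python standard library: the records are hypothetical and are not the Tut3-Q1 data, and the helper names `pivot_counts` and `row_shares` are made up for this sketch.

```python
from collections import Counter

# Hypothetical (state, first choice) records; NOT the Tut3-Q1 data.
records = [
    ("WA", "Comedy"), ("WA", "Comedy"), ("WA", "Action"),
    ("IN", "Comedy"), ("IN", "Drama"), ("IN", "Comedy"),
    ("TX", "Action"), ("TX", "SciFi"), ("TX", "Action"),
]

def pivot_counts(rows):
    """Count records per (row label, column label) cell, like a PivotTable."""
    return Counter(rows)

def row_shares(counts):
    """Convert cell counts to shares of each row total."""
    totals = Counter()
    for (row, _col), n in counts.items():
        totals[row] += n
    return {(row, col): n / totals[row] for (row, col), n in counts.items()}

counts = pivot_counts(records)
shares = row_shares(counts)
for (state, genre), n in sorted(counts.items()):
    print(f"{state:3} {genre:7} count={n} share={shares[(state, genre)]:.2f}")
```

Comparing row shares rather than raw counts is what makes statements like "this state really likes Comedy compared to the others" meaningful when the states have different numbers of purchases.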
(2) 4e: P 513, problem 55a (omit part b); 5e & 6e: not included; using Tut3-Q2-P09_55.xlsx.
National Airlines recently introduced a daily early-morning
nonstop flight between Houston and Chicago. The vice president of marketing
for National Airlines decided to perform a statistical test to see whether
National's average passenger load (filled seats) on this new flight is different
from that of each of its two major competitors. Ten early-morning flights were
selected at random from each of the three airlines and the percentage of
unfilled seats on each flight was recorded.
a. Is there evidence that National's average passenger load on the new flight
is different from that of its two competitors? Report a p-value and interpret
the results of the statistical test.
(3) 4e: P 516, problem 59; 5e: P 453 & 6e: P408, problem 55. (Childhood Cancer)
H0: p ≤ 0.0002; Ha: p > 0.0002
The data are interpreted as 4 cases in 7 years. Thus, n = 700 and p =
0.0002 in the Binomial model. This is strictly not true, of course, since we
actually only have 100 trials repeated 7 times each. A child who got cancer
during the first year probably still has cancer during the second year; we
therefore don't have 700 independent trials.
We have X ~ Binomial(700, 0.0002).
P-value = P(X ≥ 4) = 1 - BINOMDIST(4-1, 700, 0.0002, TRUE) = 0.000014.
Using the less-accurate (especially since np = 0.14 < 5) Normal
approximation with continuity correction (the -0.5 bit),
z = (4-0.5 - 700*0.0002)/sqrt(700*0.0002*(1-0.0002)) = 8.98,
P(Z > z) = 1 - NORMSDIST(8.98) = 0.000000 = p-value.
Tut3-Q3(answers).xlsx gives the Excel calculations. Hence, there is evidence
that the cancer rate among children of workers at the business school exceeds
the national average.
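For readers without Excel, the two calculations above (the exact Binomial tail and the Normal approximation with continuity correction) can be checked with a short standard-library Python sketch. The function names are illustrative choices for this sketch, not part of the tutorial files.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k-1).

    Mirrors 1 - BINOMDIST(k-1, n, p, TRUE) in Excel.
    """
    cdf = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1 - cdf

def normal_approx_upper_tail(k, n, p):
    """Normal approximation to P(X >= k) with continuity correction (-0.5)."""
    z = (k - 0.5 - n * p) / sqrt(n * p * (1 - p))
    return z, 1 - NormalDist().cdf(z)

p_exact = binom_upper_tail(4, 700, 0.0002)
z, p_normal = normal_approx_upper_tail(4, 700, 0.0002)
print(f"exact p-value = {p_exact:.6f}")
print(f"z = {z:.2f}, normal-approximation p-value = {p_normal:.6f}")
```

The exact and approximate p-values differ by orders of magnitude here, which is the practical meaning of the warning that the Normal approximation is poor when np = 0.14 is far below 5; both still lead to rejecting H0.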
(4) 4e: P 518, problem 77; 5e: P 456 & 6e: P 411, problem 71; using Tut3-Q4-P09_77.xlsx. (University Salary)
Let μ = average salary.
H0: μ(pre-tenure) = μ(tenured)
Ha: μ(pre-tenure) > μ(tenured)
Tut3-Q4-P09_77(answers)&.xlsx is done using ANOVA and, more appropriately,
a 2-sample t-test. The t-test p-value = 0.996, leading to the conclusion that
new hires do not on average earn higher salaries than tenured professors.
Looking at the histograms, it appears that tenured faculty members are
divided into two groups (Full Professors and Associate Professors? Finance &
Accounting professors versus the rest?). The newly hired faculty members
appear to make about the same as the first group of professors, who all would
have worked for a number of years even before their tenure. It is therefore
not certain that salary compression did not occur in this business school.
(5) 4e: P 552, 5e: P 486, 6e: P441, problem 2; using Tut3-Q5-P02_10.xlsx.
(Midterm & Final)
Tut3-Q5-P02_10(answers).xlsx has the plots.
a. The scores seem to be fairly linearly related.
b. To add the trend line, right-click on any plotted point in the scatterplot,
then select Add Trendline. The R² is 0.58. Final = 0.9079 + 0.997 × Midterm, so Final is about 1 + Midterm.
In the sheet Regression2, we subtract 1 from Final, so that Final is basically
exactly predicted by Midterm (intercept is nearly 0). Wait, how is this
possible? What happened to the Regression Effect? This is possible because
Midterm and Final do not have the same SD. If we had converted both
Midterm and Final to Standard Units, then the coefficient of Midterm would
be 0.761 (same as correl(Final, Midterm)), as seen in the sheet Regression3.
c. The regression reproduces the R² and regression line of the plot. The
Standard Error of 6.2 indicates that the standard deviation of the vertical
distance from a point to the regression line is about 6. Roughly then, the
typical residual of the in-sample fit for the Final Score is about 6 points.
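The points made in (b) and (c) can be illustrated with a small standard-library Python sketch: the raw slope is r × SD(Final)/SD(Midterm), the slope in Standard Units is r itself, and the regression Standard Error is the typical size of a residual. The scores below are hypothetical, not the Tut3-Q5 data, and the helper names are made up for this sketch.

```python
from math import sqrt
from statistics import mean, pstdev

# Hypothetical scores; NOT the Tut3-Q5 data.
midterm = [55, 62, 68, 70, 75, 81, 88, 93]
final = [60, 58, 72, 69, 80, 79, 92, 90]

def correlation(x, y):
    """Pearson correlation coefficient r."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    slope = correlation(x, y) * pstdev(y) / pstdev(x)
    return mean(y) - slope * mean(x), slope

def standard_units(values):
    """Subtract the mean and divide by the SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

r = correlation(midterm, final)
intercept, slope = least_squares(midterm, final)
_, slope_su = least_squares(standard_units(midterm), standard_units(final))

# Regression standard error: root of SSE / (n - 2).
residuals = [y - (intercept + slope * x) for x, y in zip(midterm, final)]
std_error = sqrt(sum(e * e for e in residuals) / (len(final) - 2))

print(f"r = {r:.3f}, raw slope = {slope:.3f}, standard-units slope = {slope_su:.3f}")
print(f"regression standard error = {std_error:.2f}")
```

A raw slope near 1 therefore does not contradict the Regression Effect: it only says that r × SD(Final)/SD(Midterm) happens to be close to 1, while the slope in Standard Units is still r < 1.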
(6) You just bought a bicycle odometer, and need to find out what setting to use
for a 26" wheel with a 1" tire, since your mountain bike has slim tires for urban
use. The Tut3-Q6-BikeComputer.pdf manual wasn't clear. Please figure out
what setting to use. If you need optional background info, you may consult
Tut3-Q6-TireSizingSystems.pdf (only if necessary).
(7) Drug screening in a pharmaceutical company is the process of determining
whether a drug is effective in treating a particular disease. To abandon a drug
when in fact it is a useful one is clearly undesirable, yet there is always some
chance of that. On the other hand, to go ahead with further, more expensive
testing of a drug that is in fact useless wastes time and money that could
have been spent on testing other compounds.
An investigator implants cancer cells in 100 laboratory mice. From this
group, 50 mice are randomly selected and treated with a drug. The
remaining 50 are left untreated, and comprise what is known as the control
group. After a fixed length of time, the actual tumour weights of all the mice
in the experiment are measured. If the population mean tumour weight of all
mice treated with the drug is significantly less than the population mean
tumour weight of all untreated mice, then the drug will be provisionally
accepted for further testing, else rejected.
(a) Give the appropriate null and alternative hypotheses for the drug-screening test.
Ho: mean tumour weights are the same for treatment and control groups
Ha: mean tumour weight for treatment group smaller than that for control
group
(b) What are the Type I and Type II errors for this test?
Type I error: continue with further testing when in fact mean tumour
weights are same for the two groups
Type II error: abandon the drug when in fact the mean tumour weight of treated
mice is smaller than that of the control group
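(c) Using a significance level of 1%, explain what the decision rule for this test
would look like (assuming that some Z statistic would be involved). [This is
not covered in class yet.]
Properly, this should be done using the 2-sample t-test but, approximately, z =
(mean1-mean2)/sqrt((s1^2)/50 + (s2^2)/50), where group 1 is the treatment
group and group 2 is the control group. Reject Ho (provisionally accept the
drug) if z < -2.33, the lower 1% point of the standard Normal.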
(d) Without additional calculations, explain whether a significance level of 5%
would be considered more or less stringent (however you might define this
term in this context) than 1%.
5% will make it easier to reject Ho of same tumour weight for treatment
and control groups, or easier to accept that treatment group fared better.
This means less stringent in establishing effectiveness of the drug.
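The decision rule in (c) and the comparison in (d) can be sketched in standard-library Python. The summary statistics below are hypothetical (no tumour-weight data are given in the question), and the function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Approximate z for treatment mean minus control mean."""
    return (mean_t - mean_c) / sqrt(sd_t**2 / n_t + sd_c**2 / n_c)

def critical_value(alpha):
    """Lower-tail cutoff: reject Ho when z falls below this value."""
    return NormalDist().inv_cdf(alpha)

def accept_drug(z, alpha):
    """True if Ho (equal means) is rejected in favour of a smaller treatment mean."""
    return z < critical_value(alpha)

# Hypothetical summary statistics in grams: mean, SD, n for each group.
z = z_statistic(1.90, 0.6, 50, 2.15, 0.6, 50)

for alpha in (0.01, 0.05):
    print(f"alpha = {alpha:.2f}: cutoff = {critical_value(alpha):.3f}, "
          f"z = {z:.3f}, accept drug for further testing: {accept_drug(z, alpha)}")
```

With these made-up numbers the same z passes the 5% cutoff but not the 1% cutoff, which is what "less stringent" means here: the bar for declaring the drug effective is lower at 5%.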
(8) In a Hypothesis Testing situation:
(a) Why can't the Type I error and the Type II error occur at the same time?
A Type I error can only occur when Ho is true, and a Type II error only when Ho is not.
(b) Is it possible for the Type I error probability to be very small, while the
Type II error probability is very big?
Yes. Given a sampling procedure, the Type I error probability can only decrease
when the Type II error probability increases. Hence, the Type II error
probability can be very big (near to 1).
(c) Is the statement "The P-value is the probability that the Null Hypothesis is
true" correct?
No. Ho is either true or not (probability of 1 or 0). However, a big p-value favours the non-rejection of Ho.
(d) Everything else being equal, please explain why the P-value will increase
or decrease with the sample size.
For a given observed departure from Ho, the P-value decreases with increasing
sample size, since even a minute departure from Ho can be detected with a
large sample.
(e) Why is it uncommon to fix the probability of Type II error, i.e. β, in a
statistical test?
The Type II error probability is computed as an area under the curve for the
alternative hypothesis. In many cases, the alternative hypothesis, e.g. μ > 0,
cannot be represented by one fixed curve, so β cannot be computed.
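Answers (b), (d) and (e) can be made concrete with a small standard-library Python sketch. It assumes a simplified setting that is not stated in the question: a one-sided z-test of Ho: μ = 0 against Ha: μ > 0 with known σ. All numbers are hypothetical and the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def type_ii_error(alpha, true_mean, sigma, n):
    """beta for a one-sided z-test of Ho: mu = 0 vs Ha: mu > 0, known sigma.

    Ho is not rejected when the z statistic is at most the upper-alpha
    cutoff; beta is the chance of that when the mean is really true_mean.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    return STD_NORMAL.cdf(z_crit - true_mean * sqrt(n) / sigma)

def p_value(sample_mean, sigma, n):
    """Upper-tail p-value for an observed sample mean under Ho: mu = 0."""
    return 1 - STD_NORMAL.cdf(sample_mean * sqrt(n) / sigma)

# (b) Same procedure (n = 25): shrinking alpha inflates beta.
beta_loose = type_ii_error(0.05, true_mean=0.2, sigma=1.0, n=25)
beta_strict = type_ii_error(0.001, true_mean=0.2, sigma=1.0, n=25)
print(f"beta at alpha=0.05: {beta_loose:.3f}; at alpha=0.001: {beta_strict:.3f}")

# (d) Same observed mean of 0.2: the p-value shrinks as n grows.
p_values = [p_value(0.2, 1.0, n) for n in (25, 100, 400)]
print("p-values for n = 25, 100, 400:", [f"{p:.5f}" for p in p_values])

# (e) beta depends on which alternative is true, so "mu > 0" gives no single beta.
beta_far = type_ii_error(0.05, true_mean=0.5, sigma=1.0, n=25)
print(f"beta if mu = 0.2: {beta_loose:.3f}; if mu = 0.5: {beta_far:.3f}")
```

Each call to `type_ii_error` needs a specific `true_mean`, which is the point of (e): β is only defined once one particular alternative has been picked.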
