
Stat 130 Exam 1

Important Formulas and Concepts

1 Chapter 1
1.1 Definitions
1. Data
Any collection of numbers, characters, images, or other items that provide information
about something.

2. Categorical/Qualitative Variables
Variables whose values name categories, used for grouping.

3. Quantitative Variables
A variable whose values are measured numbers with measurement units.

4. Ordinal Variable
A categorical variable with an ordering.

5. Identifier Variable
A variable with a unique value for each record, such as a Student ID or SSN.

6. Frequency Table
Records the totals (counts) for each category and uses the category names to label each row.

7. Relative Frequency Table
Displays the percentage of values in each category.

8. Bar Chart
Displays the distribution of a categorical variable, showing counts for each category
next to each other for easy comparison.

9. Relative Frequency Bar Chart
Same as a bar chart, but displays the percentage of cases in each category rather than
the counts.

10. Pie Charts
Shows a whole group of cases as a circle. The circle is sliced into pieces whose size is
proportional to the fraction of the whole in each category.

11. Distribution
Gives the possible values of a variable and how often each value occurs. For a quantitative
variable, slice up all the possible values into equal-width bins and give the number of
values (counts) falling into each bin.

This version: September 15, 2021, by Dale Embers. This guide may not include everything that
could possibly be tested; use it as an additional reference when studying all of Chapters 1 to 5.

12. Histogram
Uses adjacent bars to show the distribution of a quantitative variable. The height of each bar
shows the number of values falling into that bin.

13. Unimodal
Histogram with one peak.

14. Bi-modal
Histogram with two peaks.

15. Uniform
Histogram that doesn’t appear to have any mode. Bars are approximately the same
height for each bin.

16. Symmetric
Histogram in which the two halves on either side of the center look approximately like
mirror images.

17. Skewed
A histogram that is not symmetric: one tail stretches out farther than the other.

18. Skewed Left
Histogram with a long tail on the left.

19. Skewed Right
Histogram with a long tail on the right.
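A minimal Python sketch of the frequency table, relative frequency table, and histogram bins defined above, using only the standard library. The function names (`frequency_table`, `relative_frequency_table`, `bin_counts`) are illustrative, not from the course. The counts reuse the eye color totals from Example Problem 1 and the refrigerator prices from Example Problem 7. Bins here include their left endpoint and exclude their right one, which is a common convention but an assumption.

```python
from collections import Counter

def frequency_table(values):
    """Count how many values fall in each category."""
    return Counter(values)

def relative_frequency_table(values):
    """Percentage of values in each category."""
    n = len(values)
    return {category: 100 * count / n for category, count in Counter(values).items()}

def bin_counts(values, start, width, n_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).
    Values outside all bins are ignored in this sketch."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

# Eye colors from Example Problem 1: 11 blue, 9 green, 25 brown.
eyes = ["Blue"] * 11 + ["Green"] * 9 + ["Brown"] * 25
print(frequency_table(eyes))
print(relative_frequency_table(eyes))

# Refrigerator prices from Example Problem 7 in $20-wide bins starting at $120.
prices = [150, 150, 160, 180, 150, 140, 120, 130, 120]
print(bin_counts(prices, start=120, width=20, n_bins=4))  # [3, 4, 1, 1]
```

The bin counts are exactly the bar heights a histogram of these prices would show with that choice of bins; a different bin width or starting point can change the shape.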

2 Chapter 2
2.1 Definitions
1. Five-Number Summary: Min, Q1, Median, Q3, Max.

2. Boxplot: Displays the five-number summary as a central box (from Q1 to Q3, with a line at
the median) with whiskers that extend to the most extreme non-outlier data values. Outliers
are plotted individually.

3. Use the sample mean and sample standard deviation when the data are symmetric and
have no significant outliers. Use the median and IQR when the data are skewed or have
significant outliers.

4. Outliers are values that are either above the upper fence or below the lower fence.

2.2 Formulas
1. Median = once the data are ordered from smallest to largest, the middle value in the data
(the average of the two middle values when n is even). It divides the histogram into two
pieces of equal area.

2. Mean = average of all of the values:
$$\bar{x} = \frac{\sum x}{n}$$

3. Range = Max - Min

4. Q1 = Median of the lower half of the data

5. Q3 = Median of the upper half of the data

6. IQR = Q3 − Q1

7. Variance:
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

8. Standard Deviation:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

9. Upper Fence for Boxplot = Q3 + 1.5IQR

10. Lower Fence for Boxplot = Q1 − 1.5IQR
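The formulas above can be checked with a short Python sketch. It assumes the quartile method stated here (Q1 and Q3 as medians of the lower and upper halves, leaving out the median itself when n is odd); statistical software may use other quartile conventions and give slightly different values. The function names `median` and `summary` are hypothetical, and the data are the refrigerator prices from Example Problem 7.

```python
import math

def median(sorted_values):
    """Middle value of a sorted list (average of the two middle values if n is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def summary(data):
    """Five-number summary, spread measures, and boxplot fences for a list of numbers."""
    x = sorted(data)
    n = len(x)
    lower_half = x[: n // 2]        # excludes the median when n is odd
    upper_half = x[(n + 1) // 2 :]  # excludes the median when n is odd
    q1, med, q3 = median(lower_half), median(x), median(upper_half)
    iqr = q3 - q1
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / (n - 1)  # divide by n - 1
    return {
        "min": x[0], "q1": q1, "median": med, "q3": q3, "max": x[-1],
        "mean": mean, "range": x[-1] - x[0], "iqr": iqr,
        "variance": variance, "sd": math.sqrt(variance),
        "lower_fence": q1 - 1.5 * iqr, "upper_fence": q3 + 1.5 * iqr,
    }

prices = [150, 150, 160, 180, 150, 140, 120, 130, 120]  # Example Problem 7
s = summary(prices)
outliers = [v for v in prices if v < s["lower_fence"] or v > s["upper_fence"]]
print(s)
print("outliers:", outliers)
```

For these prices the sketch reproduces the hand results: Q1 = 125, median = 150, Q3 = 155, fences at 80 and 200, no outliers, and a standard deviation of about 19.44.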

3 Chapter 3
3.1 Definitions
1. Scatterplot
Shows the relationship between 2 quantitative variables.

2. Direction of Scatterplot
A positive direction means that as one variable increases, the other tends to increase. A
negative direction means that as one variable increases, the other tends to decrease.

3. Form of Scatterplot
Whether the points follow a straight line (linear form) or some other pattern, such as a curve.

4. Strength of Scatterplot
The association is strong if there is little scatter around the underlying relationship.

5. Outlier
A point that does not fit the overall pattern seen in the scatterplot.

3.2 Formulas
1. Correlation Coefficient
$$r = \frac{\sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)}{n - 1}$$
3.3 Properties of Correlation Coefficient
1. Measures the strength and direction of the linear association between two quantitative
variables.

2. Close to 0 implies the linear association is probably weak

3. Close to ±1 implies the linear association is probably strong.

4. The sign of r is the same as the direction of the association

5. Always between -1 and 1

6. Does not matter which variable you consider as x and y.

7. Treats x and y symmetrically

8. No units

9. Affected by outliers

10. Not affected by changes of scale (such as changing units) or by standardizing the
variables (converting to z-scores)

11. Does NOT prove causation. Only provides a relationship between 2 variables.
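As a sketch, the correlation formula can be computed directly from z-scores in Python. `correlation`, `mean`, and `sd` are hypothetical helpers written for this illustration, not library functions; the data are the energy and price values from Example Problem 9. The last two lines illustrate properties 6 and 10: swapping x and y, or rescaling the units, leaves r unchanged.

```python
import math

def mean(v):
    return sum(v) / len(v)

def sd(v):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(v)
    return math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))

def correlation(x, y):
    """r = sum of (z_x * z_y) divided by (n - 1)."""
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)

energy = [2, 3, 5, 7, 9, 12, 15, 20]   # KWH, from Example Problem 9
price = [2, 5, 9, 10, 20, 22, 27, 30]  # dollars

r = correlation(energy, price)
print(round(r, 4))  # about 0.9687, matching the value given in the problem

# Swapping x and y, or changing units (KWH -> Wh, dollars -> cents), leaves r unchanged.
print(round(correlation(price, energy), 4))
print(round(correlation([1000 * e for e in energy], [100 * p for p in price]), 4))
```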

4 Chapter 4
4.1 Definitions
1. Linear Model
Equation of the form ŷ = a + bx, where ŷ denotes the predicted (estimated) value of y.

2. Predicted Values
Value of ŷ found for a given x-value in the data.

3. Fitted Linear Model
ŷ = a + bx

4. Residual
Differences between data values and the corresponding values predicted by the model
(observed − predicted)

5. R²
Gives the fraction of variability of y accounted for by the least squares linear regression
on x. It is an overall measure of how successful the regression is in linearly relating y
to x.
6. Least Squares Criterion
Specifies the unique line that minimizes the variance of the residuals or the sum of
squared residuals.

7. Extrapolation
Using a model to predict y at x-values far outside the range of the data. In any regression
situation this is unsafe, and predictions from extrapolation should not be trusted.

8. Influential Point
A point that, if omitted from the data, results in a very different regression model.

4.2 Formulas
1. Residual = Observed value - Predicted value = y − ŷ

2. b = r(s_y / s_x)

3. a = ȳ − bx̄
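A minimal Python sketch of the slope and intercept formulas, using the summary statistics given in Example Problem 9. The helper names `fit_from_summary` and `predict` are hypothetical. Because the code carries unrounded values, its intercept and prediction can differ in the third decimal place from a hand calculation that rounds b to 1.634 first.

```python
def fit_from_summary(r, x_bar, y_bar, s_x, s_y):
    """Least squares line from summary statistics: b = r*s_y/s_x, a = y_bar - b*x_bar."""
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b

# Summary statistics given in Example Problem 9 (energy used in KWH vs price in dollars).
r = 0.9687
a, b = fit_from_summary(r, x_bar=9.125, y_bar=15.625, s_x=6.22, s_y=10.49)

def predict(x):
    """Predicted value y-hat = a + b*x."""
    return a + b * x

y_hat_18 = predict(18)
residual_18 = 27.50 - y_hat_18  # observed - predicted
r_squared = r ** 2              # fraction of the variability in y explained by the line

print(f"y-hat = {a:.3f} + {b:.3f}x")
print(f"predicted price at 18 KWH: {y_hat_18:.2f}, residual: {residual_18:.2f}")
print(f"R^2 = {r_squared:.4f}")
```

A negative residual, as here, means the observed price was below what the line predicts.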

4.3 Residual Plots
A residual plot is a scatterplot that shows the residuals versus the x-values. The residuals
should appear to be completely random: in particular, their spread should not change from one
part of the plot to another, and they should not follow any pattern.

5 Chapter 5
5.1 Definitions
1. Contingency Table
A table that shows how individuals are distributed along two categorical variables at once,
with a count for each combination of categories.

2. Marginal Distribution
The distribution of one variable alone, found from the row totals or the column totals of a
contingency table.

3. Conditional Distribution
Shows the distribution of one variable for just those cases that satisfy a condition on another
variable. Example: the distribution of eye color among females only.
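A sketch of marginal and conditional distributions in Python, using the eye color by gender table from Example Problem 1. The dictionary layout and the function names are assumptions made for illustration. Marginal distributions divide totals by the grand total; conditional distributions divide by the total of the row or column being conditioned on.

```python
# Eye color by gender, from the table in Example Problem 1.
table = {
    "Male":   {"Blue": 5, "Green": 7, "Brown": 15},
    "Female": {"Blue": 6, "Green": 2, "Brown": 10},
}

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of gender: row totals divided by the grand total.
marginal_gender = {g: sum(row.values()) / grand_total for g, row in table.items()}

def conditional_given_gender(gender):
    """Distribution of eye color among only the cases of one gender (row / row total)."""
    row = table[gender]
    row_total = sum(row.values())
    return {color: count / row_total for color, count in row.items()}

def conditional_given_color(color):
    """Distribution of gender among only the cases with one eye color (column / column total)."""
    col_total = sum(row[color] for row in table.values())
    return {g: row[color] / col_total for g, row in table.items()}

print(marginal_gender)                      # Male 0.6, Female 0.4
print(conditional_given_gender("Female"))   # eye color among females only
print(conditional_given_color("Green"))     # gender among green-eyed people only
```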

6 Extra Information
Review any and all notes and supplementary materials. It may be the case that something
was accidentally omitted from this study guide. Also, review any problems that may have
been discussed in class as not all example problems may have been provided here.
7 Example Problems
1. Use the below table to answer the following questions.

Eye Color (columns) by Gender (rows):

             Blue   Green   Brown   Total
Male            5       7      15      27
Female          6       2      10      18
Total          11       9      25      45

(a) Construct a frequency table for Eye Color based on the data above.
(b) Find the marginal distribution of gender.
(c) What percentage of females have blue eyes?
(d) What percentage of green eyed people are male?
(e) What percentage of people are females and have green eyes?
(f) What percentage of blue eyed people are female?
(g) What percentage of males have brown eyes?
(h) What percentage of people have brown eyes?
(i) What percentage of people are males and have blue eyes?

2. We are investigating whether people taking antidepressants (SSRIs) might be at greater
risk of bone fractures. We are given the below contingency table.

                SSRI   No SSRI   Total
Fractures         14       244     258
No Fractures     123      4627    4750
Total            137      4871    5008

(a) What percent of people taking SSRIs have fractures?
(b) What percent of people not taking SSRIs have fractures?
(c) Is the risk of bone fractures the same among people who were taking SSRIs versus
those who were not?

3. In a histogram that is skewed to the right, which is larger, the mean or the median?

4. In a histogram that is skewed or has outliers, which should be reported, the mean or
the median?

5. In a histogram that is skewed or has outliers, which should be reported, the IQR or
the Standard Deviation?

6. In a histogram that is symmetric with no outliers, which pair of things should be
reported: mean with the standard deviation, mean with the IQR, median with the
standard deviation, or the median with the IQR?
7. Here are costs of nine compact refrigerators rated very good or excellent by Consumer
Reports on their website.
150, 150, 160, 180, 150, 140, 120, 130, 120
Find

(a) Mean
(b) Median and Quartiles
(c) Range and IQR
(d) What are the values of the upper fence and the lower fence?
(e) Are there any outliers in this data? Why?
(f) Find the variance and standard deviation.

8. Shown below are the histogram and summary statistics for the number of camp sites
at public parks in Vermont.

(a) Which statistics would you use to identify the center and spread of this distribu-
tion? Why?
(b) How many parks would you classify as outliers? Explain.
(c) Create a boxplot for this data.
9. Use the given data to answer the following questions.

Energy Used (KWH)   Price ($)
        2                2
        3                5
        5                9
        7               10
        9               20
       12               22
       15               27
       20               30
mean = 9.125        mean = 15.625
std dev = 6.22      std dev = 10.49
r = 0.9687

(a) Draw a scatterplot of the above data to study the association between the energy
used and the price it cost.
(b) What is the direction of the association?
(c) What is the form of the relationship?
(d) What is the strength of the relationship?
(e) Are there any outliers?
(f) What is the correlation coefficient?

Now consider the regression line for the above data.

(a) What is the slope of the regression line?
(b) What is the intercept of the regression line?
(c) What percent of variation is explained by this model?
(d) What does the slope mean in this context?
(e) What does the intercept mean in this context?
(f) Write down the overall model.
(g) What would you predict for the price in the case where there are 18 KWH of
energy used? Is this prediction reasonable?
(h) The energy company actually charges you $27.50 for 18 KWH of energy used. Is
this good? How much would you save or lose compared to what you expected to
pay?
(i) What would you predict for the price in the case where there are 35 KWH of
energy used? Is this prediction reasonable?

10. For the below residual plots, decide if a linear model is appropriate. In the case that the
linear model is not appropriate, decide which condition is violated (linearity, outlier,
or equal spread).
Figure 1: Residual Plots. Panels: (a) Plot 1, (b) Plot 2, (c) Plot 4, (d) Plot 6, (e) Plot 7.


8 Example Solutions
1. Use the table to answer the questions.
(a) Frequency Table:
Eye Color   Count
Blue           11
Green           9
Brown          25
(b) Male: 27/45 = 0.6 = 60%. Female: 18/45 = 0.4 = 40%.
(c) 6/18 = 33.3%.
(d) 7/9 = 77.8%.
(e) 2/45 = 4.4%.
(f) 6/11 = 54.5%.
(g) 15/27 = 55.6%.
(h) 25/45 = 55.6%.
(i) 5/45 = 11.1%.
2. (a) 14/137 = 10.2%
(b) 244/4871 = 5.01%
(c) No, the risk of bone fractures is about twice as high among people who were taking
SSRIs (10.2%) as among those who were not (5.01%).
3. The mean is larger.
4. The median.
5. IQR.
6. Mean with the standard deviation.
7. (a) Mean = ȳ = 1300/9 = 144.4
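(b) First, order the numbers from lowest to highest: 120, 120, 130, 140, 150, 150, 150, 160, 180.
Median = middle (5th) value = 150
Q3 = median of the upper half (150, 150, 160, 180) = 155
Q1 = median of the lower half (120, 120, 130, 140) = 125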
(c) IQR = 155 − 125 = 30
Range = Max - Min = 180 − 120 = 60
(d) Upper Fence = Q3 + 1.5IQR
= 155 + 1.5(30)
= 200,
Lower Fence = Q1 − 1.5IQR
= 125 − 1.5(30)
= 80.
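(e) There are no outliers because Max (180) ≤ Upper Fence (200) and Min (120) ≥ Lower Fence (80).
(f) Variance:
$$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = \frac{(150 - 144.4)^2 + (150 - 144.4)^2 + (160 - 144.4)^2 + (180 - 144.4)^2 + (150 - 144.4)^2 + (140 - 144.4)^2 + (120 - 144.4)^2 + (130 - 144.4)^2 + (120 - 144.4)^2}{9 - 1} = \frac{3022.24}{8} = 377.78$$
Std Dev:
$$s = \sqrt{s^2} = \sqrt{377.78} = 19.44$$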

8. (a) Median and IQR, because the histogram is skewed right.
(b) IQR = 78 − 28 = 50
Upper Fence = Q3 + 1.5IQR = 78 + 1.5(50) = 78 + 75 = 153
Lower Fence = Q1 − 1.5IQR = 28 − 1.5(50) = 28 − 75 = −47
The number of camp sites cannot be negative, so the lower fence is effectively 0. Since
Max > Upper Fence, there are outliers; based on the histogram, there are probably 4 or 5.
(c) Note that the boxplot below is rounded so if you use exact numbers, it may look
slightly different.
9. Use the data to answer the following.

(a) The scatterplot would be as below.
Figure: Price ($) vs Energy Used (KWH), with Energy Used (KWH) on the x-axis and Price ($) on the y-axis.
(b) Direction is positive.
(c) The form of the relationship is linear.
(d) The strength of the relationship is strong.
(e) There are no outliers.
(f) The correlation coefficient is 0.9687.

Now consider the regression line for the above data.

(a) The slope of the regression line is b = r(s_y / s_x) = 0.9687(10.49 / 6.22) = 1.634.
(b) The intercept of the regression line is a = ȳ − b x̄ = 15.625 − 1.634(9.125) = 0.715.
(c) The percent of variation explained by this model is R² = (0.9687)² = 0.9384 = 93.84%.
(d) The slope means that as the energy used increases by 1 KWH, the predicted price increases
by $1.634.
(e) The intercept means that the predicted base price for service, with no energy used, is
$0.715.
(f) The overall model is: ŷ = 0.715 + 1.634x.
(g) You would predict the price to be 0.715 + 1.634(18) = $30.13. This prediction is
reasonable because 18 KWH lies within the range of the data (2 to 20 KWH), so we are not
extrapolating.
(h) The energy company actually charges you $27.50 for 18 KWH of energy used.
This is good because we would expect to pay more than this for the energy used.
You would save $2.63. Note that when you take 27.50 − 30.13 = −2.63 you are
calculating the residual.
(i) You would predict the price to be 0.715 + 1.634(35) = $57.91. This prediction is
not reasonable and cannot be trusted because 35 KWH is far outside the range of the data,
so we are extrapolating.
(j) The predicted standardized value for price is r(SD) = 0.9687(3) = 2.9061.

10. Classify each of the residual plots.

(a) Linear model is not appropriate.
(b) Linear model is appropriate.
(c) Linear model is not appropriate.
(d) Linear model is not appropriate.
(e) Linear model is not appropriate.
Stat 130 Exam 2
Important Formulas and Concepts

1 Chapter 6
1.1 Definitions
1. Population
The entire group of individuals or instances about whom we hope to learn.

2. Sample
A (representative) subset of a population, examined in the hope of learning about the
population.

3. Sample Survey
A study that asks questions of a sample drawn from some population in the hope of
learning something about the entire population.

4. Randomization
The best defense against bias is randomization, in which each individual is given a fair,
random chance of selection.

5. Population Parameter
A numerically valued attribute of a model for a population. Example: mean income
of all employed people in the USA

6. Sample statistic
Statistics or sample statistics are values that are calculated for sample data. Example:
mean income of employed people in a representative sample

7. Sampling Frame
A list of individuals from whom the sample is drawn. Individuals who are in the population
of interest but not in the sampling frame cannot be included in any sample.

8. Simple Random Sample (SRS)
An SRS of sample size n is a sample in which each set of n elements in the population
has an equal chance of selection.

9. Stratified Random Sampling
A sampling design in which the population is divided into several subpopulations
(strata) and random samples are then drawn from each stratum. Try to make strata
as homogeneous as possible.

This version: October 16, 2021, by Dale Embers. This guide may not include everything that
could possibly be tested; use it as an additional reference when studying all of Chapters 6 to 11.

10. Cluster Sampling
Entire groups, or clusters, are chosen at random. Ideally, clusters are heterogeneous, so that
each cluster resembles the population as a whole.

11. Multistage Sampling
Sampling schemes that combine several sampling methods.

12. Systematic Sample
A sample drawn by selecting individuals systematically from a sampling frame, for example
every 10th individual after a random start.

13. Voluntary Response Bias
Bias introduced to a sample when individuals can choose on their own whether to
participate in the sample.

14. Undercoverage Bias
Bias that gives a part of the population less representation in the sample than it has in
the population.

15. Nonresponse Bias
Bias introduced when a large fraction of those sampled fails to respond.

16. Response Bias
Anything in a survey design that influences responses.

17. Studies

(a) Observational Study
Study based on data in which no manipulation of factors has been employed.
(b) Retrospective Study
Observational study in which subjects are selected and then their previous conditions or
behaviors are determined. Based on historical data and memories.
(c) Prospective Study
Observational study in which subjects are followed to observe future outcomes.
Because no treatments are deliberately applied, it is not an experiment.
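The sampling designs above can be sketched with Python's `random` module. The sampling frame (600 patients spread over 6 hospitals) is entirely hypothetical, as are the function names; the fixed seed only makes the sketch reproducible. Hospitals serve as strata in the stratified sample and as clusters in the cluster sample, loosely mirroring the healthcare example in the Exam 2 example problems, but this is only an illustration.

```python
import random

rng = random.Random(130)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 600 patients, each tagged with one of 6 hospitals.
population = [(f"patient{i}", f"hospital{i % 6}") for i in range(600)]

def hospital(unit):
    """The hospital a patient belongs to (used as the stratum or cluster label)."""
    return unit[1]

def simple_random_sample(frame, n):
    """SRS: every set of n individuals is equally likely to be chosen."""
    return rng.sample(frame, n)

def stratified_sample(frame, n_per_stratum, stratum_of):
    """Divide the frame into strata, then take an SRS within each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return [u for members in strata.values() for u in rng.sample(members, n_per_stratum)]

def cluster_sample(frame, n_clusters, cluster_of):
    """Choose whole clusters at random and keep every individual in them."""
    labels = sorted({cluster_of(u) for u in frame})
    chosen = set(rng.sample(labels, n_clusters))
    return [u for u in frame if cluster_of(u) in chosen]

def systematic_sample(frame, k):
    """Random start among the first k individuals, then every k-th individual."""
    start = rng.randrange(k)
    return frame[start::k]

srs = simple_random_sample(population, 30)
strat = stratified_sample(population, 5, hospital)  # 5 patients from each of 6 hospitals
clus = cluster_sample(population, 2, hospital)      # every patient in 2 random hospitals
syst = systematic_sample(population, 20)            # every 20th patient
print(len(srs), len(strat), len(clus), len(syst))   # 30 30 200 30
```

Combining these steps, for example choosing hospitals at random and then taking an SRS inside each chosen hospital, would be a multistage sample.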

2 Chapter 7
1. Experiments

(a) Factor
Variable whose levels are manipulated by the experimenter.
(b) Response Variable
Variable whose values are compared across different treatments.
(c) Experiment
Manipulates factor levels to create treatments, randomly assigns subjects to these
treatment levels, and then compares the responses of the subject groups across
treatment levels. Tries to assess effects of treatments.
(d) Levels
Specific values that the experimenter chooses for a factor.
(e) Treatment
Process, intervention, or other controlled circumstance applied to randomly assigned
experimental units.
(f) Block
When groups of experimental units are similar in a way that is not a factor
under study, it is often a good idea to gather them together into blocks and then
randomize the assignment of treatments within each block.

2. Randomization through Random Assignment
An experiment must assign experimental units (individuals) to treatment groups using
some form of randomization.

3. Principles of Experimental Design

(a) Control
Control aspects of the experiment that we know may have an effect on the response,
but that are not the factors being studied.
(b) Randomize
Randomize subjects to treatments to even out effects that we cannot control.
(c) Replicate
Replicate over as many subjects as possible.
(d) Block
Reduce the effects of identifiable attributes of the subjects that cannot be controlled.

4. Statistically Significant
When an observed difference is too large for us to believe that it is likely to have
occurred naturally, we consider the difference to be statistically significant.

5. Types of Experiments

(a) Completely Randomized Design (CRD)
All experimental units have an equal chance of receiving any treatment.
(b) Randomized Block Design (RBD)
Participants are randomly assigned to treatments within each block.
(c) Matched Pairs Design
Participants are paired with similar subjects (often the same subject). One member of
each pair is given the treatment, and the differences in the response variable within
pairs are compared.

6. Control Treatment
Baseline treatment.
7. Control Group
Experimental units assigned to a baseline treatment level, typically either the default
treatment or a placebo treatment. Their responses provide a basis for comparison.
8. Blinding
Keeping any individual associated with an experiment unaware of how subjects have
been allocated to treatment groups.
9. Single/Double Blind
Two classes of individuals can affect the outcome of an experiment:
• Those who could influence the results.
• Those who evaluate the results.

Single Blind: when either one of these two classes is blinded. Double Blind: when both
classes are blinded.
10. Placebo
A treatment known to have no effect.
11. Placebo Effect
The tendency of human subjects to show a response even when administered a placebo.
12. Potential Problems
(a) Confounding
When the levels of one factor are associated with the levels of another factor in
such a way that their effects cannot be separated, we say that these two factors
are confounded.
(b) Lurking Variable
A variable associated with both y and x that makes it appear that x may be
causing y.
13. In summary, the best experiments are usually 1) Randomized, 2) Comparative, 3)
Double-blind, and 4) Placebo-controlled.
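Random assignment for the designs above can be sketched in Python. The 12 subjects and 3 blocks are hypothetical, and the function names are illustrative. `completely_randomized` shuffles all units together (a CRD); `randomized_block` repeats that shuffle separately inside each block (an RBD), so every block receives every treatment equally often.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is reproducible

def completely_randomized(units, treatments):
    """CRD: shuffle all units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)] for i, unit in enumerate(shuffled)}

def randomized_block(blocks, treatments):
    """RBD: run a separate random assignment inside each block."""
    assignment = {}
    for members in blocks.values():
        assignment.update(completely_randomized(members, treatments))
    return assignment

# Hypothetical subjects grouped into 3 blocks of similar units.
blocks = {
    "block1": ["s1", "s2", "s3", "s4"],
    "block2": ["s5", "s6", "s7", "s8"],
    "block3": ["s9", "s10", "s11", "s12"],
}
subjects = [s for members in blocks.values() for s in members]

crd = completely_randomized(subjects, ["treatment", "control"])
rbd = randomized_block(blocks, ["treatment", "control"])
print(crd)
print(rbd)
```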

3 Chapters 9 and 10
3.1 Definitions
1. Random Phenomenon
A phenomenon is random if we know what outcomes could happen, but not which
particular values will happen.
2. Trial
A single attempt or realization of a random phenomenon.
3. Outcome
The value measured, observed, or reported for an individual instance of a trial.
4. Event
A collection of outcomes. Usually, we identify events so that we can attach probabilities
to them. Denote events with bold capital letters like A, B, etc.

5. Sample Space
The collection of all possible outcome values. The collection of values in the sample
space has a probability of 1. Denote by S or Ω.

6. Law of Large Numbers (LLN)
This law states that the long-run relative frequency of an event's occurrence gets closer
and closer to the true relative frequency as the number of trials increases.

7. Independence (informal definition)
Two events are independent if learning that one event occurs does not change the
probability that the other event occurs.

8. Probability
A number between 0 and 1 that reports the likelihood of an event's occurrence. Write
P(A) for the probability of event A.

9. Empirical Probability
When the probability comes from the long-run relative frequency of the event's occurrence.

10. Theoretical Probability
When the probability comes from a model (such as equally likely outcomes). With equally
likely outcomes, P(A) = (number of outcomes in A) / (total number of possible outcomes).

11. Personal (or subjective) Probability
When the probability is subjective and represents your personal degree of belief.

12. Legitimate Assignment of Probabilities
An assignment of probabilities to outcomes is legitimate if
• each probability is greater than or equal to 0 and less than or equal to 1
• the sum of the probabilities = 1
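The Law of Large Numbers can be seen in a quick Python simulation. This is a sketch that assumes independent trials with a fixed probability p (0.5 here, like a fair coin); `relative_frequency` is a hypothetical helper. Small runs can land well away from p; only the long-run relative frequency settles down.

```python
import random

def relative_frequency(n_trials, p=0.5, seed=1):
    """Simulate n_trials independent trials in which the event has probability p,
    and return the fraction of trials in which the event occurred."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_trials) if rng.random() < p)
    return hits / n_trials

# The relative frequency tends to settle near p = 0.5 as the number of trials grows.
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency(n))
```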

3.2 Rules on Probability
1. For all events A, 0 ≤ P(A) ≤ 1.

2. Probability Assignment Rule
• P(S) = 1
• The set of all possible outcomes of a trial must have probability 1.

3. Complement Rule
• The set of outcomes that are not in the event A is the complement of A, written A^C.
• P(A^C) = 1 − P(A)
• The probability of an event not occurring is 1 minus the probability that it occurs.

4. Addition Rule
• For two mutually exclusive events A and B, the probability that one or the other
occurs is the sum of the probabilities of the two events.
• P(A or B) = P(A) + P(B), where A and B are mutually exclusive.
• Disjoint also means mutually exclusive: there are no outcomes in common.

5. Multiplication Rule
• For two independent events A and B, the probability that both A and B occur
is the product of the probabilities of the two events.
• P(A and B) = P(A)P(B), where A and B are independent.

6. General Addition Rule
For any two events A and B, the probability of A or B is
P(A or B) = P(A) + P(B) − P(A and B).
This rule does NOT require disjoint events.

7. Conditional Probability
The conditional probability of the event B given that the event A has occurred is
$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}.$$

8. General Multiplication Rule
For any two events A and B, the probability of A and B is
P(A and B) = P(A)P(B | A).
This rule does NOT require independence.

9. Independent
Events A and B are independent when P(B | A) = P(B). Note: independent is not
the same as disjoint.

10. Tree Diagram
A display of conditional events or probabilities that is helpful in thinking through
conditioning.

11. Bayes Rule
$$P(B \mid A) = \frac{P(A \mid B)P(B)}{P(A \mid B)P(B) + P(A \mid B^C)P(B^C)}.$$
Since P(A | B)P(B) + P(A | B^C)P(B^C) = P(A), this may be simplified to read
P(B | A)P(A) = P(A | B)P(B).
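To check the rules above on a concrete case, here is a Python sketch using the equally likely outcomes of one roll of a fair six-sided die, with exact fractions from the standard `fractions` module. The events A (even roll) and B (roll greater than 3) and the helpers `P` and `P_given` are illustrative choices, not from the course. In Python, `|` and `&` on sets mean union ("or") and intersection ("and"), not conditioning.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (equally likely outcomes).
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3

def P(event):
    """Theoretical probability: # outcomes in the event / # outcomes in S."""
    return Fraction(len(event & S), len(S))

def P_given(event, given):
    """Conditional probability P(event | given) = P(event and given) / P(given)."""
    return P(event & given) / P(given)

print(P(S))                               # Probability Assignment Rule
print(P(S - A), 1 - P(A))                 # Complement Rule
print(P(A | B), P(A) + P(B) - P(A & B))   # General Addition Rule (| is set union here)
print(P(A & B), P(A) * P_given(B, A))     # General Multiplication Rule
print(P_given(B, A) == P(B))              # independence check
```

Here P(B | A) = 2/3 while P(B) = 1/2, so A and B are not independent even though they are not disjoint either.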
3.3 Tree Diagram Example and Interpretations of Every Node
Example probabilities are given:

A (0.6)
    B (0.8)          P(A and B) = (0.6)(0.8) = 0.48
    Not B (0.2)      P(A and Not B) = (0.6)(0.2) = 0.12
Not A (0.4)
    B (0.2)          P(Not A and B) = (0.4)(0.2) = 0.08
    Not B (0.8)      P(Not A and Not B) = (0.4)(0.8) = 0.32

Here are the mathematical interpretations of the numbers in the tree diagram:
P(A) = 0.6            P(A and B) = 0.48            P(B | Not A) = 0.2
P(Not A) = 0.4        P(A and Not B) = 0.12        P(Not B | Not A) = 0.8
P(B | A) = 0.8        P(Not A and B) = 0.08
P(Not B | A) = 0.2    P(Not A and Not B) = 0.32

Calculate things like P(A | B) using Bayes Rule:
$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^C)P(A^C)} = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid \text{Not } A)P(\text{Not } A)} = \frac{(0.8)(0.6)}{(0.8)(0.6) + (0.2)(0.4)} = \frac{0.48}{0.56} = 0.8571.$$
Calculate things like P(B) by rearranging the General Multiplication Rule:
P(B and A) = P(B)P(A | B) ⇒ P(B) = P(B and A) / P(A | B).
Now,
P(B) = 0.48 / 0.8571 = 0.56.
This agrees with adding the two branches of the tree that end in B: 0.48 + 0.08 = 0.56.
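The tree diagram calculations can be reproduced in a few lines of Python. This sketch takes the three branch probabilities from the example as inputs; the variable names are hypothetical.

```python
# Branch probabilities from the tree diagram example.
p_A = 0.6
p_B_given_A = 0.8
p_B_given_notA = 0.2

p_notA = 1 - p_A  # complement rule

# Joint probabilities at the end of the branches ending in B (general multiplication rule).
p_A_and_B = p_A * p_B_given_A            # about 0.48
p_notA_and_B = p_notA * p_B_given_notA   # about 0.08

# Total probability of B: add the branches that end in B.
p_B = p_A_and_B + p_notA_and_B           # about 0.56

# Bayes Rule: reverse the conditioning.
p_A_given_B = p_A_and_B / p_B            # about 0.8571

print(round(p_B, 4), round(p_A_given_B, 4))
```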

4 Chapter 11
4.1 Definitions
1. z-score
Tells how many standard deviations a value is from the mean. Regardless of direction,
the farther a data value is from the mean, the more unusual it is.
2. Standard Normal Model
The Normal model N(µ, σ) with mean µ = 0 and standard deviation σ = 1, written N(0, 1).

3. 68-95-99.7 Rule
In a normal model, about 68% of values fall within 1 standard deviation of the mean,
about 95% fall within 2 standard deviations of the mean, and about 99.7% of values fall
within 3 standard deviations of the mean.

4. Normal Percentile
The normal percentile corresponding to a z-score gives the percentage of values in a
standard normal distribution found at that z-score or below. This equals the area under
the curve to the left of z. See the normal table in the textbook.

4.2 Formulas
1. z-score:
$$z = \frac{x - \mu}{\sigma}$$

4.3 Properties about the area under a Normal Curve
1. The total area is 100%

2. The mean is the center of a normal curve

3. A Standard Normal Curve has mean = 0 and standard deviation = 1

4. When a normal curve is split in half from the mean, each side contains 50% of the area

5. The normal curve is symmetric

6. If a normal curve is split with 30% of the area on one side, the other side of the curve
is 70% of the area

7. If a normal curve has 60% of the area in the middle, the remaining portions are a total
of 40%. This 40% is allocated half to each side. So the far left has 20% of the area,
the middle is 60% of the area, and the far right side has 20% of the area.

Textbook Normal Table Note: These tables give the percentage to the left of the z value.
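Normal percentiles can also be computed with `statistics.NormalDist` from Python's standard library (Python 3.8 or later) instead of the printed table; its `cdf` method returns the area to the left of a value, matching the textbook table. The `z_score` helper and the N(100, 15) example value are hypothetical illustrations.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean."""
    return (x - mu) / sigma

std_normal = NormalDist(mu=0, sigma=1)

# Normal percentile: area to the left of z. Half the area lies below the mean.
print(std_normal.cdf(0))

# 68-95-99.7 Rule: area within k standard deviations of the mean.
for k in (1, 2, 3):
    print(k, round(std_normal.cdf(k) - std_normal.cdf(-k), 4))

# Property 7: the middle 60% leaves 20% in each tail; find the cut-off z-values.
print(round(std_normal.inv_cdf(0.20), 4), round(std_normal.inv_cdf(0.80), 4))

# Hypothetical example: x = 130 from N(100, 15) lies 2 SDs above the mean.
z = z_score(130, 100, 15)
print(z, round(std_normal.cdf(z), 4))
```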
5 Example Problems
1. A healthcare system wants to determine if its patients are being treated with a sufficient
level of care, so they consider a number of sampling methods. Identify each of the
following sample types as Simple Random Sample, Stratified Sample, Cluster Sample,
Multistage Sample, or Convenience Sample.
(a) They randomly select 15 patients from each of the hospitals in the system and
survey them.
(b) They randomly select 10 physicians and survey every patient belonging to that
physician.
(c) They make a list of all the patients in the system and randomly select 200.
2. I decide that I want to know students’ opinions on a variety of issues. Decide which of
the following best describes the issue as Voluntary Response Bias, Nonresponse Bias,
Response Bias, or Undercoverage (choose one for each situation).
(a) I survey students (not anonymously) about whether or not they use illegal drugs.
(b) I ask students to fill out an online survey regarding the use of Blackboard in the
classroom.
(c) I randomly select 300 students from a list of those receiving Pell grants and survey
them regarding financial aid.
3. To test the effect of a medication, 100 volunteers were randomly divided into two
groups. Each person was given a month’s supply of pills. For one group, the pill
contained the medicine, whereas for the other group, the pills contained only inert
ingredients. Participants were not told which type of pill they had. At the end of the
month, a researcher evaluated them to determine if they had improved. The researcher
did not know which of the subjects had the pill with the medicine added. Identify which
of these statements is true.
(a) The group receiving the pill with inert ingredients will not experience the placebo
effect.
(b) This experiment includes blocking.
(c) The number of factors in the experiment is two.
(d) This study is single blind.
(e) This study is double blind.
(f) The group receiving the medicine is the control group.
4. A veterinarian is studying the effect a diet high in alfalfa may have on horses.
The veterinarian decides to use the following design: identify 30 horses, and divide
them into 3 groups of 10 horses. One group consists of horses in barn stalls, one group
consists of horses in outdoor paddocks, and one group consists of horses in pastures.
Within each group, one horse is randomly assigned to a high alfalfa diet and one is
fed a low alfalfa diet. The study is:
(a) a randomized block design
(b) a matched pairs design
(c) a completely randomized design

5. A study attempts to compare two sunscreens. Each of 50 subjects with varying skin
complexions will use both sunscreens: Screen A on one side of the body and Screen B
on the other side. For each subject, a coin is tossed to determine which side receives
Screen A and which receives Screen B. Researchers measure the amount of ultraviolet
light exposure over both treated areas for each subject. This is an example of:

(a) a randomized block design
(b) a matched pairs design
(c) a completely randomized design

6. For his Statistics class experiment, researcher J. Gilbert decided to study how parents'
income affects children's performance on standardized tests like the SAT. He proposed
to collect information from a random sample of test takers and examine the relationship
between parental income and SAT score.

(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many
factors are there?
(c) Identify the explanatory variable and response variable.

7. In 2002, the journal Science reported that a study of women in Finland indicated that
having sons shortened the life spans of mothers by about 34 weeks per son, but that
daughters helped to lengthen the mothers' lives. The data came from church records
from the period 1640 to 1870.

(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many
factors are there?
(c) Identify the explanatory variable and response variable.

8. Some people claim they can get relief from migraine headache pain by drinking a large
glass of ice water. Researchers plan to enlist several people who suffer from migraines
in a test. When a participant experiences a migraine headache, he or she will take a
pill that may be a standard pain reliever or a placebo. Half of each group will also
drink ice water. Participants will then report the level of pain relief they experience.

(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many
factors are there?
(c) Identify the explanatory variable and response variable.

9. Athletes who had suffered hamstring injuries were randomly assigned to one of two
exercise programs. Those who engaged in static stretching returned to sports activity
in a mean of 15.2 days faster than those assigned to a program of agility and trunk
stabilization exercises.

(a) Is this an experiment or an observational study?
(b) If it is a study, is it retrospective or prospective? If it is an experiment, how many
factors are there?
(c) Identify the explanatory variable and response variable.

10. In a large Introductory Statistics lecture hall, the professor reports that 55% of the
students enrolled have never taken a Calculus course, 32% have taken only one semester
of Calculus, and the rest have taken two or more semesters of Calculus. The professor
randomly assigns students to groups of three to work on a course project. What
is the probability that the first group-mate you meet has studied

(a) two or more semesters of Calculus?
(b) some Calculus?
(c) no more than one semester of Calculus?

11. Continuation. What is the probability that of your other two group-mates,

(a) neither has studied Calculus?
(b) both have studied at least one semester of Calculus?
(c) at least one has had more than one semester of Calculus?

12. A certain bowler can bowl a strike 70% of the time. If the frames are independent,
what's the probability that she

(a) goes three consecutive frames without a strike?
(b) makes her first strike in the third frame?
(c) has at least one strike in the first three frames?
(d) bowls a perfect game (12 consecutive strikes)?

13. A check of dorms revealed that 38% had refrigerators, 52% had TVs, and 21% had
both a TV and a refrigerator. What’s the probability that a randomly selected dorm
room has:

(a) a TV but no refrigerator


(b) a TV or refrigerator but not both
(c) neither a TV nor a refrigerator
14. We are given information about the Education Level by Country in the table below:

Country   Post Grad  College  Some HS  Primary  No Answer  Total
China             7      315      671      506          3   1502
France           69      388      766      309          7   1539
India           161      514      622      227         11   1535
UK               58      207     1240       32         20   1557
US               84      486      896       87          4   1557
Total           379     1910     4195     1161         45   7690
Calculate the following probabilities:

(a) P(US)
(b) Probability that a person completed education before college? Do not include
those who did not answer.
(c) Probability that a person is from France or did post graduate study.
(d) Probability that a person is from France and finished primary school.

15. An animal shelter states that it currently has 24 dogs and 18 cats available for adoption.
Eight of the dogs and six of the cats are male. Find the conditional probability of:

(a) pet is male, given that it is a cat


(b) pet is a cat, given that it is female
(c) pet is female, given that it is a dog

16. Follow-up. The local animal shelter reported that it currently has 24 dogs and 18
cats available for adoption; 8 of the dogs and 6 of the cats are male. Are being male
and being a dog independent events? Briefly justify your answer.

17. Police set up checkpoints to catch drunk drivers. Based on the initial stop, trained
officers can make the right decision 80% of the time. Suppose a checkpoint is set up at
a time when it is estimated that about 12% of people have been drinking. Questions
to answer:

(a) Suppose a person is stopped and is not drinking. What is the probability that he
is detained for further testing?
(b) What’s the probability that any given driver will be detained?
(c) What’s the probability that a driver who is detained has actually been drinking?
(d) What’s the probability that a driver who was released had actually been drinking?

18. A company’s records indicate that on any given day about 1% of their day-shift employ-
ees and 2% of the night-shift employees will miss work. Sixty percent of the employees
work the day shift. What percent of employees are absent on any given day?
19. We are given the following distribution for X.

X 3 5 6 8 10
P(X = x) 0.2 0.1 0.3 0.3

(a) What is the value of the missing probability in the table above?

20. Given mean 16 and standard deviation 3.

(a) Standardize x = 9
(b) Standardize x = 21
(c) Which of the two values above is more unusual?

21. Use a normal model with a mean of 50 and standard deviation of 5.

(a) Using the model described above, draw the model showing what the 68-95-99.7
Rule predicts.
(b) In what interval would you expect the central 99.7% of values to be found?
(c) What percent of values are above 50?
(d) What percent of values are between 40 and 60?
(e) What percent of values are between 40 and 50?
(f) What percent of values are between 50 and 60?
(g) What percent of values are between 45 and 50?
(h) What percent of values are between 50 and 65?
(i) What percent of values are above 60?
(j) What percent of values are below 45?
(k) What percent of values are between 40 and 65?
(l) What percent of values are between 45 and 60?

22. Based on the Normal Model with a mean of 50 and standard deviation of 5, answer
the following questions.

(a) What percent of values are above 50?


(b) What percent of values are above 62?
(c) What percent of values are below 39?
(d) What percent of values are above 43?
(e) What percent of values are below 58?
(f) What percent of values are between 37 and 52?
(g) What percent of values are between 57.25 and 66?
23. Based on the Normal Model with a mean of 50 and standard deviation of 5, answer
the following questions.

(a) What cutoff value bounds the highest 5% of values?


(b) What cutoff value bounds the lowest 25% of values?
(c) What cutoff value bounds the middle 70% of values?
6 Example Solutions
1. (a) Stratified Sample
(b) Cluster Sample
(c) Simple Random Sample

2. (a) Response bias if the students answer but lie. Nonresponse bias if they do not
respond at all.
(b) Voluntary Response Bias
(c) Undercoverage. This method leaves out a lot of students.

3. (a) The group receiving the pill with inert ingredients will not experience the placebo
effect. This is false. They are given the placebo so that any placebo effect also occurs
in this (control) group, which can then be compared with the group receiving the medicine.
(b) This experiment includes blocking. This is false. The individuals were not
grouped first by some property or condition.
(c) The number of factors in the experiment is two. This is false. There is one factor,
the medicine, with two levels.
(d) This study is single blind. This is false. Both sets of participants were blinded.
(e) This study is double blind. This is true. Both sets of participants were blinded.
(f) The group receiving the medicine is the control group. This is false, the group
receiving the placebo is the control group.

4. This is a block design, as the horses were separated into groups before the treatments
were applied.

5. This is a matched pairs design. The pairs consist of the two sides of the subjects’
bodies.

6. (a) An observational study because no treatments were imposed.


(b) It is a retrospective study.
(c) Explanatory variable: Parental income. Response variable: SAT score.

7. (a) Observational study.


(b) Retrospective. Records were obtained from 1640 to 1870.
(c) Explanatory Variable: Having a son or a daughter. Response variable: Average
life span of mothers.

8. (a) Experiment
(b) There are 2 factors - pain reliever and water temp. The pain reliever has 2 levels -
pain reliever or placebo. The water temperature has 2 levels - ice water or regular
water. In total, there are 2 × 2 = 4 treatments.
(c) Explanatory variables: pain reliever and water temp. Response variable: level of
pain relief.
9. (a) Experiment
(b) There is 1 factor - type of exercise. This factor has 2 levels - static stretching and
trunk stabilization exercises. In total, there are 2 treatments.
(c) Explanatory variable: type of exercise. Response variable: time before the ath-
letes were able to return to sports.
10. We are given that
P(no calculus) = 0.55,
P(1 semester) = 0.32.

(a) P(2 or more) = 1 - P(no calculus) - P(1 semester) = 1-0.55-0.32 = 0.13.


(b) P(some calculus) = P(1 semester or 2 or more) = P(1 semester) + P(2 or more)
= 0.32+0.13 = 0.45.
(c) P(no more than one semester) = P(no calculus or 1 semester) = P(no calculus)
+ P(1 semester) = 0.55+0.32 = 0.87.

11. We have that


P (no calculus) = 0.55,
P (at least 1 semester) = P (some calculus) = 0.45.

(a) P (neither) = P (person 1 no calculus and person 2 no calculus)


= P (no calculus) P (no calculus)
= (0.55)(0.55)
= 0.3025.
(b) P (both) = P (person 1 some calculus and person 2 some calculus)
= P (some calculus) P (some calculus)
= (0.45)(0.45)
= 0.2025.
(c) Here "more than one semester" means two or more semesters, so from Problem 10,
P(more than one) = 0.13 and P(no more than one) = 0.87.
Option 1:
P(at least one has had more than one semester)
= P(person 1 more than one and person 2 no more than one OR person 1 no more than one
and person 2 more than one OR person 1 more than one and person 2 more than one)
= P(more than one)P(no more than one) + P(no more than one)P(more than one)
+ P(more than one)P(more than one)
= (0.13)(0.87) + (0.87)(0.13) + (0.13)(0.13)
= 0.2431.
Option 2:
P(at least one has had more than one semester)
= 1 - P(neither has had more than one semester)
= 1 - (0.87)(0.87)
= 1 - 0.7569
= 0.2431.
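The arithmetic in Solutions 10 and 11 can be checked with a short Python sketch. It assumes, as the solutions do, that each group-mate's Calculus background is an independent draw with the lecture-hall percentages (reasonable for a large class); the variable names are illustrative only.

```python
# Percentages given in Problem 10
p_none = 0.55                      # never taken Calculus
p_one = 0.32                       # exactly one semester
p_two_plus = 1 - p_none - p_one    # complement rule: two or more semesters

p_some = p_one + p_two_plus        # 10(b): at least one semester
p_at_most_one = p_none + p_one     # 10(c): no more than one semester

# Problem 11: two other group-mates, treated as independent
p_neither_calc = p_none * p_none   # 11(a)
p_both_some = p_some * p_some      # 11(b)

# 11(c) two ways: list the favorable cases, or use the complement
option_1 = (p_two_plus * p_at_most_one
            + p_at_most_one * p_two_plus
            + p_two_plus * p_two_plus)
option_2 = 1 - p_at_most_one ** 2

for label, value in [("10(a)", p_two_plus), ("10(b)", p_some), ("10(c)", p_at_most_one),
                     ("11(a)", p_neither_calc), ("11(b)", p_both_some), ("11(c)", option_2)]:
    print(label, round(value, 4))
```

Both routes for 11(c) give the same number; the complement (Option 2) is usually the faster one on an exam.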
12. Information given in the problem:
P (strike) = 0.7
P (no strike) = 0.3

(a) goes three consecutive frames without a strike?


P(no strike and no strike and no strike) = P(no strike)P(no strike)P(no strike)
= (0.3)(0.3)(0.3)
= (0.3)^3
= 0.027
(b) makes her first strike in the third frame?
P(no strike and no strike and strike) = P(no strike)P(no strike)P(strike)
= (0.3)(0.3)(0.7)
= (0.3)^2(0.7)
= 0.063
(c) has at least one strike in the first three frames?
P(at least 1 strike in first 3 frames) = 1 - P(no strikes in first 3 frames)
= 1 - 0.027
= 0.973
(d) bowls a perfect game (12 consecutive strikes)?
P(12 consecutive strikes) = P(strike)P(strike) · · · P(strike)
= (0.7)(0.7) · · · (0.7)
= (0.7)^12
= 0.0138
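A minimal Python sketch of the same multiplication and complement rules, plus a quick simulation as a sanity check on (c). The simulation, its seed, and the number of repetitions are illustrative choices, not part of the original solution.

```python
import random

p_strike = 0.7
p_miss = 1 - p_strike

p_no_strike_3 = p_miss ** 3                    # (a) miss, miss, miss
p_first_strike_3rd = p_miss ** 2 * p_strike    # (b) miss, miss, strike
p_at_least_one = 1 - p_no_strike_3             # (c) complement of (a)
p_perfect = p_strike ** 12                     # (d) 12 strikes in a row

# Simulate many sets of three independent frames to check (c)
random.seed(130)
trials = 100_000
hits = sum(
    any(random.random() < p_strike for _ in range(3))
    for _ in range(trials)
)
simulated_c = hits / trials

print(round(p_no_strike_3, 3), round(p_first_strike_3rd, 3),
      round(p_at_least_one, 3), round(p_perfect, 4), simulated_c)
```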

13. What we know:

• P(TV) = 0.52
• P(Refrigerator) = 0.38
• P(both) = P(TV and Refrigerator) = 0.21

A Venn Diagram (not shown) may help with this problem.


What else we can calculate (may or may not relate to the above questions asked):

• P(TV only) = P(TV) - P(both) = 0.52 - 0.21 = 0.31
• P(Refrigerator only) = P(Refrigerator) - P(both) = 0.38 - 0.21 = 0.17
• P(TV or Refrigerator) = P(TV) + P(Refrigerator) - P(TV and Refrigerator) =
0.52 + 0.38 - 0.21 = 0.69

Answers to questions:

(a) P(TV but no refrigerator) = P(TV only) = 0.31


(b) P(TV or Refrigerator but not both) = P(TV or Refrigerator) - P(both) = 0.69 -
0.21 = 0.48
OR
P(TV or Refrigerator but not both) = P(TV only) + P(Refrigerator only) = 0.31
+ 0.17 = 0.48
(c) P(neither a TV nor a Refrigerator) = 1 - P((neither a TV nor a Refrigerator)^C)
= 1 - P(TV or Refrigerator) = 1 - 0.69 = 0.31
OR
P(neither a TV nor a Refrigerator) = 1 - P(TV only) - P(Refrigerator only) -
P(both) = 1 - 0.31 - 0.17 - 0.21 = 0.31
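The Venn-diagram bookkeeping for Solution 13 reduces to the addition rule and the complement rule. A minimal Python sketch, with illustrative variable names:

```python
p_tv = 0.52
p_fridge = 0.38
p_both = 0.21

p_tv_only = p_tv - p_both                  # TV but no refrigerator
p_fridge_only = p_fridge - p_both          # refrigerator but no TV
p_either = p_tv + p_fridge - p_both        # addition rule: TV or refrigerator
p_exactly_one = p_tv_only + p_fridge_only  # one or the other, not both
p_neither = 1 - p_either                   # complement rule

# The four regions of the Venn diagram must add to 1
regions = [p_tv_only, p_fridge_only, p_both, p_neither]
print([round(r, 2) for r in regions], round(sum(regions), 10))
```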
14. (a) P(US) = 1557/7690 = 0.2025
(b) Probability that a person completed education before college? Do not include
those who did not answer.
P(Some HS) + P(Primary) = 4195/7690 + 1161/7690 = 5356/7690 = 0.6965.
(c) Probability that a person is from France or did post graduate study.
P(France or Post Grad) = P(France) + P(Post Grad) - P(both)
= 1539/7690 + 379/7690 - 69/7690 = 1849/7690 = 0.2404.
(d) Probability that a person is from France and finished primary school.
P(France and Primary) = 309/7690 = 0.0402.
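The probabilities in Solution 14 are cell, row, and column totals divided by the grand total. A minimal Python sketch that stores the Education Level by Country table as a nested dictionary (a representation chosen here for illustration):

```python
counts = {
    "China":  {"Post Grad": 7,   "College": 315, "Some HS": 671,  "Primary": 506, "No Answer": 3},
    "France": {"Post Grad": 69,  "College": 388, "Some HS": 766,  "Primary": 309, "No Answer": 7},
    "India":  {"Post Grad": 161, "College": 514, "Some HS": 622,  "Primary": 227, "No Answer": 11},
    "UK":     {"Post Grad": 58,  "College": 207, "Some HS": 1240, "Primary": 32,  "No Answer": 20},
    "US":     {"Post Grad": 84,  "College": 486, "Some HS": 896,  "Primary": 87,  "No Answer": 4},
}

def row_total(country):
    return sum(counts[country].values())

def col_total(level):
    return sum(row[level] for row in counts.values())

grand = sum(row_total(c) for c in counts)

p_us = row_total("US") / grand
p_before_college = (col_total("Some HS") + col_total("Primary")) / grand
p_france_or_pg = (row_total("France") + col_total("Post Grad")
                  - counts["France"]["Post Grad"]) / grand   # subtract the overlap once
p_france_and_primary = counts["France"]["Primary"] / grand

print(grand, round(p_us, 4), round(p_before_college, 4),
      round(p_france_or_pg, 4), round(p_france_and_primary, 4))
```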
15. A chart may help solve this problem. The chart below shows the initial information
given to us:

Sex      Cat   Dog   Total
Male       6     8
Female
Total     18    24

We can then fill in the missing numbers:

Sex      Cat   Dog   Total
Male       6     8      14
Female    12    16      28
Total     18    24      42

Then we can answer the questions that we're interested in.

(a) P(Male | Cat) = P(Male and Cat)/P(Cat) = (6/42)/(18/42) = 1/3 = 0.3333
(b) P(Cat | Female) = P(Cat and Female)/P(Female) = (12/42)/(28/42) = 0.4286
(c) P(Female | Dog) = P(Female and Dog)/P(Dog) = (16/42)/(24/42) = 0.6667

16. There are 2 definitions of independence you could use:
• P(A)P(B) = P(A and B)
• P(A | B) = P(A)
Using each definition:
Def 1:
P(Dog)P(M) = (24/42)(14/42) = 336/1764 = 0.1905
P(Dog and M) = 8/42 = 0.1905
Def 2:
P(Dog | M) = 8/14 = 0.5714
P(Dog) = 24/42 = 0.5714
Since the two sides are equal under either definition, yes, being male and being a dog are
independent events.
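Solutions 15 and 16 can be reproduced with exact fractions, which makes the independence check an exact equality rather than a comparison of rounded decimals. A minimal Python sketch; the helper `p` is hypothetical, written just for this illustration:

```python
from fractions import Fraction

# Completed two-way table from Solution 15: (sex, animal) -> count
table = {("Male", "Cat"): 6, ("Male", "Dog"): 8,
         ("Female", "Cat"): 12, ("Female", "Dog"): 16}
total = sum(table.values())

def p(sex=None, animal=None):
    """Probability that a randomly chosen pet matches the given sex and/or animal."""
    count = sum(n for (s, a), n in table.items()
                if (sex is None or s == sex) and (animal is None or a == animal))
    return Fraction(count, total)

# Solution 15: P(A | B) = P(A and B) / P(B)
male_given_cat = p("Male", "Cat") / p(animal="Cat")
cat_given_female = p("Female", "Cat") / p(sex="Female")
female_given_dog = p("Female", "Dog") / p(animal="Dog")

# Solution 16: both definitions of independence
def1 = p(animal="Dog") * p(sex="Male") == p("Male", "Dog")
def2 = p("Male", "Dog") / p(sex="Male") == p(animal="Dog")

print(male_given_cat, cat_given_female, female_given_dog, def1, def2)  # 1/3 3/7 2/3 True True
```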
17. Before these questions are answered, set up a tree diagram. Note that the probability
of being detained depends on whether the driver has been drinking, since a “correct”
decision means detaining a drinker or releasing a non-drinker. Because of this, detained
and not detained will go on the second branch of the tree.
(Tree diagram not shown. First branch: Drink 0.12, Not Drink 0.88. Second branch: Detain
or Not Detain, with P(Detain | Drink) = 0.8 and P(Detain | Not Drink) = 0.2.)
P(Drink and Detain) = (0.12)(0.8) = 0.096
P(Drink and Not Detain) = (0.12)(0.2) = 0.024
P(Not Drink and Detain) = (0.88)(0.2) = 0.176
P(Not Drink and Not Detain) = (0.88)(0.8) = 0.704

Here are the interpretations of the numbers in the tree diagram:


P(Drink) = 0.12
P(Not Drink) = 0.88
P(Detain | Drink) = 0.8
P(Not Detain | Drink) = 0.2
P(Detain | Not Drink) = 0.2
P(Not Detain | Not Drink) = 0.8
P(Drink and Detain) = 0.096
P(Drink and Not Detain) =0.024
P(Not Drink and Detain) =0.176
P(Not Drink and Not Detain) =0.704
To answer the questions:

(a) P(Detain | Not Drink) = 0.2.


(b) P(Detain) = P(Detain and Drink) + P(Detain and Not Drink) = 0.096+0.176 =
0.272.
(c) P(Drink | Detain) = P(Drink and Detain)/P(Detain) = 0.096/0.272 = 0.353.
(d) P(Drink | Not Detain)
= P(Not Detain | Drink)P(Drink) / [P(Not Detain | Drink)P(Drink) + P(Not Detain | Not Drink)P(Not Drink)]
= (0.2)(0.12) / [(0.2)(0.12) + (0.8)(0.88)]
= 0.033.
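The tree diagram for Solution 17 translates directly into code: multiply along branches for the joint probabilities, add branches for the total probability, and divide for Bayes' Rule. A minimal Python sketch with illustrative names:

```python
p_drink = 0.12
p_correct = 0.8                        # officers make the right call 80% of the time

p_detain_given_drink = p_correct       # right call for a drinker is to detain
p_detain_given_sober = 1 - p_correct   # wrong call for a non-drinker is to detain

# Multiply along each branch of the tree
drink_detain = p_drink * p_detain_given_drink
drink_release = p_drink * (1 - p_detain_given_drink)
sober_detain = (1 - p_drink) * p_detain_given_sober
sober_release = (1 - p_drink) * (1 - p_detain_given_sober)

p_detain = drink_detain + sober_detain                                   # (b) total probability
p_drink_given_detain = drink_detain / p_detain                           # (c) Bayes' Rule
p_drink_given_release = drink_release / (drink_release + sober_release)  # (d)

print(round(p_detain_given_sober, 3), round(p_detain, 3),
      round(p_drink_given_detain, 3), round(p_drink_given_release, 3))
```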

18. Before we answer any questions, it may be useful to create a tree diagram.
(Tree diagram not shown. First branch: Day 0.6, Night 0.4. Second branch: Absent or Not
Absent, with P(Absent | Day) = 0.01 and P(Absent | Night) = 0.02.)
P(Day and Absent) = (0.6)(0.01) = 0.006
P(Day and Not Absent) = (0.6)(0.99) = 0.594
P(Night and Absent) = (0.4)(0.02) = 0.008
P(Night and Not Absent) = (0.4)(0.98) = 0.392

Question to answer: What percent of employees are absent on any given day? We need
to calculate P(Absent), the same sum that appears in the denominator of Bayes' Rule.
P (Absent) = P (Absent | Day) P (Day) + P (Absent | Night) P (Night)
= (0.01)(0.6) + (0.02)(0.4)
= 0.014
= 1.4%.
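The law of total probability used in Solution 18 is a weighted sum over the branches. A minimal Python sketch; the last step is an extra illustration (not asked in the problem) of how the same sum serves as the denominator of Bayes' Rule:

```python
# shift -> (P(shift), P(absent | shift)), from Problem 18
shifts = {"day": (0.6, 0.01), "night": (0.4, 0.02)}

# Law of total probability: add P(shift) * P(absent | shift) over the branches
p_absent = sum(p_shift * p_abs for p_shift, p_abs in shifts.values())

# Bayes' Rule: fraction of absent employees who work the night shift
p_night, p_abs_night = shifts["night"]
p_night_given_absent = p_night * p_abs_night / p_absent

print(f"P(Absent) = {p_absent:.3f} ({p_absent:.1%})")
print(f"P(Night | Absent) = {p_night_given_absent:.4f}")
```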

19. (a) What is the value of the missing probability in the table above? The total proba-
bility must equal 1. Therefore, the missing value is then 1 − 0.2 − 0.1 − 0.3 − 0.3 =
0.1.

20. Given mean 16 and standard deviation 3.

(a) z = (9 − 16)/3 = −7/3 = −2.33.
(b) z = (21 − 16)/3 = 5/3 = 1.67.
(c) x = 9 is more unusual, because its z-score is farther from 0 (|−2.33| > |1.67|).
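Standardizing is a one-line computation, and comparing absolute z-scores answers part (c). A minimal Python sketch (the function name `z_score` is illustrative):

```python
def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mean, sd = 16, 3
z_9 = z_score(9, mean, sd)
z_21 = z_score(21, mean, sd)

# The value whose z-score is farther from 0 is the more unusual one
more_unusual = 9 if abs(z_9) > abs(z_21) else 21
print(round(z_9, 2), round(z_21, 2), more_unusual)   # -2.33 1.67 9
```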
21. Use a normal model with a mean of 50 and standard deviation of 5.

(a) For the 68-95-99.7 Rule, we will have the following points on the graph (not shown
here):
µ − 3σ = 50 − 3(5) = 35
µ − 2σ = 50 − 2(5) = 40
µ − σ = 50 − 5 = 45
µ = 50
µ + σ = 50 + 5 = 55
µ + 2σ = 50 + 2(5) = 60
µ + 3σ = 50 + 3(5) = 65
(b) Between 35 and 65.
(c) 50%.
(d) 95%.
(e) 47.5%.
(f) 47.5%.
(g) 34%.
(h) 49.85%.
(i) 2.5%.
(j) 16%.
(k) Between 40 and 50 is 47.5%. Between 50 and 65 is 49.85%. Between 40 and 65 is
97.35%.
(l) Between 45 and 50 is 34%. Between 50 and 60 is 47.5%. Between 45 and 60 is
81.5%.
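The 68-95-99.7 Rule splits the curve into one-SD bands holding about 34%, 13.5%, and 2.35% on each side of the mean, and answers like (h), (k), and (l) come from adding bands. A minimal Python sketch of that bookkeeping; the `BAND` table and `rule_percent` helper are written for this illustration, and the last lines use the standard library's `statistics.NormalDist` to show that the rule approximates the exact Normal area.

```python
from statistics import NormalDist

# Percent of the area in each one-SD band on one side of the mean
BAND = {1: 34.0, 2: 13.5, 3: 2.35}   # band k spans k-1 to k SDs from the mean

def rule_percent(mean, sd, lo, hi):
    """Percent between lo and hi by the 68-95-99.7 Rule.

    Assumes lo < hi and that both are a whole number of SDs from the mean,
    no more than 3 SDs away (e.g. 35, 40, ..., 65 for N(50, 5)).
    """
    k_lo = round((lo - mean) / sd)
    k_hi = round((hi - mean) / sd)
    total = 0.0
    for k in range(k_lo, k_hi):          # unit band from k to k+1 SDs
        depth = k + 1 if k >= 0 else -k  # mirror-image bands have the same area
        total += BAND[depth]
    return total

for lo, hi in [(40, 60), (50, 65), (40, 65), (45, 60)]:
    print(lo, hi, round(rule_percent(50, 5, lo, hi), 2))

exact = NormalDist(50, 5).cdf(60) - NormalDist(50, 5).cdf(40)
print(f"exact area between 40 and 60: {exact:.4f}")
```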

22. Based on the Normal Model N(50, 5), answer the following questions. Draw pictures
to help you see what is going on.

(a) 50%.
(b) Step 1: Standardize. z = (62 − 50)/5 = 12/5 = 2.4. Step 2: Calculate the value from a
calculator or a table. From calculator: normalcdf(2.4, 999) = 0.0082. Solution
= 0.82%. From table: We want area(z > 2.4). The table gives area(z < 2.4) =
0.9918. Our answer is 1 − 0.9918 = 0.0082, which is 0.82%.
(c) Step 1: Standardize. z = (39 − 50)/5 = −11/5 = −2.2. Step 2: Calculate the value from a
calculator or a table. From calculator: normalcdf(−999, −2.2) = 0.0139. Solution
= 1.39%. From table: We want area(z < −2.2). The table gives us this directly
and the value is 0.0139. Solution = 1.39%.
(d) Step 1: Standardize. z = (43 − 50)/5 = −7/5 = −1.4. Step 2: Calculate the value from a
calculator or a table. From calculator: normalcdf(−1.4, 999) = 0.9192. Solution
= 91.92%. From table: We want area(z > −1.4). The table gives area(z <
−1.4) = 0.0808. Our answer is 1 − 0.0808 = 0.9192, which is 91.92%.
(e) Step 1: Standardize. z = (58 − 50)/5 = 8/5 = 1.6. Step 2: Calculate the value from a
calculator or a table. From calculator: normalcdf(−999, 1.6) = 0.9452. Solution
= 94.52%. From table: We want area(z < 1.6). The table gives us this directly
and the value is 0.9452. Solution = 94.52%.
(f) Step 1: Standardize both values. z1 = (37 − 50)/5 = −2.6. z2 = (52 − 50)/5 = 0.4. Step 2:
Calculate the value from a calculator or a table. From calculator: normalcdf(−2.6, 0.4) =
0.6508. Solution = 65.08%. From table: We want area(−2.6 < z < 0.4). The
table gives area(z < −2.6) = 0.0047 and area(z < 0.4) = 0.6554. Our answer is
0.6554 − 0.0047 = 0.6507, which is 65.07%. The difference between the calculator
answer and the table answer is due to rounding. Show your work!
(g) Step 1: Standardize both values. z1 = (57.25 − 50)/5 = 1.45. z2 = (66 − 50)/5 = 3.2. Step 2:
Calculate the value from a calculator or a table. From calculator: normalcdf(1.45, 3.2) =
0.0728. Solution = 7.28%. From table: We want area(1.45 < z < 3.2). The table
gives area(z < 1.45) = 0.9265 and area(z < 3.2) = 0.9993. Our answer is
0.9993 − 0.9265 = 0.0728, which is 7.28%.
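The calculator's normalcdf(lower, upper) area can be reproduced with `statistics.NormalDist` from the Python standard library (3.8+): `cdf(x)` gives the area to the left of x, so an area between two values is a difference of two `cdf` calls. Because `NormalDist(mu=50, sigma=5)` standardizes internally, no separate z step is needed. A minimal sketch:

```python
from statistics import NormalDist

model = NormalDist(mu=50, sigma=5)

def area_between(lo, hi):
    """Area under the model between lo and hi (like normalcdf on a calculator)."""
    return model.cdf(hi) - model.cdf(lo)

answers = {
    "(b) above 62": 1 - model.cdf(62),
    "(c) below 39": model.cdf(39),
    "(d) above 43": 1 - model.cdf(43),
    "(e) below 58": model.cdf(58),
    "(f) 37 to 52": area_between(37, 52),
    "(g) 57.25 to 66": area_between(57.25, 66),
}
for label, value in answers.items():
    print(f"{label}: {value:.4f} = {value:.2%}")
```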

23. Based on the Normal Model N(50, 5), answer the following questions. Draw pictures
to help you see what is going on.

(a) The highest 5% of values corresponds to a z-value of 1.645. Find this using invnorm(0.95)
on your calculator, or by looking up the value in a table. Use the z-score formula to
solve for the value you are looking for:
z = 1.645 = (x − 50)/5
⇒ 8.225 = x − 50
⇒ x = 58.225.
(b) The lowest 25% of values corresponds to a z-value of −0.67. Find this using invnorm(0.25)
on your calculator, or by looking up the value in a table. Use the z-score formula to
solve for the value you are looking for:
z = −0.67 = (x − 50)/5
⇒ −3.35 = x − 50
⇒ x = 46.65.
(c) We need 2 values of z in this case: the middle 70% leaves 15% in each tail. Let the
lower value of z be zL and the upper value of z be zR. Find these values by typing
invnorm(0.15) and invnorm(0.85) on your calculator. We have zL = −1.04 and zR = 1.04.
Solve for the two values of x:
zL = −1.04 = (x − 50)/5 ⇒ −5.2 = x − 50 ⇒ xL = 44.8,
zR = 1.04 = (x − 50)/5 ⇒ 5.2 = x − 50 ⇒ xR = 55.2.
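The calculator's invnorm has a standard-library counterpart, `NormalDist.inv_cdf`, which returns the cutoff directly in the original units. A minimal sketch; because it does not round z to two or three decimals first, its answers differ slightly from the hand calculations (for example about 58.22 rather than 58.225).

```python
from statistics import NormalDist

model = NormalDist(mu=50, sigma=5)

top_5_cutoff = model.inv_cdf(0.95)        # (a) 95% of values lie below it
bottom_25_cutoff = model.inv_cdf(0.25)    # (b) 25% of values lie below it
# (c) the middle 70% leaves 15% in each tail
middle_70 = (model.inv_cdf(0.15), model.inv_cdf(0.85))

# The z-values themselves come from the standard Normal, N(0, 1)
z_star = NormalDist().inv_cdf(0.95)

print(round(top_5_cutoff, 3), round(bottom_25_cutoff, 3),
      tuple(round(x, 3) for x in middle_70), round(z_star, 3))
```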
Stat 130 Exam 3:
1
Important Formulas and Concepts

1 Chapter 12
1.1 Definitions
1. Binomial Distribution
The number of successes in a sequence of trials has a binomial distribution if

• There are exactly 2 possible outcomes on each trial (success and failure)
• The probability of success, p, is constant
• Trials are independent
• There is a fixed number of trials, n.

2. Success/Failure Condition
A Binomial Model is approximately Normal if we expect at least 10 successes and 10
failures, i.e. np ≥ 10 and n(1 − p) ≥ 10.

3. Poisson Distribution
A variable has a Poisson distribution if

• It counts the number of occurrences in an interval
• Occurrences are independent
• The probability of an occurrence is the same over all possible intervals of the same
size.

1.2 Binomial Model:

• $P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}$
• $\mu = np$
• $\sigma = \sqrt{np(1 - p)}$

where

• $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
• $n! = n(n - 1)(n - 2) \cdots (1)$


1
This version: November 11, 2021, by Dale Embers. May not include all things that could possibly be
tested on. To be used as an additional reference to studying.
1.3 Poisson Model:
• $P(X = k) = \frac{e^{-\mu} \mu^k}{k!}$
• Mean: $\mu$
• $\sigma = \sqrt{\mu}$
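The Poisson formula is short enough to code directly with the standard library. A minimal Python sketch; the rate µ = 3 below is a hypothetical value chosen only to confirm numerically that the probabilities sum to 1 and that the mean is µ and the standard deviation is √µ.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) = e^(-mu) * mu^k / k! for a Poisson model with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

mu = 3.0   # hypothetical mean number of occurrences per interval
probs = [poisson_pmf(k, mu) for k in range(100)]   # terms past k = 100 are negligible here

total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
sd = math.sqrt(sum((k - mean) ** 2 * p for k, p in enumerate(probs)))

print(round(total, 6), round(mean, 6), round(sd, 6), round(math.sqrt(mu), 6))
```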

2 Ch 13 Sampling Distributions and Confidence Intervals
2.1 Definitions
1. Sampling Distribution
Different random samples give different values of a statistic. The distribution of a statistic
over all possible samples of the same size n is called its sampling distribution; it shows
how the statistic behaves from sample to sample.

2. Sampling Distribution Model


Because we can never see all possible samples, we often use a model as a practical way
of describing the theoretical sampling distribution.

3. Sampling Distribution Model for a Proportion


If assumptions of independence and random sampling are met, and we expect at least
10 successes and 10 failures, then the sampling distribution of a proportion is modeled
by a Normal model with a mean equal to the true proportion value p and a standard
deviation equal to $\sqrt{p(1 - p)/n}$:
$\hat{p} \sim N\left(p, \sqrt{\frac{p(1 - p)}{n}}\right)$

4. Sampling Error
Sample-to-sample variation

5. Central Limit Theorem (CLT)


The sampling distribution model of the sample mean (and proportion) is approximately
Normal for large n, regardless of the distribution of the population as long as the
observations are independent. The larger the sample, the better the approximation
will be.

6. Sampling Distribution Model for a Mean


If assumptions of independence and random sampling are met, and the sample size is
large enough, the sampling distribution of the sample mean is modeled by a Normal
model with a mean equal to the population mean and a standard deviation equal
to $\sigma/\sqrt{n}$:
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$
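The Central Limit Theorem and the σ/√n standard deviation can be checked by simulation. A minimal Python sketch, assuming a deliberately skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen only for illustration) and samples of size n = 40:

```python
import math
import random
import statistics

random.seed(13)
n = 40            # size of each sample
reps = 5000       # number of simulated samples
sigma = 1.0       # SD of an exponential population with rate 1

# Draw many samples from the skewed population and record each sample mean
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 3))
print("sigma / sqrt(n):     ", round(sigma / math.sqrt(n), 3))
```

A histogram of `sample_means` would look roughly bell shaped even though the population itself is strongly right-skewed.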

7. Confidence Interval (CI)


A level C confidence interval for a model parameter is an interval of values usually of
the form Estimate ± Margin of Error found from data in such a way that C% of all
random samples will yield intervals that capture the true parameter value.

8. Margin of Error (MOE)


In a confidence interval, the extent of the interval on either side of the observed statistic
value. It is typically the product of a critical value from the sampling distribution and
a standard error from the data. A small MOE corresponds to a confidence interval that
pins down the parameter precisely. A large MOE corresponds to a confidence interval
that gives relatively little information about the estimated parameter.

9. Critical Value
The number of standard errors to move away from the mean of the sampling distribu-
tion to correspond to the specified level of confidence. The critical value, for a normal
sampling distribution, denoted z ∗ , is usually found from a table or technology. The
critical value, for a t-distribution, denoted t∗ , is also found from a table or technology.

10. Some z ∗ values (Critical Values) for Confidence Intervals

Confidence level:  90%    95%   98%    99%
z*:                1.645  1.96  2.326  2.576
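The critical values in this table come from the standard Normal model: for confidence level C, z* leaves (1 − C)/2 in each tail. A minimal Python sketch using `statistics.NormalDist().inv_cdf` in place of a printed table or a calculator's invnorm (the function name `z_critical` is illustrative):

```python
from statistics import NormalDist

def z_critical(confidence):
    """z* for a two-sided confidence interval at the given level (e.g. 0.95)."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.96, 0.98, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
```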

3 Chapter 14 Confidence Intervals


1. One-sample z-interval for the mean
This is the confidence interval for the mean. It is given by $\bar{x} \pm z^* SD(\bar{x})$, where
$SD(\bar{x}) = \sigma/\sqrt{n}$. The critical value $z^*$ depends on the particular confidence level that you specify.
2. The margin of error is $z^* \sigma/\sqrt{n}$.
3. Given a desired margin of error m, solve $m = z^* \sigma/\sqrt{n}$ for n to get the desired sample
size. This results in $n = \left(\frac{z^* \sigma}{m}\right)^2$, rounded up to the next whole number.
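The interval, margin of error, and sample-size formula can be sketched in Python as below. The inputs reuse the numbers from Example Problems 3 and 5 later in this guide; the function names are illustrative. The last line also shows why halving the margin of error requires four times the sample size.

```python
import math

def z_interval(xbar, sigma, n, z_star):
    """One-sample z-interval for the mean: xbar +/- z* sigma / sqrt(n)."""
    moe = z_star * sigma / math.sqrt(n)
    return xbar - moe, xbar + moe

def sample_size(z_star, sigma, m):
    """Smallest n whose margin of error is at most m: n = (z* sigma / m)^2, rounded up."""
    return math.ceil((z_star * sigma / m) ** 2)

# Example Problem 3: n = 33 hams, mean 8.2 lb, sigma = 3.3 lb, 90% confidence
low, high = z_interval(8.2, 3.3, 33, 1.645)
print(f"90% CI: ({low:.3f}, {high:.3f})")

# Example Problem 5: sigma = 18 emails, margin of error 2, 95% confidence
print("n needed:", sample_size(1.96, 18, 2))

# Halving the margin of error multiplies the unrounded n by 4
print(round((1.96 * 18 / 1) ** 2 / (1.96 * 18 / 2) ** 2, 6))
```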

4 Chapter 14 Significance Tests


1. Hypothesis
A model or proposition that we adopt in order to test.
2. Null Hypothesis (H0 )
The claim being assessed in a hypothesis test that states “no change from the tradi-
tional value,” “no effect”, “no difference”, or “no relationship”. For a claim to be a
testable null hypothesis, it must specify a value for some population parameter that
can form the basis for assuming a sampling distribution for a test statistic.

3. Alternative Hypothesis (HA )


The alternative hypothesis proposes what we should conclude if we reject the null
hypothesis.

4. P-value
The probability of observing a value for a test statistic at least as far from the hy-
pothesized value as the statistic value actually observed if the null hypothesis is true.
A small p-value indicates either that the observation is improbable or that the proba-
bility calculation was based on incorrect assumptions. The assumed truth of the null
hypothesis is the assumption under suspicion.

5. One-sample z-test for the mean


This is the hypothesis test. It tests the hypothesis $H_0: \mu = \mu_0$ using the statistic
$z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$.

6. One-sided (Tailed) Alternative


An alternative hypothesis is one-sided (for example HA: µ > µ0 or HA: µ < µ0)
when we are interested in deviations in only one direction away from the hypothesized
parameter value.

7. Two-sided (Tailed) Alternative


An alternative hypothesis is two-sided (for example HA: µ ≠ µ0) when we are
interested in deviations in either direction away from the hypothesized parameter value.

8. If the P-value is sufficiently small (less than α), one rejects H0 in favor of HA and
calls the results statistically significant.

5 Chapter 15
1. Statistically significant
When the p-value falls below the alpha level, we say that the test is “statistically
significant” at that alpha level.

2. Alpha level
The threshold p-value that determines when we reject a null hypothesis. If we observe
a statistic whose p-value based on the null hypothesis is less than α, we reject that
null hypothesis.

3. Significance level
The alpha level is also called the significance level, most often in a phrase such as a
conclusion that a particular test is “significant at the 5% significance level”.
4. Type I Error
The error of rejecting a null hypothesis when in fact it is true (also called a false
positive). The probability of a Type I Error is α.

5. Type II Error
The error of failing to reject a null hypothesis when in fact it is false (also called a false
negative). The probability of a Type II Error is β.

6. β
The probability of a Type II Error is commonly denoted β and depends on the effect
size.

7. Power
The probability that a hypothesis test will correctly reject a false null hypothesis is the
power of the test. For any specific value in the alternative, the power is 1 − β.

6 Chapter 17
1. Student’s t distribution
A family of distributions indexed by its degrees of freedom. The t-models are unimodal,
symmetric, and bell shaped, but have fatter tails and a narrower center than the
Normal model. As the degrees of freedom increase, t-distributions approach the Normal
distribution.

2. Degrees of Freedom for Student’s t distribution (df)


For the t-distribution, the degrees of freedom are equal to n − 1, where n is the sample
size.

3. One-sample t-interval for the mean


This is the confidence interval for the mean. It is given by $\bar{y} \pm t^*_{n-1} SE(\bar{y})$, where
$SE(\bar{y}) = s/\sqrt{n}$. The critical value $t^*_{n-1}$ depends on the particular confidence level that you
specify and on the number of degrees of freedom, n − 1.

4. One-sample t-test for the mean


This is the hypothesis test. It tests the hypothesis $H_0: \mu = \mu_0$ using the statistic
$t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, which has a t-distribution with n − 1 degrees of freedom.

5. One-sided (Tailed) Alternative


An alternative hypothesis is one-sided (for example HA: µ > µ0 or HA: µ < µ0)
when we are interested in deviations in only one direction away from the hypothesized
parameter value.

6. Two-sided (Tailed) Alternative


An alternative hypothesis is two-sided (for example HA: µ ≠ µ0) when we are
interested in deviations in either direction away from the hypothesized parameter value.
7 Confidence Interval Creation and Hypothesis Testing Summary
7.1 One-Sample Mean, Sigma Known
• Confidence Interval Creation
CI: $\bar{x} \pm \underbrace{z^* \frac{\sigma}{\sqrt{n}}}_{MOE}$
$z^*$ = critical value
Table of critical values for z* for Confidence Intervals:

Confidence level:  90%    95%   96%    98%    99%
z*:                1.645  1.96  2.054  2.326  2.576

• Hypothesis Testing
Step 1: Write down your hypotheses
$H_0: \mu = \mu_0$
$H_A: \mu < \mu_0$, $\mu > \mu_0$, or $\mu \neq \mu_0$
Step 2: Calculate your test statistic
$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value < α (usually α = 0.05), then Reject H0. If
p-value ≥ α, then Do Not Reject H0.
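Steps 2 to 4 of the z-test can be sketched in Python with `statistics.NormalDist` for the p-value. The function and its `alternative` argument are illustrative names, and the numbers in the usage line (x̄ = 52, µ0 = 50, σ = 5, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(xbar, mu0, sigma, n, alternative="two-sided"):
    """Return (z, p-value) for H0: mu = mu0 with known sigma.

    alternative is "greater" (HA: mu > mu0), "less" (HA: mu < mu0),
    or "two-sided" (HA: mu != mu0).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))      # Step 2
    std_normal = NormalDist()
    if alternative == "greater":                   # Step 3
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":
        p = std_normal.cdf(z)
    else:
        p = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p

# Hypothetical data, for illustration only
z, p = one_sample_z_test(52, 50, 5, 25)
alpha = 0.05
decision = "Reject H0" if p < alpha else "Do Not Reject H0"   # Step 4
print(f"z = {z:.2f}, p-value = {p:.4f}: {decision}")
```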

7.2 One-Sample Mean, Sigma Unknown

The degrees of freedom are given by df = n − 1.

• Confidence Interval Creation
CI: $\bar{x} \pm \underbrace{t^*_{n-1} \frac{s}{\sqrt{n}}}_{MOE}$
$t^*_{n-1}$ = critical value
Use Appendix D Table T to determine the critical values of t.

• Hypothesis Testing
Step 1: Write down your hypotheses
$H_0: \mu = \mu_0$
$H_A: \mu < \mu_0$, $\mu > \mu_0$, or $\mu \neq \mu_0$
Step 2: Calculate your test statistic
$t_{n-1} = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value < α (usually α = 0.05), then Reject H0. If
p-value ≥ α, then Do Not Reject H0.
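Python's standard library has no Student's t distribution, so this sketch computes the t statistic directly and gets the p-value by numerically integrating the t density with Simpson's rule; for the interval it takes t* as an input, as you would read it from Table T. All function names are illustrative, and the inputs reuse Example Problems 4 and 6 later in this guide (t* = 2.009 is the table value for df = 50 at 95% confidence). In practice a calculator's tcdf or a statistics library would replace the hand-rolled integration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area from 0 to t (Simpson's rule, even steps)."""
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

def one_sample_t_test(xbar, mu0, s, n, alternative="two-sided"):
    """Return (t, df, p-value) for H0: mu = mu0 using the sample SD s."""
    df = n - 1
    t = (xbar - mu0) / (s / math.sqrt(n))
    tail = t_upper_tail(abs(t), df)            # P(T > |t|)
    if alternative == "greater":
        p = tail if t >= 0 else 1 - tail
    elif alternative == "less":
        p = 1 - tail if t >= 0 else tail
    else:
        p = 2 * tail
    return t, df, p

def t_interval(xbar, s, n, t_star):
    """One-sample t-interval: xbar +/- t* s / sqrt(n), with t* taken from Table T."""
    moe = t_star * s / math.sqrt(n)
    return xbar - moe, xbar + moe

# Example Problem 6: HA: mu > 22, n = 27, xbar = 24.3, s = 8
t, df, p = one_sample_t_test(24.3, 22, 8, 27, alternative="greater")
print(f"t = {t:.3f}, df = {df}, p-value = {p:.4f}")

# Example Problem 4: 95% interval, n = 51 (df = 50), xbar = 620, s = 90
print(t_interval(620, 90, 51, t_star=2.009))
```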

7.3 Paired Differences of Means

The degrees of freedom are given by df = n − 1, where n is the number of pairs.

• Confidence Interval Creation (n = number of pairs)
CI: $\bar{d} \pm \underbrace{t^*_{n-1} \frac{s_d}{\sqrt{n}}}_{MOE}$
$t^*_{n-1}$ = critical value

• Hypothesis Testing
Step 1: Write down your hypotheses
$H_0: \mu_d = 0$
$H_A: \mu_d < 0$, $\mu_d > 0$, or $\mu_d \neq 0$
Step 2: Calculate your test statistic
$t_{n-1} = \frac{\bar{d}}{s_d/\sqrt{n}}$
where $\bar{d}$ = average of the differences and $s_d$ = standard deviation of the differences.
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value < α (usually α = 0.05), then Reject H0. If
p-value ≥ α, then Do Not Reject H0.
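The paired procedure is the one-sample t procedure applied to the differences. A minimal Python sketch with hypothetical before/after measurements (made up purely to show the steps; they are not from this guide), testing H0: µd = 0 and using the standard library's `statistics` module for d̄ and sd:

```python
import math
import statistics

# Hypothetical paired measurements on the same 5 subjects
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 13]

diffs = [a - b for a, b in zip(after, before)]   # one difference per pair
n = len(diffs)                                   # n = number of pairs
d_bar = statistics.mean(diffs)                   # average of the differences
s_d = statistics.stdev(diffs)                    # SD of the differences

t = d_bar / (s_d / math.sqrt(n))                 # test statistic for H0: mu_d = 0
df = n - 1

print(f"d-bar = {d_bar}, s_d = {s_d:.4f}, t = {t:.2f} on {df} df")
```

The p-value would then come from Table T (or technology) with df = n − 1, exactly as in the one-sample case.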

8 Extra Information
Review any and all notes and supplementary materials. It may be the case that something
was accidentally omitted from this study guide. Also, review any problems that may have
been discussed in class as not all example problems may have been provided here.
9 Example Problems
1. A printing company ships boxes of paper to office stores. In each box, there are 30
reams of paper. However, in every box, they estimate that 2% of the reams of paper
are defective in some way. What is the probability that in a box, there will be exactly
4 reams of paper that need to be shipped back to the printing company? What is the
mean number of reams of paper that need to be shipped back? What is the standard
deviation?

2. Several factors are involved in the creation of a confidence interval. Among them are
the sample size, the level of confidence, and the margin of error. Which statements are
true?

(a) For a given sample size, higher confidence means a smaller margin of error.
(b) For a given confidence level, halving the margin of error requires a sample twice
as large.
(c) For a certain confidence level, you can get a smaller margin of error by selecting
a bigger sample.
(d) For a fixed margin of error, larger samples provide greater confidence.

3. A butcher wants to estimate the mean weight of a ham. She samples 33 hams and
computes a sample mean weight of 8.2 pounds. She knows the population standard
deviation is 3.3 pounds. What is a 90% confidence interval for the population mean
weight of a ham? Please indicate the value you used for z* or t*.

4. A professor is interested in the mean length of a letter of recommendation. He samples


51 letters and finds a sample mean length of 620 words with a sample standard deviation
of 90 words. What is a 95% confidence interval for the population mean length of a
letter? Please indicate the value you used for z* or t*.

5. A computer professional wants to know the mean number of emails people receive each
day. She is going to compute a 95% confidence interval and wants a margin of error of
±2 emails. She believes the standard deviation to be 18 emails. How large should the
sample size be to ensure this margin of error?

6. A researcher believes that the mean age at which a person first votes is greater than 22
years. He samples 27 people and computes a sample mean of 24.3 years and a sample
standard deviation of 8 years.

(a) State the hypotheses to be tested.


(b) What is the value of your test statistic (t or z value)?
(c) What is the P-value?
(d) What conclusion should be drawn (compare p-value to 0.05).
7. A researcher believes that the mean age at which a person first tries chocolate is less
than 3 years. He samples 24 people and computes a sample mean of 2.3 years and a
sample standard deviation of 1.5 years.

(a) State the hypotheses to be tested.


(b) What is the value of your test statistic (t or z value)?
(c) What is the P-value?
(d) What conclusion should be drawn (compare p-value to 0.05).

8. A researcher believes that the mean height of a prairie dog is different than 14 inches.
She samples 31 prairie dogs and computes a sample mean of 15.8 inches and a sample
standard deviation of 3.6 inches.

(a) State the hypotheses to be tested.


(b) What is the value of your test statistic (t or z value)?
(c) What is the P-value?
(d) What conclusion should be drawn (compare p-value to 0.05).

9. Which of the following are true? If false, explain briefly.

(a) A very low P-value provides evidence against the null hypothesis.
(b) A high P-value is strong evidence in favor of the null hypothesis.
(c) A P-value above 0.10 shows that the null hypothesis is true.
(d) If the null hypothesis is true, you can’t get a p-value below 0.01.

10. Which of the following statements are true? If false, explain briefly.

(a) Using an alpha level of 0.05, a p-value of 0.04 results in rejecting the null hypoth-
esis.
(b) The alpha level depends on the sample size.
(c) With an alpha level of 0.01, a p-value of 0.10 results in rejecting the null hypoth-
esis.
(d) Using an alpha level of 0.05, a p-value of 0.06 means the null hypothesis is true.

11. For each of the following situations, state whether a Type I or Type II, or neither error
has been made. Explain briefly.

(a) A bank wants to know if the enrollment on their website is above 30% based on
a small sample of customers. They test H0: p = 0.3 versus HA: p > 0.3 and
reject the null hypothesis. Later they find out that actually 28% of all customers
enrolled.
(b) A student tests 100 students to determine whether other students on her campus
prefer Coke or Pepsi and finds no evidence that preference for Coke is not 0.5.
Later, a marketing company tests all students on campus and finds no difference.
(c) A human resource analyst wants to know if the applicants this year score, on
average, higher on their placement exam than the 52.5 points the candidates
averaged last year. She samples 50 recent tests and finds the average to be 54.1
points. She fails to reject the null hypothesis that the mean is 52.5 points. At
the end of the year, they find that the candidates this year had a mean of 55.3
points.
(d) A pharmaceutical company tests whether a drug lifts the headache relief rate
from the 25% achieved by the placebo. They fail to reject the null hypothesis
because the p-value is 0.465. Further testing shows that the drug actually relieves
headaches in 38% of people.
12. We want to estimate the healing rate for a wound. A sample of size 17 is collected
and the sample mean is computed to be 24.3 micrometers per hour, with a sample
standard deviation of s = 8 micrometers per hour. What is a 95% confidence interval
for the population mean?

13. Teresa knows that appointment wait times are approximately normally distributed. She
believes the mean wait time is longer than 25 minutes. She conducts a test with
α = 0.05 and the appropriate hypotheses. She selects 25 random appointments and finds
a sample mean of 25.66 minutes and a sample standard deviation of 10
minutes.

(a) State the hypotheses to be tested.


(b) Compute the value of the test score.
(c) Give the P-value or range of P-values.
(d) Do the results seem significant?
10 Example Solutions
1. This is an example of a Binomial Model problem. We are given that p = 0.02 and n = 30.
We define “success” as a ream of paper that needs to be shipped back to the
printing company. The probability that exactly 4 reams of paper need to be shipped
back is
$P(X = 4) = \binom{30}{4}(0.02)^4(1 - 0.02)^{30-4}$
$= 27405(0.02^4)(0.98^{26})$
$= 27405(1.6 \times 10^{-7})(0.5914)$
$= 0.0026$
You can also calculate this probability on your calculator as binompdf(30, 0.02, 4) =
0.0026.
The mean is $\mu = np = 30(0.02) = 0.6$. The standard deviation is
$\sigma = \sqrt{np(1-p)} = \sqrt{30(0.02)(0.98)} = 0.7668$.
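The binomial arithmetic above can be double-checked with a short Python sketch. It uses only the standard library (`math.comb`, Python 3.8+); the function name `binomial_pmf` is our own illustrative choice and plays the role of the calculator's binompdf.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 30, 0.02                       # 30 reams, each shipped back with probability 0.02
prob_4 = binomial_pmf(4, n, p)        # same quantity as binompdf(30, 0.02, 4)
mean = n * p                          # mu = np
sd = math.sqrt(n * p * (1 - p))       # sigma = sqrt(np(1 - p))

print(f"P(X = 4) = {prob_4:.4f}")     # P(X = 4) = 0.0026
print(f"mean = {mean:.1f}, sd = {sd:.4f}")  # mean = 0.6, sd = 0.7668
```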

2. (a) False. Higher confidence means a larger margin of error.


(b) False. The margin of error decreases as the square root of the sample size increases.
Halving the margin of error requires a sample four times as large as the original.
(c) True. Larger samples are less variable, which translates to a smaller margin of
error. We can be more precise at the same level of confidence.
(d) True. Larger samples are less variable, which makes us more confident that a
given confidence interval succeeds in catching the population proportion.
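A quick numeric check of (a) and (b), sketched with the one-proportion margin of error $z^*\sqrt{\hat{p}(1-\hat{p})/n}$. The values $\hat{p} = 0.5$ and n = 100, 200, 400 are assumptions chosen only for illustration; z* comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p_hat, n, confidence):
    """z* * sqrt(p_hat(1 - p_hat)/n) for a one-proportion z-interval."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # two-sided critical value
    return z_star * sqrt(p_hat * (1 - p_hat) / n)

# (a) Same sample size, higher confidence: the margin of error gets larger
print(margin_of_error(0.5, 100, 0.90), margin_of_error(0.5, 100, 0.95))

# (b) Quadrupling n halves the margin of error; doubling n only divides it by sqrt(2)
m100 = margin_of_error(0.5, 100, 0.95)
print(m100 / margin_of_error(0.5, 400, 0.95))   # 2 (up to floating-point rounding)
print(m100 / margin_of_error(0.5, 200, 0.95))   # about 1.414
```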

3. A butcher wants to estimate the mean weight of a ham. She samples 33 hams and
computes a sample mean weight of 8.2 pounds. She knows the population standard
deviation is 3.3 pounds. What is a 90% confidence interval for the population mean
weight of ham? Please indicate the value you used for z* or t*.
Summary of what is given:
n = 33
$\bar{x} = 8.2$
σ = 3.3.
Because the population standard deviation σ is known, we use z* with 90% confidence
(for this case). Thus, z* = 1.645.
CI: $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$
$\Rightarrow 8.2 \pm 1.645 \frac{3.3}{\sqrt{33}}$
$\Rightarrow (7.255, 9.145)$
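The known-σ interval above can be reproduced with a minimal sketch using `statistics.NormalDist` from the standard library. The helper name `z_interval` is our own; it computes z* exactly (about 1.6449) instead of using the rounded table value 1.645, which changes nothing at three decimals here.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """x_bar ± z* sigma / sqrt(n), for a mean when the population SD sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    moe = z_star * sigma / sqrt(n)
    return x_bar - moe, x_bar + moe

low, high = z_interval(8.2, 3.3, 33, 0.90)
print(f"({low:.3f}, {high:.3f})")   # (7.255, 9.145)
```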

4. A professor is interested in the mean length of a letter of recommendation. He samples
51 letters and finds a sample mean length of 620 words with a sample standard deviation
of 90 words. What is a 95% confidence interval for the population mean length of a
letter? Please indicate the value you used for z* or t*.
Summary of what is given:
n = 51
$\bar{y} = 620$
s = 90.
Because only the sample standard deviation s is known, we use t* with n − 1 = 50
degrees of freedom and 95% confidence (for this case). Thus, $t^*_{50} = 2.009$.
CI: $\bar{y} \pm t^*_{n-1} \frac{s}{\sqrt{n}}$
$\Rightarrow 620 \pm 2.009 \frac{90}{\sqrt{51}}$
$\Rightarrow (594.682, 645.318)$
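Python's standard library has no t-distribution, so this sketch assumes t* is read from Table T (as in the solution) and passed in as an argument. The helper name `t_interval` is our own. The same helper also reproduces problem 12's interval with $t^*_{16} = 2.120$.

```python
from math import sqrt

def t_interval(y_bar, s, n, t_star):
    """One-sample t-interval y_bar ± t* s / sqrt(n); t_star comes from Table T with n - 1 df."""
    moe = t_star * s / sqrt(n)
    return y_bar - moe, y_bar + moe

print(t_interval(620, 90, 51, 2.009))    # problem 4: about (594.682, 645.318)
print(t_interval(24.3, 8, 17, 2.120))    # problem 12: about (20.187, 28.413)
```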

5. A computer professional wants to know the mean number of emails people receive each
day. She is going to compute a 95% confidence interval and wants a margin of error of
±2 emails. She believes the standard deviation to be 18 emails. How large should the
sample size be to ensure this margin of error?
Summary of what is given:
MOE = 2
s = 18
For a sample size calculation based on the mean, the margin of error uses t* with n − 1
degrees of freedom, but n is unknown. As n becomes large, the t-distribution becomes
more like the Normal distribution, so we use the 95% critical value from the Normal
distribution instead: z* = 1.96. Sample size can be calculated as follows:
$MOE = t^*_{n-1} \frac{s}{\sqrt{n}} \approx z^* \frac{s}{\sqrt{n}} \Rightarrow 2 = 1.96 \frac{18}{\sqrt{n}}$
$\Rightarrow \frac{2}{1.96 \times 18} = \frac{1}{\sqrt{n}}$
$\Rightarrow \sqrt{n} = \frac{1.96 \times 18}{2}$
$\Rightarrow n = \left(\frac{1.96 \times 18}{2}\right)^2$
$\Rightarrow n = 311.1696$
$\Rightarrow n \approx 312$ (always round up, so the margin of error is at most 2)
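The sample-size calculation can be sketched in a few lines, assuming (as the solution does) that z* stands in for t*. The helper name `sample_size_for_mean` is our own; `math.ceil` does the "always round up" step.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(moe, s, confidence):
    """Smallest n with z* s / sqrt(n) <= moe (z* used in place of t*)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z_star * s / moe) ** 2)   # round up, never down

print(sample_size_for_mean(2, 18, 0.95))   # 312
```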

6. A researcher believes that the mean age at which a person first votes is greater than 22
years. He samples 27 people and computes a sample mean of 24.3 years and a sample
standard deviation of 8 years.

(a) State the hypotheses to be tested.


H0 : µ = 22
HA : µ > 22
(b) What is the value of your test statistic (t or z value)?
Use the t-test statistic because we are dealing with means.
$t_{n-1} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
$t_{27-1} = t_{26} = \frac{24.3 - 22}{8/\sqrt{27}} = \frac{2.3}{1.54} = 1.49$

(c) What is the P-value?


On your calculator: tcdf (1.49, 999, 26) = 0.0741
On the table: Go to degrees of freedom 26, find where 1.49 is in the row, and then
look at the one-tail probability values. The probability is between 0.05 and 0.10.
(d) What conclusion should be drawn? (Compare the p-value to 0.05.)
Since the p-value is “large” (0.0741 > 0.05), Do Not Reject H0 . The results are
not significant.
7. A researcher believes that the mean age at which a person first tries chocolate is less
than 3 years. He samples 24 people and computes a sample mean of 2.3 years and a
sample standard deviation of 1.5 years.

(a) State the hypotheses to be tested.


H0 : µ = 3
HA : µ < 3
(b) What is the value of your test statistic (t or z value)?
Use the t-test statistic because we are dealing with means.
$t_{n-1} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
$t_{24-1} = t_{23} = \frac{2.3 - 3}{1.5/\sqrt{24}} = \frac{-0.7}{0.3062} = -2.286$

(c) What is the P-value?


On your calculator: tcdf (−999, −2.286, 23) = 0.0159.
On the table: Go to degrees of freedom 23, find where 2.286 is in the row, and
then look at the one-tail probability values. The probability is between 0.01 and
0.025.
(d) What conclusion should be drawn? (Compare the p-value to 0.05.)
Since the p-value is “small” (0.0159 < 0.05), Reject H0 . The results are significant.

8. A researcher believes that the mean height of a prairie dog is different than 14 inches.
She samples 31 prairie dogs and computes a sample mean of 15.8 inches and a sample
standard deviation of 3.6 inches.

(a) State the hypotheses to be tested.


H0 : µ = 14
HA : µ ̸= 14
(b) What is the value of your test statistic (t or z value)?
Use the t-test statistic because we are dealing with means.
$t_{n-1} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
$t_{31-1} = t_{30} = \frac{15.8 - 14}{3.6/\sqrt{31}} = \frac{1.8}{0.6466} = 2.784$

(c) What is the P-value?


On your calculator: 2tcdf (2.784, 999, 30) = 2(0.0046) = 0.0092.
On the table: go to degrees of freedom 30, find where 2.784 is in the row, and
then look at the two-tail probability values. The probability is lower than 0.01.
(d) What conclusion should be drawn? (Compare the p-value to 0.05.)
Since the p-value is “small” (0.0092 < 0.05), Reject H0 . The results are significant.
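The one-sample t-tests in problems 6 to 8 (and problem 13 below) can be checked with the sketch that follows. Python's standard library has no t-distribution, so this assumes a small hand-written t CDF that numerically integrates the t density with Simpson's rule, standing in for the calculator's tcdf; all function names are our own. The tail used for the p-value follows the form of the alternative hypothesis.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): integrate the density from 0 to |t| with Simpson's rule (steps is even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)   # Simpson weights 4, 2, 4, ...
    area = total * h / 3                                   # area between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area            # symmetry about 0

def one_sample_t_test(y_bar, s, n, mu0, alternative):
    """Return (t, p-value) for H0: mu = mu0; alternative is 'greater', 'less', or 'two-sided'."""
    t = (y_bar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if alternative == "greater":
        p = 1 - t_cdf(t, df)
    elif alternative == "less":
        p = t_cdf(t, df)
    else:
        p = 2 * (1 - t_cdf(abs(t), df))
    return t, p

problems = [
    (24.3, 8, 27, 22, "greater"),      # problem 6: H_A: mu > 22
    (2.3, 1.5, 24, 3, "less"),         # problem 7: H_A: mu < 3
    (15.8, 3.6, 31, 14, "two-sided"),  # problem 8: H_A: mu != 14
    (25.66, 10, 25, 25, "greater"),    # problem 13: H_A: mu > 25
]
results = [one_sample_t_test(*args) for args in problems]
for t, p in results:
    print(f"t = {t:.3f}, p-value = {p:.4f}")
```

The p-values should agree with the tcdf values quoted in the solutions to about three decimal places; small differences come from the solutions rounding t before looking it up.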

9. Which of the following are true? If false, explain briefly.

(a) A very low P-value provides evidence against the null hypothesis.
True.
(b) A high P-value is strong evidence in favor of the null hypothesis.
False. A high p-value shows that the data are consistent with the null hypothesis
but does not prove that the null hypothesis is true.
(c) A P-value above 0.10 shows that the null hypothesis is true.
False. No p-value ever shows that the null hypothesis is true (or false).
(d) If the null hypothesis is true, you can’t get a p-value below 0.01.
False. If the null hypothesis is true, you will get a p-value below 0.01 about once
in a hundred hypothesis tests.
10. Which of the following statements are true? If false, explain briefly.
(a) Using an alpha level of 0.05, a p-value of 0.04 results in rejecting the null hypoth-
esis.
True.
(b) The alpha level depends on the sample size.
False. The alpha level is set independently and does not depend on the sample
size.
(c) With an alpha level of 0.01, a p-value of 0.10 results in rejecting the null hypoth-
esis.
False. The p-value would have to be less than 0.01 to reject the null hypothesis.
(d) Using an alpha level of 0.05, a p-value of 0.06 means the null hypothesis is true.
False. It means that we do not have enough evidence at that alpha level to reject
the null hypothesis.
11. For each of the following situations, state whether a Type I error, a Type II error, or
neither has been made. Explain briefly.
(a) A bank wants to know if the enrollment on their website is above 30% based on
a small sample of customers. They test H0 : p = 0.3 versus HA : p > 0.3 and
reject the null hypothesis. Later they find out that actually 28% of all customers
enrolled.
Type I Error. The actual value is not greater than 0.3, but they rejected the null
hypothesis.
(b) A student tests 100 students to determine whether other students on her campus
prefer Coke or Pepsi and finds no evidence that preference for Coke is not 0.5.
Later, a marketing company tests all students on campus and finds no difference.
No error. The actual proportion is 0.5, and the null hypothesis p = 0.5 was correctly not rejected.
(c) A human resource analyst wants to know if the applicants this year score, on
average, higher on their placement exam than the 52.5 points the candidates
averaged last year. She samples 50 recent tests and finds the average to be 54.1
points. She fails to reject the null hypothesis that the mean is 52.5 points. At
the end of the year, they find that the candidates this year had a mean of 55.3
points.
Type II Error. The true mean was 55.3 points, which is greater than 52.5, but the false
null hypothesis (µ = 52.5) was not rejected.
(d) A pharmaceutical company tests whether a drug lifts the headache relief rate
from the 25% achieved by the placebo. They fail to reject the null hypothesis
because the p-value is 0.465. Further testing shows that the drug actually relieves
headaches in 38% of people.
Type II Error. The null hypothesis was not rejected, but it was false. The true
relief rate was greater than 0.25.

12. We want to estimate the healing rate for a wound. A sample of size 17 is collected
and the sample mean is computed to be 24.3 micrometers per hour, with a sample
standard deviation of s = 8 micrometers per hour. What is a 95% confidence interval
for the population mean?
What we are given:
$\bar{y} = 24.3$
s = 8
n = 17
Because we want the confidence interval for the population mean, use the formula
CI: $\bar{y} \pm t^*_{n-1} \frac{s}{\sqrt{n}}$
$\Rightarrow 24.3 \pm t^*_{17-1} \frac{8}{\sqrt{17}}$
$\Rightarrow 24.3 \pm 2.120 \frac{8}{\sqrt{17}}$
$\Rightarrow (20.187, 28.413)$

13. Teresa knows that appointment wait times are approximately normally distributed. She
believes the mean wait time is longer than 25 minutes. She conducts a test with
α = 0.05 and the appropriate hypotheses. She selects 25 random appointments and finds
a sample mean of 25.66 minutes and a sample standard deviation of 10
minutes.

(a) State the hypotheses to be tested.


H0 : µ = 25
HA : µ > 25
(b) Compute the value of the test score.
Since we are testing the population mean, we need the formula
$t_{n-1} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
$\Rightarrow t_{25-1} = t_{24} = \frac{25.66 - 25}{10/\sqrt{25}} = \frac{0.66}{2} = 0.33$

(c) Give the P-value or range of P-values.


On your calculator: tcdf (0.33, 999, 24) = 0.372.
On the table: go to degrees of freedom 24, find where 0.33 is on the row, and then
look at the one-tail probability values. The probability is greater than 0.10.
(d) Do the results seem significant?
Since the p-value is “large” (0.372 > α = 0.05), Do Not Reject H0 . The results
are not significant.
Stat 130 Final Exam: Use this in conjunction with the other three guides
Important Formulas and Concepts

1 Sampling Distributions
1.1 Definitions
1. Sampling Distribution
Different random samples give different values of a statistic. The distribution of a
statistic's values over all possible samples of the same size n is called its sampling
distribution; it shows how the statistic behaves from sample to sample.

2. Sampling Distribution Model


Because we can never see all possible samples, we often use a model as a practical way
of describing the theoretical sampling distribution.

3. Sampling Distribution Model for a Mean


If assumptions of independence and random sampling are met, and the sample size is
large enough, the sampling distribution of the sample mean is modeled by a Normal
model with a mean equal to the population mean µ and a standard deviation equal
to $\sigma/\sqrt{n}$:
$\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

4. Sampling Distribution Model for a Proportion


If assumptions of independence and random sampling are met, and we expect at least
10 successes and 10 failures, then the sampling distribution of a proportion is modeled
by a Normal model with a mean equal to the true proportion value p and a standard
deviation equal to $\sqrt{p(1-p)/n}$:
$\hat{p} \sim N\left(p, \sqrt{\frac{p(1-p)}{n}}\right)$

5. Sampling Error
Sample-to-sample variation

6. Central Limit Theorem (CLT)


The sampling distribution model of the sample mean (and proportion) is approximately
Normal for large n, regardless of the distribution of the population, as long as the
observations are independent. The larger the sample, the better the approximation
will be.

This version: November 28, 2021, by Dale Embers. May not include all things that could possibly
be tested on. To be used as an additional reference to studying. Most definitions, formulas, and selected
problems come from Intro Stats by De Veaux, Velleman and Bock, 5th edition, published by Pearson.
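To see the Central Limit Theorem and the $\sigma/\sqrt{n}$ standard deviation at work, here is a small simulation sketch. The exponential population (mean 1, SD 1, strongly right-skewed), n = 25, 5000 repetitions, and the seed are arbitrary choices made only for illustration.

```python
import random
import statistics

random.seed(130)                     # fixed seed so the run is repeatable
n, reps = 25, 5000                   # sample size and number of simulated samples

# Each entry is the mean of one random sample of size n from a skewed population
sample_means = [statistics.fmean([random.expovariate(1.0) for _ in range(n)])
                for _ in range(reps)]

print(statistics.fmean(sample_means))   # close to the population mean, 1
print(statistics.stdev(sample_means))   # close to sigma / sqrt(n) = 1 / 5 = 0.2
```

A histogram of `sample_means` would look roughly Normal even though the population itself is far from Normal.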

2 Chapter 18
1. Two-sample t-interval for the difference between means
A confidence interval for the difference between the means of two independent groups
is found as $(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2)$. Here,
$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{s_1^2/n_1 + s_2^2/n_2}$, and the number of degrees of
freedom is given by a special formula or we use the conservative method.

2. Two-sample t-test for the difference between means


A hypothesis test for the difference between the means of two independent groups. It
tests the null hypothesis $H_0: \mu_1 - \mu_2 = \Delta_0$, where the hypothesized difference
$\Delta_0$ is almost always 0. This uses the statistic
$t_{df} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{SE(\bar{y}_1 - \bar{y}_2)}$, with the degrees of freedom
given by a special formula or the conservative method.

3 Chapter 19
1. Hypothesis
A model or proposition that we adopt in order to test.

2. Null Hypothesis (H0 )


The claim being assessed in a hypothesis test that states “no change from the tradi-
tional value,” “no effect”, “no difference”, or “no relationship”. For a claim to be a
testable null hypothesis, it must specify a value for some population parameter that
can form the basis for assuming a sampling distribution for a test statistic.

3. Alternative Hypothesis (HA )


The alternative hypothesis proposes what we should conclude if we reject the null
hypothesis.

4. P-value
The probability of observing a value for a test statistic at least as far from the hy-
pothesized value as the statistic value actually observed if the null hypothesis is true.
A small p-value indicates either that the observation is improbable or that the proba-
bility calculation was based on incorrect assumptions. The assumed truth of the null
hypothesis is the assumption under suspicion.

5. One-proportion Z-test
A test of the null hypothesis that the proportion of a single sample equals a specified
value, $H_0: p = p_0$, by referring the statistic $z = (\hat{p} - p_0)/SD(\hat{p})$ to the
standard Normal model.
6. Two-sided (Tailed) Alternative
An alternative hypothesis is two-sided (for example, $H_A: p \neq p_0$) when we are
interested in deviations in either direction away from the hypothesized parameter value.

7. One-sided (Tailed) Alternative


An alternative hypothesis is one-sided (for example, $H_A: p > p_0$ or $H_A: p < p_0$)
when we are interested in deviations in only one direction away from the hypothesized
parameter value.

4 Chapter 20
1. Sampling distribution of the difference between two proportions
The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is, under appropriate assumptions, modeled by
a Normal model with mean $\mu = p_1 - p_2$ and standard deviation
$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$.

2. Two-proportion z-interval
This is the confidence interval. A two-proportion z-interval gives a confidence interval
for the true difference in proportions, $p_1 - p_2$, in two independent groups. The
confidence interval is $(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE(\hat{p}_1 - \hat{p}_2)$, where $z^*$ is the
critical value from the standard Normal model corresponding to the specified confidence level.

3. Two-proportion z-test
This is the hypothesis test. Test the null hypothesis $H_0: p_1 - p_2 = 0$ by comparing
the statistic $z = (\hat{p}_1 - \hat{p}_2)/SE_{pooled}(\hat{p}_1 - \hat{p}_2)$ to the standard Normal model.

5 Confidence Interval Creation and Hypothesis Testing Summary
5.1 1-Proportion
Proportion - always use p
• Confidence Interval Creation
CI: $\hat{p} \pm \underbrace{z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}_{MOE}$
$z^*$ = critical value
Table of critical values z* for confidence intervals:

CI:   90%     95%    96%     98%     99%
z*:   1.645   1.96   2.054   2.326   2.576
• Hypothesis Testing
Step 1: Write down your hypotheses
$H_0: p = p_0$
$H_A: p < p_0$, $p > p_0$, or $p \neq p_0$
Step 2: Calculate your test statistic
$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value ≤ α (usually α = 0.05), then Reject H0. If
p-value > α, then Do Not Reject H0.
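The Section 5.1 recipe can be sketched with the standard library's `statistics.NormalDist` (Python 3.8+), which supplies both the critical values z* and the Normal probabilities for the p-value. The helper names are our own; exact z* values differ from the rounded table only past the third decimal. The usage lines reuse example problems 2 and 8.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # standard Normal model

def z_star(confidence):
    """Two-sided critical value, e.g. about 1.96 for 95% confidence."""
    return Z.inv_cdf(1 - (1 - confidence) / 2)

def one_prop_interval(successes, n, confidence):
    """p_hat ± z* sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    moe = z_star(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def one_prop_z_test(successes, n, p0, alternative):
    """Return (z, p-value) with z = (p_hat - p0) / sqrt(p0(1 - p0)/n)."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # SD uses p0, not p_hat
    if alternative == "greater":
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":
        p_value = Z.cdf(z)
    else:
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

print([round(z_star(c), 3) for c in (0.90, 0.95, 0.96, 0.98, 0.99)])  # matches the table
print(one_prop_interval(16, 40, 0.90))                # example problem 2(a)
print(one_prop_z_test(420, 800, 0.5, "greater"))      # example problem 8
```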

5.2 1-Sample Mean


Sample Mean - always use $\bar{x}$ or $\bar{y}$
The degrees of freedom are given by df = n − 1.
• Confidence Interval Creation
CI: $\bar{y} \pm \underbrace{t^*_{n-1} \frac{s}{\sqrt{n}}}_{MOE}$
$t^*_{n-1}$ = critical value
Use Appendix D Table T to determine the critical values of t.
• Hypothesis Testing
Step 1: Write down your hypotheses
$H_0: \mu = \mu_0$
$H_A: \mu < \mu_0$, $\mu > \mu_0$, or $\mu \neq \mu_0$
Step 2: Calculate your test statistic
$t_{n-1} = \frac{\bar{y} - \mu_0}{s/\sqrt{n}}$
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value ≤ α (usually α = 0.05), then Reject H0. If
p-value > α, then Do Not Reject H0.

5.3 Difference of Proportions


Difference of Proportions - always use $p_1 - p_2$
• Confidence Interval Creation
CI: $(\hat{p}_1 - \hat{p}_2) \pm \underbrace{z^* \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}_{MOE}$
$z^*$ = critical value
See Section 5.1 for examples of the critical values for z*.
• Hypothesis Testing
Step 1: Write down your hypotheses
$H_0: p_1 - p_2 = 0$
$H_A: p_1 - p_2 < 0$, $p_1 - p_2 > 0$, or $p_1 - p_2 \neq 0$
Step 2: Calculate your test statistic
$z = \frac{\hat{p}_1 - \hat{p}_2}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)}$,
$SE_{pooled}(\hat{p}_1 - \hat{p}_2) = \sqrt{\frac{\hat{p}_{pooled}(1-\hat{p}_{pooled})}{n_1} + \frac{\hat{p}_{pooled}(1-\hat{p}_{pooled})}{n_2}}$,
$\hat{p}_{pooled} = \frac{\text{Number of Successes in Group 1} + \text{Number of Successes in Group 2}}{n_1 + n_2}$.
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value ≤ α (usually α = 0.05), then Reject H0. If
p-value > α, then Do Not Reject H0.
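Section 5.3 can be sketched the same way: the interval uses the unpooled standard error, while the test uses the pooled proportion. The helper names are our own; the usage lines plug in the data from example problems 20 (ice cream) and 22 (bees).

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()   # standard Normal model

def two_prop_interval(x1, n1, x2, n2, confidence):
    """(p1_hat - p2_hat) ± z* SE, with the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_star = Z.inv_cdf(1 - (1 - confidence) / 2)
    return (p1 - p2) - z_star * se, (p1 - p2) + z_star * se

def two_prop_z_test(x1, n1, x2, n2, alternative):
    """Pooled z-test of H0: p1 - p2 = 0; returns (z, p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # total successes / total sample size
    se_pooled = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    z = (p1 - p2) / se_pooled
    if alternative == "greater":
        p_value = 1 - Z.cdf(z)
    elif alternative == "less":
        p_value = Z.cdf(z)
    else:
        p_value = 2 * (1 - Z.cdf(abs(z)))
    return z, p_value

print(two_prop_interval(500, 600, 350, 450, 0.95))        # children minus adults
print(two_prop_z_test(200, 500, 230, 600, "greater"))     # North minus South American bees
```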

5.4 Difference of Means - 2 Independent Groups


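Difference of Means - always use $\bar{x}_1 - \bar{x}_2$ or $\bar{y}_1 - \bar{y}_2$
The degrees of freedom are given by (round down to the nearest integer)
$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$,
or use the conservative method (df = the smaller of $n_1 - 1$ and $n_2 - 1$).
• Confidence Interval Creation
CI: $(\bar{y}_1 - \bar{y}_2) \pm \underbrace{t^*_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}_{MOE}$
$t^*_{df}$ = critical value
• Hypothesis Testing
Step 1: Write down your hypotheses
$H_0: \mu_1 - \mu_2 = \Delta_0$
$H_A: \mu_1 - \mu_2 < \Delta_0$, $\mu_1 - \mu_2 > \Delta_0$, or $\mu_1 - \mu_2 \neq \Delta_0$
Step 2: Calculate your test statistic
$t_{df} = \frac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value ≤ α (usually α = 0.05), then Reject H0. If
p-value > α, then Do Not Reject H0.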
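The degrees-of-freedom formula above (Welch's approximation) is easy to mistype on a calculator, so here is a minimal sketch of it along with the conservative method and the two-sample t statistic. The helper names are our own; the usage lines check the df values given in example problems 24 and 25.

```python
from math import floor, sqrt

def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for two independent means (round down before using Table T)."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard errors of each sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def conservative_df(n1, n2):
    """Conservative method: the smaller of n1 - 1 and n2 - 1."""
    return min(n1, n2) - 1

def two_sample_t(y1, s1, n1, y2, s2, n2, delta0=0.0):
    """t = ((y1 - y2) - delta0) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    return ((y1 - y2) - delta0) / se

print(floor(welch_df(3, 20, 2, 20)))                       # problem 24: 33
print(floor(welch_df(16.2, 99, 17.9, 99)))                 # problem 25: 194
print(round(two_sample_t(42.2, 16.2, 99, 60.9, 17.9, 99), 3))  # highball minus tumbler
```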

5.5 Paired Differences of Means


The degrees of freedom are given by df = n − 1.
• Confidence Interval Creation (n = number of pairs)
CI: $\bar{d} \pm \underbrace{t^*_{n-1} \frac{s_d}{\sqrt{n}}}_{MOE}$
$t^*_{n-1}$ = critical value
• Hypothesis Testing
Step 1: Write down your hypotheses
$H_0: \mu_d = \Delta_0$
$H_A: \mu_d < \Delta_0$, $\mu_d > \Delta_0$, or $\mu_d \neq \Delta_0$
Step 2: Calculate your test statistic
$t_{n-1} = \frac{\bar{d} - \Delta_0}{s_d/\sqrt{n}}$
$\bar{d}$ = average of the differences
$s_d$ = standard deviation of the differences
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value ≤ α (usually α = 0.05), then Reject H0. If
p-value > α, then Do Not Reject H0.
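A paired test is just a one-sample t-test on the differences, which this sketch makes explicit. The helper name `paired_t` is our own, and the five differences in the usage line are hypothetical numbers chosen only to make the arithmetic easy to follow (mean 3, $s_d = \sqrt{2.5}$).

```python
import statistics
from math import sqrt

def paired_t(differences, delta0=0.0):
    """Return (t, df) with t = (d_bar - delta0) / (s_d / sqrt(n)), n = number of pairs."""
    n = len(differences)
    d_bar = statistics.fmean(differences)     # average of the differences
    s_d = statistics.stdev(differences)       # sample SD of the differences
    return (d_bar - delta0) / (s_d / sqrt(n)), n - 1

# Hypothetical differences (for example, after minus before) for 5 pairs
t, df = paired_t([1, 2, 3, 4, 5])
print(round(t, 4), df)   # 4.2426 4
```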

6 Extra Information
Review any and all notes and supplementary materials. It may be the case that something
was accidentally omitted from this study guide. Also, review any problems that may have
been discussed in class as not all example problems may have been provided here.
7 Example Problems
1. The 95% confidence interval for the proportion of teens who reported that they had
misrepresented their age online is from 45.6% to 52.5%. There were 799 teens in this
study.

(a) Interpret the interval in this context.


(b) Explain the meaning of “95% confident” in this context.

2. A study found that 16 of 40 peanut candy bars in fact did not contain peanuts.

(a) Construct a 90% confidence interval.


(b) Interpret your 90% confidence interval.
(c) Construct a 95% confidence interval.
(d) Interpret your 95% confidence interval.

3. Several factors are involved in the creation of a confidence interval. Among them are
the sample size, the level of confidence, and the margin of error. Which statements are
true?

(a) For a given sample size, higher confidence means a smaller margin of error.
(b) For a given confidence level, halving the margin of error requires a sample twice
as large.
(c) For a certain confidence level, you can get a smaller margin of error by selecting
a bigger sample.
(d) For a fixed margin of error, larger samples provide greater confidence.

4. I sample 600 people and 432 of them like cats. Construct a 95% confidence interval
for the population proportion.

5. I think the proportion of people that eat candy is around 0.75. I am going to construct
a 90% confidence interval and want the margin of error to be ±0.025. How large should
the sample size be?

6. Jimmy samples 930 people and 234 took public transportation. Construct a 99%
confidence interval for the population proportion.

7. I am going to construct a 95% confidence interval for the proportion of people that
wear eyeglasses and want the margin of error to be ±0.2. I have no idea what to
estimate for the population proportion. How large should the sample size be?

8. A researcher believes that more than 50% of all people voted in the last election. She
samples 800 people and 420 of them voted. Test her claim at a significance level of
0.05 (i.e. compare the P-value to 0.05).

(a) State the hypotheses to be tested.


(b) Compute the test statistic (z-value). You must show your computation to receive
credit.
(c) Compute the P-value associated with your test statistic.
(d) Make a conclusion about the hypotheses.

9. A researcher believes that fewer than 75% of all mollusks are tasty. He samples 1200
mollusks and 865 of them are tasty. Test his claim at a significance level of 0.05 (i.e.
compare the P-value to 0.05).

(a) State the hypotheses to be tested.


(b) Compute the test statistic (z-value). You must show your computation to receive
credit.
(c) Compute the P-value associated with your test statistic.
(d) Make a conclusion about the hypotheses.
10. A researcher believes that the percentage of people that watch Game of Thrones is
different than 27%. He samples 900 people and 220 of them watch. Test his claim at
a significance level of 0.05 (i.e. compare the P-value to 0.05).
(a) State the hypotheses to be tested.
(b) Compute the test statistic (z-value). You must show your computation to receive
credit.
(c) Compute the P-value associated with your test statistic.
(d) Make a conclusion about the hypotheses.
11. A butcher wants to estimate the mean weight of a ham. She samples 33 hams and
computes a sample mean weight of 8.2 pounds and a sample standard deviation of 3.3
pounds. What is a 90% confidence interval for the population mean weight of ham?
Please indicate the value you used for z ∗ or t∗ .
12. A professor is interested in the mean length of a letter of recommendation. He samples
51 letters and finds a sample mean length of 620 words with a sample standard deviation
of 90 words. What is a 95% confidence interval for the population mean length of a
letter? Please indicate the value you used for z ∗ or t∗ .
13. A computer professional wants to know the mean number of emails people receive each
day. She is going to compute a 95% confidence interval and wants a margin of error of
±2 emails. She believes the standard deviation to be 18 emails. How large should the
sample size be to ensure this margin of error?
14. A researcher believes that the mean age at which a person first votes is greater than 22
years. He samples 27 people and computes a sample mean of 24.3 years and a sample
standard deviation of 8 years.
(a) State the hypotheses to be tested.
(b) What is the value of your test statistic (t or z value)?
(c) What is the P-value?
(d) What conclusion should be drawn? (Compare the p-value to 0.05.)
15. A researcher believes that the mean age at which a person first tries chocolate is less
than 3 years. He samples 24 people and computes a sample mean of 2.3 years and a
sample standard deviation of 1.5 years.
(a) State the hypotheses to be tested.
(b) What is the value of your test statistic (t or z value)?
(c) What is the P-value?
(d) What conclusion should be drawn? (Compare the p-value to 0.05.)
16. A researcher believes that the mean height of a prairie dog is different than 14 inches.
She samples 31 prairie dogs and computes a sample mean of 15.8 inches and a sample
standard deviation of 3.6 inches.
(a) State the hypotheses to be tested.
(b) What is the value of your test statistic (t or z value)?
(c) What is the P-value?
(d) What conclusion should be drawn? (Compare the p-value to 0.05.)

17. Which of the following are true? If false, explain briefly.

(a) A very low P-value provides evidence against the null hypothesis.
(b) A high P-value is strong evidence in favor of the null hypothesis.
(c) A P-value above 0.10 shows that the null hypothesis is true.
(d) If the null hypothesis is true, you can’t get a p-value below 0.01.

18. Which of the following statements are true? If false, explain briefly.

(a) Using an alpha level of 0.05, a p-value of 0.04 results in rejecting the null hypoth-
esis.
(b) The alpha level depends on the sample size.
(c) With an alpha level of 0.01, a p-value of 0.10 results in rejecting the null hypoth-
esis.
(d) Using an alpha level of 0.05, a p-value of 0.06 means the null hypothesis is true.

19. For each of the following situations, state whether a Type I error, a Type II error, or
neither has been made. Explain briefly.

(a) A bank wants to know if the enrollment on their website is above 30% based on
a small sample of customers. They test H0 : p = 0.3 versus HA : p > 0.3 and
reject the null hypothesis. Later they find out that actually 28% of all customers
enrolled.
(b) A student tests 100 students to determine whether other students on her campus
prefer Coke or Pepsi and finds no evidence that preference for Coke is not 0.5.
Later, a marketing company tests all students on campus and finds no difference.
(c) A human resource analyst wants to know if the applicants this year score, on
average, higher on their placement exam than the 52.5 points the candidates
averaged last year. She samples 50 recent tests and finds the average to be 54.1
points. She fails to reject the null hypothesis that the mean is 52.5 points. At
the end of the year, they find that the candidates this year had a mean of 55.3
points.
(d) A pharmaceutical company tests whether a drug lifts the headache relief rate
from the 25% achieved by the placebo. They fail to reject the null hypothesis
because the p-value is 0.465. Further testing shows that the drug actually relieves
headaches in 38% of people.
20. A researcher samples 600 children and 500 of them like ice cream. She also samples
450 adults and 350 of them like ice cream. Construct a 95% confidence interval for the
difference of population proportions of children and adults that like ice cream.

21. A researcher samples 1200 children and 500 of them like to exercise. She also samples
900 adults and 350 of them like to exercise. Construct a 90% confidence interval for
the difference of population proportions of children and adults that like to exercise.

22. A scientist believes that the proportion of North American bees that are hostile is
greater than the proportion of South American bees. She samples 500 North American
bees and 200 are hostile. She samples 600 South American bees and 230 are hostile.

(a) State the hypotheses to be tested.


(b) Compute the test statistic (z value or t value). You must show work to receive
credit.
(c) Give the P-value or range of P-values.
(d) What decision should the scientist make at a significance level of 5%?

23. A scientist believes that the proportion of North American bears that are hostile is
greater than the proportion of South American bears. She samples 800 North American
bears and 200 are hostile. She samples 1200 South American bears and 240 are hostile.

(a) State the hypotheses to be tested.


(b) Compute the test statistic (z value or t value). You must show work to receive
credit.
(c) Give the P-value or range of P-values.
(d) What decision should the scientist make at a significance level of 5%?
24. A man who moves to a new city sees that there are two routes he could take to work.
A neighbor who has lived there a long time tells him Route A will average 5 minutes
faster than Route B. The man decides to experiment; he wants to find out if the mean
difference between Route A and B is different from 5 minutes. Each day, he flips a coin
to determine which way to go, driving each route 20 days. He finds that Route A takes
an average of 40 minutes, with a standard deviation of 3 minutes, and Route B takes
an average of 43 minutes, with a standard deviation of 2 minutes. Histograms of travel
times for the routes are roughly symmetric and show no outliers. Assume α = 0.05.

(a) Find a 95% confidence interval for the difference in average commuting time for
the two routes. Use df = 33.
(b) State the hypotheses to be tested.
(c) Compute the value of the test score.
(d) Give the P-value or range of P-values.
(e) Do the results seem significant?

25. Researchers randomly assigned participants either a tall, thin “highball” glass or a
short, wide “tumbler,” each of which held 355 ml. Participants were asked to pour 1.5
oz = 44.3 ml of water into their glass. Did the shape of the glass make a difference in
how much liquid they poured? In particular, test to see if they poured less water into
the “highball” glass than the “tumbler”. Assume α = 0.1. Here are the summaries:

        Highball    Tumbler
n       99          99
ȳ       42.2 ml     60.9 ml
s       16.2 ml     17.9 ml

(a) Find a 90% confidence interval for the difference in average water held for the two
glasses. Use df = 194.
(b) State the hypotheses to be tested.
(c) Compute the value of the test score. (Assume all conditions are met.)
(d) Give the P-value or range of P-values.
(e) Do the results seem significant?
7.1 Various Chapters
1. We want to estimate the healing rate for a wound. A sample of size 17 is collected
and the sample mean is computed to be 24.3 micrometers per hour, with a sample
standard deviation of s = 8 micrometers per hour. What is a 95% confidence interval
for the population mean?

2. A sample of size n=150 people is collected and the sample proportion of people who are
illiterate is computed to be .20. Compute a 95% confidence interval for the population
proportion of illiterate people.

3. You believe that the proportion of people that like cheese is .80. You are going to
construct a 95% confidence interval and want the margin of error to be plus or minus
.03. What should the sample size be?
4. Teresa knows that appointment wait times are approximately normally distributed. She
believes the mean wait time is longer than 25 minutes. She conducts a test with
α = 0.05 and the appropriate hypotheses. She selects 25 random appointments and finds
a sample mean of 25.66 minutes and a sample standard deviation of 10
minutes.

(a) State the hypotheses to be tested.


(b) Compute the value of the test score.
(c) Give the P-value or range of P-values.
(d) Do the results seem significant?

5. You claim that the proportion of people who watch American Idol is greater than .50.
You sample n=200 people and compute a sample proportion of .53. Assume α = 0.05.

(a) State the hypotheses to be tested.


(b) Compute the value of the test score.
(c) Give the P-value or range of P-values.
(d) Do the results seem significant?

6. You want to compare the proportion of gamers amongst women and men. You survey
300 women and 400 men. 175 of the women were gamers and 200 of the men were
gamers. Construct a 95% confidence interval for the difference of proportions.

7. You believe that the proportion of men that are colorblind is greater than the
proportion of women that are colorblind. You sample 900 men and 90 of them are
colorblind. You sample 700 women and 45 of them are colorblind. Assume α = 0.05.

(a) State the hypotheses to be tested.


(b) Compute the value of the test score.
(c) Did you use the pooled proportion in part (b)?
(d) Compute the P-value.
(e) Are the results significant?
8 Example Solutions
1. (a) We are 95% confident that, if we were to ask all teens whether they have
misrepresented their age online, between 45.6% and 52.5% of them would say they
have.
(b) If we were to collect many random samples of 799 teens, about 95% of the
confidence intervals would contain the true proportion of all teens who admit to
misrepresenting their age online.

2. This problem tells us that p̂ = 16/40, n = 40.

(a) For a 90% confidence interval, z* = 1.645. The 90% confidence interval would
then be
$$\hat p \pm z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} = \frac{16}{40} \pm 1.645\sqrt{\frac{\frac{16}{40}\left(1-\frac{16}{40}\right)}{40}} = \frac{16}{40} \pm 1.645\sqrt{0.006} = (0.2726,\ 0.5274)$$
(b) We are 90% confident that between 27% and 53% of all peanut candy bars did
not contain peanuts.
(c) For a 95% confidence interval, z* = 1.96. The 95% confidence interval would then
be
$$\hat p \pm z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} = \frac{16}{40} \pm 1.96\sqrt{\frac{\frac{16}{40}\left(1-\frac{16}{40}\right)}{40}} = \frac{16}{40} \pm 1.96\sqrt{0.006} = (0.2482,\ 0.5518)$$
(d) We are 95% confident that between 25% and 55% of all peanut candy bars did
not contain peanuts.
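The interval arithmetic in parts (a) and (c) can be checked with a short Python sketch. It uses only the standard library; `prop_ci` is a hypothetical helper name, and z* comes from `statistics.NormalDist().inv_cdf` instead of a table, so it carries a few more digits than 1.645 or 1.96. The rounded endpoints still agree with the ones above.

```python
import math
from statistics import NormalDist

def prop_ci(successes, n, conf):
    """One-proportion z interval: p_hat +/- z* sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = successes / n
    # z* leaves (1 - conf) / 2 in each tail of the standard normal curve
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

lo90, hi90 = prop_ci(16, 40, 0.90)
lo95, hi95 = prop_ci(16, 40, 0.95)
print(f"90% CI: ({lo90:.4f}, {hi90:.4f})")
print(f"95% CI: ({lo95:.4f}, {hi95:.4f})")
```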

3. (a) False. For a fixed sample size, higher confidence means a larger margin of error.
Suppose n = 10 and p̂ = 0.5. Start with 90% confidence (z* = 1.645) and calculate the
MOE. Then change to 95% confidence (z* = 1.96), calculate the MOE again, and compare
the two results.
$$MOE_{90} = 1.645\sqrt{\frac{0.5(1-0.5)}{10}} = 1.645\sqrt{\frac{0.25}{10}} = 1.645\sqrt{0.025} = 0.26,$$
$$MOE_{95} = 1.96\sqrt{\frac{0.5(1-0.5)}{10}} = 1.96\sqrt{\frac{0.25}{10}} = 1.96\sqrt{0.025} = 0.31.$$
From this, we can see that the MOE increases when the confidence level increases.
(b) False. The margin of error is proportional to 1/√n, so halving the margin of error
requires a sample four times as large as the original.
Suppose p̂ = 0.5 and 95% confidence (z* = 1.96). Start with MOE = 0.6, then compare
with MOE = 0.3.
$$0.6 = 1.96\sqrt{\frac{0.5(1-0.5)}{n}} \Rightarrow \frac{0.6}{1.96} = \sqrt{\frac{0.25}{n}} \Rightarrow 0.3061 = \sqrt{\frac{0.25}{n}} \Rightarrow 0.0937 = \frac{0.25}{n} \Rightarrow n = \frac{0.25}{0.0937} = 2.668,$$
$$0.3 = 1.96\sqrt{\frac{0.5(1-0.5)}{n}} \Rightarrow \frac{0.3}{1.96} = \sqrt{\frac{0.25}{n}} \Rightarrow 0.153 = \sqrt{\frac{0.25}{n}} \Rightarrow 0.0234 = \frac{0.25}{n} \Rightarrow n = \frac{0.25}{0.0234} = 10.68.$$
So our original n = 2.668 and the new n = 10.68, which is approximately 4 times
the original value of n.
(c) True. Larger samples are less variable, which translates to a smaller margin of
error. We can be more precise at the same level of confidence.
Suppose p̂ = 0.5 and 90% confidence (z* = 1.645). Start with n = 2 and compare to
n = 18.
$$MOE_{2} = 1.645\sqrt{\frac{0.5(1-0.5)}{2}} = 1.645\sqrt{\frac{0.25}{2}} = 1.645\sqrt{0.125} = 0.582,$$
$$MOE_{18} = 1.645\sqrt{\frac{0.5(1-0.5)}{18}} = 1.645\sqrt{\frac{0.25}{18}} = 1.645\sqrt{0.0139} = 0.194.$$
Our MOE decreased when n increased.
(d) True. Larger samples are less variable, which makes us more confident that a
given confidence interval succeeds in capturing the population proportion.
Suppose MOE = 0.4 and p̂ = 0.5. Compare the confidence for n = 5 with the confidence
for n = 8.
$$0.4 = z_5^*\sqrt{\frac{0.5(1-0.5)}{5}} \Rightarrow 0.4 = z_5^*\sqrt{0.05} \Rightarrow z_5^* = \frac{0.4}{\sqrt{0.05}} = 1.789,$$
$$0.4 = z_8^*\sqrt{\frac{0.5(1-0.5)}{8}} \Rightarrow 0.4 = z_8^*\sqrt{0.03125} \Rightarrow z_8^* = \frac{0.4}{\sqrt{0.03125}} = 2.263.$$
With the margin of error held fixed, z* increases as the sample size increases, which
means that the confidence level increases.
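The four claims in Problem 3 all follow from the single formula MOE = z* sqrt(p̂(1 − p̂)/n). This sketch (hypothetical helpers `moe`, `n_for_moe`, `z_for_moe`) reruns each numeric check. The last line also turns each z* into a confidence level with 2Φ(z*) − 1, which gives roughly 92.6% for n = 5 and 97.6% for n = 8.

```python
import math
from statistics import NormalDist

def moe(z_star, p_hat, n):
    """Margin of error for one proportion: z* * sqrt(p_hat(1 - p_hat) / n)."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)

def n_for_moe(z_star, p_hat, target):
    """Solve target = z* sqrt(p_hat(1 - p_hat) / n) for n (not rounded)."""
    return p_hat * (1 - p_hat) * (z_star / target) ** 2

def z_for_moe(target, p_hat, n):
    """Solve target = z* sqrt(p_hat(1 - p_hat) / n) for z*."""
    return target / math.sqrt(p_hat * (1 - p_hat) / n)

# (a) Same n, higher confidence -> larger MOE
print(round(moe(1.645, 0.5, 10), 2), round(moe(1.96, 0.5, 10), 2))  # 0.26 0.31
# (b) Halving the MOE multiplies the required n by 4
print(n_for_moe(1.96, 0.5, 0.3) / n_for_moe(1.96, 0.5, 0.6))
# (c) Same confidence, larger n -> smaller MOE
print(round(moe(1.645, 0.5, 2), 3), round(moe(1.645, 0.5, 18), 3))  # 0.582 0.194
# (d) Same MOE, larger n -> larger z*, i.e. a higher confidence level
z5, z8 = z_for_moe(0.4, 0.5, 5), z_for_moe(0.4, 0.5, 8)
conf5, conf8 = (2 * NormalDist().cdf(z) - 1 for z in (z5, z8))
print(round(z5, 3), round(z8, 3), round(conf5, 3), round(conf8, 3))
```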

4. I sample 600 people and 432 of them like cats. Construct a 95% confidence interval
for the population proportion.
p̂ = 432/600 = 0.72
z* = 1.96
n = 600
$$CI: \hat p \pm z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} \Rightarrow 0.72 \pm 1.96\sqrt{\frac{0.72(1-0.72)}{600}} \Rightarrow (0.684,\ 0.756)$$

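5. I think the proportion of people that eat candy is around 0.75. I am going to construct
a 90% confidence interval and want the margin of error to be ±0.025. How large should
the sample size be?
p̂ = 0.75
z* = 1.645
MOE = 0.025
$$MOE = z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} \Rightarrow 0.025 = 1.645\sqrt{\frac{0.75(1-0.75)}{n}} \Rightarrow \frac{0.025}{1.645} = \sqrt{\frac{0.1875}{n}} \Rightarrow \left(\frac{0.025}{1.645}\right)^2 = \frac{0.1875}{n}$$
$$\Rightarrow n = \frac{0.1875}{(0.025/1.645)^2} = 811.8075 \Rightarrow n \approx 812$$
Always round the sample size up, so that the margin of error is no larger than the target.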
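A minimal sketch of the sample-size calculation, assuming the same inputs (p̂ = 0.75, z* = 1.645, MOE = 0.025); `sample_size_prop` is a hypothetical name. The two checks at the end show why the answer is rounded up rather than to the nearest integer: one fewer person would push the margin of error just past 0.025.

```python
import math

def sample_size_prop(p_guess, z_star, moe):
    """Return (exact n, n rounded UP) so that z* sqrt(p(1 - p) / n) <= moe."""
    n_exact = p_guess * (1 - p_guess) * (z_star / moe) ** 2
    return n_exact, math.ceil(n_exact)

n_exact, n = sample_size_prop(0.75, 1.645, 0.025)
print(f"n = {n_exact:.4f} -> survey {n} people")
# One person fewer leaves the margin of error slightly above the 0.025 target
print(1.645 * math.sqrt(0.1875 / (n - 1)) > 0.025)  # True
print(1.645 * math.sqrt(0.1875 / n) <= 0.025)       # True
```

The same function handles the cheese problem in the later exercise set: `sample_size_prop(0.8, 1.96, 0.03)` rounds up to 683.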
6. Jimmy samples 930 people and 234 of them took public transportation. Construct a 99%
confidence interval for the population proportion.
p̂ = 234/930 ≈ 0.2516
z* = 2.576
n = 930
$$CI: \hat p \pm z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} \Rightarrow \frac{234}{930} \pm 2.576\sqrt{\frac{(234/930)(1-234/930)}{930}} \Rightarrow 0.2516 \pm 2.576\sqrt{\frac{0.188}{930}} \Rightarrow (0.215,\ 0.288)$$
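The table values 1.645, 1.96 and 2.576 are standard-normal quantiles. This stdlib-only sketch recomputes them and then reruns Problem 6 without rounding p̂ first. Carried to full precision, the interval is (0.215, 0.288); rounding p̂ to 0.252 before adding the margin of error pushes the upper endpoint to 0.289.

```python
import math
from statistics import NormalDist

# z* leaves (1 - C)/2 in each tail, so z* = Phi^{-1}(1 - (1 - C)/2)
for conf in (0.90, 0.95, 0.99):
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    print(f"{conf:.0%}: z* = {z_star:.3f}")

# Problem 6 with the unrounded p_hat = 234/930
z99 = NormalDist().inv_cdf(0.995)
p_hat = 234 / 930
se = math.sqrt(p_hat * (1 - p_hat) / 930)
lo, hi = p_hat - z99 * se, p_hat + z99 * se
print(f"99% CI: ({lo:.3f}, {hi:.3f})")
```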

7. I am going to construct a 95% confidence interval for the proportion of people that
wear eyeglasses and want the margin of error to be ±0.2. I have no idea what to
estimate for the population proportion. How large should the sample size be?
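p̂ = 0.5 (use 0.5 when we don't have any idea of the population proportion; it makes
p̂(1 − p̂) as large as possible, so the sample is big enough whatever the true proportion is)
z* = 1.96
MOE = 0.2
$$MOE = z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} \Rightarrow 0.2 = 1.96\sqrt{\frac{0.5(1-0.5)}{n}} \Rightarrow \frac{0.2}{1.96} = \sqrt{\frac{0.25}{n}} \Rightarrow \left(\frac{0.2}{1.96}\right)^2 = \frac{0.25}{n}$$
$$\Rightarrow n = \frac{0.25}{(0.2/1.96)^2} = 24.01 \Rightarrow n \approx 25$$
Round up to 25: with n = 24 the margin of error would be slightly larger than 0.2.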
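Why p̂ = 0.5 is the safe guess: p̂(1 − p̂) is largest at 0.5, so any other true proportion needs a smaller sample for the same margin of error. Below is a quick stdlib check over a grid of guesses, followed by the Problem 7 calculation (the grid spacing of 0.01 is an arbitrary choice for illustration).

```python
import math

# p(1 - p) over a grid of possible guesses; it is largest at p = 0.5
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda p: p * (1 - p))
print(best_p, best_p * (1 - best_p))  # 0.5 0.25

# Problem 7: 95% confidence (z* = 1.96), MOE = 0.2, conservative p = 0.5
n_exact = best_p * (1 - best_p) * (1.96 / 0.2) ** 2
n = math.ceil(n_exact)   # round up: 24 people would give a MOE a bit above 0.2
print(round(n_exact, 2), n)  # 24.01 25
```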
8. A researcher believes that more than 50% of all people voted in the last election. She
samples 800 people and 420 of them voted. Test her claim at a significance level of
0.05 (i.e. compare the P-value to 0.05).

(a) State the hypotheses to be tested.


H0 : p = 0.5
HA : p > 0.5
(b) Compute the test statistic (z-value). You must show your computation to receive
credit.
p̂ = 420/800 = 0.525, n = 800.
$$z = \frac{\hat p - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.525 - 0.5}{\sqrt{\frac{0.5(1-0.5)}{800}}} = \frac{0.025}{\sqrt{\frac{0.25}{800}}} = 1.41$$
(c) Compute the P-value associated with your test statistic.
P(Z > 1.41) = normalcdf(1.41, 999) = 0.0793
(d) Make a conclusion about the hypotheses.
Since the p-value is “large” (0.0793 > 0.05), Do Not Reject H0 . The results are
not significant.

9. A researcher believes that fewer than 75% of all mollusks are tasty. He samples 1200
mollusks and 865 of them are tasty. Test his claim at a significance level of 0.05 (i.e.
compare the P-value to 0.05).

(a) State the hypotheses to be tested.


H0 : p = 0.75
HA : p < 0.75
(b) Compute the test statistic (z-value). You must show your computation to receive
credit.
p̂ = 865/1200 = 0.721, n = 1200.
$$z = \frac{\hat p - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.721 - 0.75}{\sqrt{\frac{0.75(1-0.75)}{1200}}} = \frac{-0.029}{\sqrt{\frac{0.1875}{1200}}} = -2.32$$
(c) Compute the P-value associated with your test statistic.
P(Z < −2.32) = normalcdf(−999, −2.32) = 0.0102
(d) Make a conclusion about the hypotheses.
Since the p-value is “small” (0.0102 < 0.05), Reject H0 . The results are significant.

10. A researcher believes that the percentage of people that watch Game of Thrones is
different from 27%. He samples 900 people and 220 of them watch. Test his claim at
a significance level of 0.05 (i.e. compare the P-value to 0.05).

(a) State the hypotheses to be tested.


H0 : p = 0.27
HA : p ̸= 0.27
(b) Compute the test statistic (z-value). You must show your computation to receive
credit.
p̂ = 220/900 = 0.244, n = 900.
$$z = \frac{\hat p - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.244 - 0.27}{\sqrt{\frac{0.27(1-0.27)}{900}}} = \frac{-0.026}{\sqrt{\frac{0.1971}{900}}} = -1.76$$
(c) Compute the P-value associated with your test statistic.
Note that this is a 2-sided test.
$$\text{P-value} = 2P(Z < -1.76) = 2\,\text{normalcdf}(-999, -1.76) = 2(0.039) = 0.078$$
(d) Make a conclusion about the hypotheses.
Since the p-value is “large” (0.078 > 0.05), Do Not Reject H0 . The results are
not significant.
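Problems 8 to 10 share one procedure; only the direction of the alternative changes which tail of the normal curve the P-value comes from. The sketch below (`one_prop_z_test` is a hypothetical helper, and the `alternative` labels are chosen for this example) uses `statistics.NormalDist().cdf` in place of the calculator's `normalcdf`. Because it does not round p̂ first, its results differ a little from the hand calculations (Problem 8: z ≈ 1.41, P ≈ 0.079; Problem 9: z ≈ −2.33, P ≈ 0.010; Problem 10: z ≈ −1.73, P ≈ 0.084), but all three conclusions at α = 0.05 are unchanged.

```python
import math
from statistics import NormalDist

def one_prop_z_test(successes, n, p0, alternative):
    """z = (p_hat - p0) / sqrt(p0(1 - p0) / n); the SE uses the null value p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    phi = NormalDist().cdf
    if alternative == "greater":
        p_value = 1 - phi(z)            # upper tail
    elif alternative == "less":
        p_value = phi(z)                # lower tail
    else:                               # "two-sided": both tails beyond |z|
        p_value = 2 * (1 - phi(abs(z)))
    return z, p_value

for label, args in [("Problem 8", (420, 800, 0.5, "greater")),
                    ("Problem 9", (865, 1200, 0.75, "less")),
                    ("Problem 10", (220, 900, 0.27, "two-sided"))]:
    z, p = one_prop_z_test(*args)
    print(f"{label}: z = {z:.2f}, P-value = {p:.4f}, reject H0 at 0.05: {p < 0.05}")
```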

11. A butcher wants to estimate the mean weight of a ham. She samples 33 hams and
computes a sample mean weight of 8.2 pounds and a sample standard deviation of 3.3
pounds. What is a 90% confidence interval for the population mean weight of ham?
Please indicate the value you used for z* or t*.
Summary of what is given:
n = 33
ȳ = 8.2
s = 3.3.
For confidence intervals for the mean, we use t* with n − 1 degrees of freedom and (in
this case) 90% confidence. Thus, with df = 32, t* = 1.694.
$$CI: \bar y \pm t^*_{n-1}\frac{s}{\sqrt n} \Rightarrow 8.2 \pm 1.694\,\frac{3.3}{\sqrt{33}} \Rightarrow (7.227,\ 9.173)$$

12. A professor is interested in the mean length of a letter of recommendation. He samples
51 letters and finds a sample mean length of 620 words with a sample standard deviation
of 90 words. What is a 95% confidence interval for the population mean length of a
letter? Please indicate the value you used for z* or t*.
Summary of what is given:
n = 51
ȳ = 620
s = 90.
For confidence intervals for the mean, we use t* with n − 1 degrees of freedom and (in
this case) 95% confidence. Thus, with df = 50, t* = 2.009.
$$CI: \bar y \pm t^*_{n-1}\frac{s}{\sqrt n} \Rightarrow 620 \pm 2.009\,\frac{90}{\sqrt{51}} \Rightarrow (594.682,\ 645.318)$$
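Problems 11 and 12 use the same interval, ȳ ± t* s/√n. Python's standard library has no t-distribution, so this sketch takes t* from the table as an input (`t_interval` is a hypothetical name). If SciPy happens to be installed, `scipy.stats.t.ppf(0.95, 32)` and `scipy.stats.t.ppf(0.975, 50)` should return the same critical values.

```python
import math

def t_interval(ybar, s, n, t_star):
    """ybar +/- t* * s / sqrt(n); t_star is read from a t table with n - 1 df."""
    moe = t_star * s / math.sqrt(n)
    return ybar - moe, ybar + moe

print(t_interval(8.2, 3.3, 33, 1.694))   # Problem 11: 90%, df = 32
print(t_interval(620, 90, 51, 2.009))    # Problem 12: 95%, df = 50
```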
13. A computer professional wants to know the mean number of emails people receive each
day. She is going to compute a 95% confidence interval and wants a margin of error of
±2 emails. She believes the standard deviation to be 18 emails. How large should the
sample size be to ensure this margin of error?
Summary of what is given:
MOE = 2
s = 18
Because this is based on the mean, the margin of error uses t* with n − 1 degrees of
freedom, but n is exactly what we are trying to find, so t* cannot be looked up yet. As n
becomes large, the t-distribution becomes more like the normal distribution, so use the
95% critical value from the normal distribution instead: z* = 1.96. The sample size can
be calculated as follows:
$$MOE = t^*_{n-1}\frac{s}{\sqrt n} \approx z^*\frac{s}{\sqrt n} \Rightarrow 2 = 1.96\,\frac{18}{\sqrt n} \Rightarrow \sqrt n = \frac{1.96\times 18}{2} \Rightarrow n = \left(\frac{1.96\times 18}{2}\right)^2 = 311.1696 \Rightarrow n \approx 312$$
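The same sample-size calculation as a stdlib Python sketch, assuming s ≈ 18 and z* = 1.96 as above. One caveat on the normal approximation: with n = 312 the exact critical value t* (df = 311) is roughly 1.97, slightly above 1.96, so a t-based margin of error would come out a hair above 2 emails.

```python
import math

z_star, s_guess, target_moe = 1.96, 18, 2
n_exact = (z_star * s_guess / target_moe) ** 2   # solve 2 = 1.96 * 18 / sqrt(n)
n = math.ceil(n_exact)                           # always round the sample size up
print(round(n_exact, 4), n)
print(z_star * s_guess / math.sqrt(n) <= target_moe)   # True
```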

14. A researcher believes that the mean age at which a person first votes is greater than 22
years. He samples 27 people and computes a sample mean of 24.3 years and a sample
standard deviation of 8 years.

(a) State the hypotheses to be tested.


H0 : µ = 22
HA : µ > 22
(b) What is the value of your test statistic (t or z value)?
Use the t-test statistic because we are dealing with means.
$$t_{n-1} = \frac{\bar y - \mu_0}{s/\sqrt n} \Rightarrow t_{27-1} = t_{26} = \frac{24.3 - 22}{8/\sqrt{27}} = \frac{2.3}{1.54} = 1.49$$
(c) What is the P-value?
On your calculator: tcdf(1.49, 999, 26) = 0.0741
On the table: go to degrees of freedom 26, find where 1.49 is in the row, and then
look at the one-tail probability values. The probability is between 0.05 and 0.10.
(d) What conclusion should be drawn (compare p-value to 0.05).
Since the p-value is “large” (0.0741 > 0.05), Do Not Reject H0 . The results are
not significant.
15. A researcher believes that the mean age at which a person first tries chocolate is less
than 3 years. He samples 24 people and computes a sample mean of 2.3 years and a
sample standard deviation of 1.5 years.

(a) State the hypotheses to be tested.


H0 : µ = 3
HA : µ < 3
(b) What is the value of your test statistic (t or z value)?
Use the t-test statistic because we are dealing with means.
$$t_{n-1} = \frac{\bar y - \mu_0}{s/\sqrt n} \Rightarrow t_{24-1} = t_{23} = \frac{2.3 - 3}{1.5/\sqrt{24}} = \frac{-0.7}{0.3062} = -2.286$$
(c) What is the P-value?
On your calculator: tcdf(−999, −2.286, 23) = 0.0159.
On the table: go to degrees of freedom 23, find where 2.286 is in the row, and
then look at the one-tail probability values. The probability is between 0.01 and
0.025.
(d) What conclusion should be drawn (compare p-value to 0.05).
Since the p-value is “small” (0.0159 < 0.05), Reject H0 . The results are significant.

16. A researcher believes that the mean height of a prairie dog is different from 14 inches.
She samples 31 prairie dogs and computes a sample mean of 15.8 inches and a sample
standard deviation of 3.6 inches.

(a) State the hypotheses to be tested.


H0 : µ = 14
HA : µ ̸= 14
(b) What is the value of your test statistic (t or z value)? Use the t-test statistic
because we are dealing with means.
$$t_{n-1} = \frac{\bar y - \mu_0}{s/\sqrt n} \Rightarrow t_{31-1} = t_{30} = \frac{15.8 - 14}{3.6/\sqrt{31}} = \frac{1.8}{0.6466} = 2.784$$
(c) What is the P-value?
On your calculator: 2tcdf(2.784, 999, 30) = 2(0.0046) = 0.0092.
On the table: go to degrees of freedom 30, find where 2.784 is in the row, and
then look at the two-tail probability values. The probability is less than 0.01.
(d) What conclusion should be drawn (compare p-value to 0.05).
Since the p-value is “small” (0.0092 < 0.05), Reject H0 . The results are significant.
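Problems 14 to 16 can be checked without a calculator's `tcdf`. Python's standard library has no t-distribution, so this sketch writes out the t density (hypothetical helpers `t_pdf` and `t_upper_tail`) and integrates it numerically with Simpson's rule, using the curve's symmetry about 0. It is an illustration, not a replacement for a statistics package; if SciPy is available, `scipy.stats.t.sf(t, df)` gives the upper tail directly. Because t is not rounded first, the P-values may differ from the calculator answers in the third or fourth decimal place, but each falls in the table range given above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), from Simpson's rule on [0, |t|] plus symmetry about 0."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3               # approximately P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

def one_sample_t(ybar, s, n, mu0):
    """Return t = (ybar - mu0) / (s / sqrt(n)) and its df = n - 1."""
    return (ybar - mu0) / (s / math.sqrt(n)), n - 1

t14, df14 = one_sample_t(24.3, 8, 27, 22)     # HA: mu > 22
t15, df15 = one_sample_t(2.3, 1.5, 24, 3)     # HA: mu < 3
t16, df16 = one_sample_t(15.8, 3.6, 31, 14)   # HA: mu != 14
p14 = t_upper_tail(t14, df14)                 # upper tail
p15 = 1 - t_upper_tail(t15, df15)             # lower tail
p16 = 2 * t_upper_tail(abs(t16), df16)        # two-sided
for name, t, p in [("14", t14, p14), ("15", t15, p15), ("16", t16, p16)]:
    print(f"Problem {name}: t = {t:.3f}, P-value = {p:.4f}")
```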

17. Which of the following are true? If false, explain briefly.

(a) A very low P-value provides evidence against the null hypothesis.
True.
(b) A high P-value is strong evidence in favor of the null hypothesis.
False. A high p-value shows that the data are consistent with the null hypothesis
but does not prove that the null hypothesis is true.
(c) A P-value above 0.10 shows that the null hypothesis is true.
False. No p-value ever shows that the null hypothesis is true (or false).
(d) If the null hypothesis is true, you can’t get a p-value below 0.01.
False. If the null hypothesis is true, you will get a p-value below 0.01 about once
in a hundred hypothesis tests.
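Part (d) can be checked by simulation. This sketch assumes a simple setting that is not in the problem: a two-sided z-test of H0: µ = 0 with known σ = 1, applied to samples of size 30 drawn from a population where H0 really is true. Over 20,000 simulated tests, the share of P-values below 0.01 should land close to 1%.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(130)                  # fixed seed so the run is reproducible
phi = NormalDist().cdf
n, reps, alpha = 30, 20_000, 0.01
hits = 0
for _ in range(reps):
    # H0 is true by construction: data come from N(mu = 0, sigma = 1)
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = fmean(sample) * math.sqrt(n)      # z = (ybar - 0) / (1 / sqrt(n))
    p_value = 2 * (1 - phi(abs(z)))       # two-sided P-value
    hits += p_value < alpha               # True counts as 1
print(f"Share of P-values below {alpha}: {hits / reps:.4f}")
```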
18. Which of the following statements are true? If false, explain briefly.
(a) Using an alpha level of 0.05, a p-value of 0.04 results in rejecting the null
hypothesis.
True.
(b) The alpha level depends on the sample size.
False. The alpha level is set independently and does not depend on the sample
size.
(c) With an alpha level of 0.01, a p-value of 0.10 results in rejecting the null
hypothesis.
False. The p-value would have to be less than 0.01 to reject the null hypothesis.
(d) Using an alpha level of 0.05, a p-value of 0.06 means the null hypothesis is true.
False. It means that we do not have enough evidence at that alpha level to reject
the null hypothesis.
19. For each of the following situations, state whether a Type I error, a Type II error, or
neither has been made. Explain briefly.
(a) A bank wants to know if the enrollment on their website is above 30% based on
a small sample of customers. They test H0 : p = 0.3 versus HA : p > 0.3 and
reject the null hypothesis. Later they find out that actually 28% of all customers
enrolled.
Type I Error. The actual value is not greater than 0.3, but they rejected the null
hypothesis.
(b) A student surveys 100 students to determine whether other students on her campus
prefer Coke or Pepsi and finds no evidence that the proportion preferring Coke differs
from 0.5. Later, a marketing company surveys all students on campus and finds no difference.
No error. The true proportion is 0.5, and the null hypothesis p = 0.5 was correctly not
rejected.
(c) A human resource analyst wants to know if the applicants this year score, on
average, higher on their placement exam than the 52.5 points the candidates
averaged last year. She samples 50 recent tests and finds the average to be 54.1
points. She fails to reject the null hypothesis that the mean is 52.5 points. At
the end of the year, they find that the candidates this year had a mean of 55.3
points.
Type II Error. The true mean was 55.3 points, which is greater than 52.5, but the
null hypothesis that the mean is 52.5 points was not rejected.
(d) A pharmaceutical company tests whether a drug lifts the headache relief rate
from the 25% achieved by the placebo. They fail to reject the null hypothesis
because the p-value is 0.465. Further testing shows that the drug actually relieves
headaches in 38% of people.
Type II Error. The null hypothesis was not rejected, but it was false. The true
relief rate was greater than 0.25.

20. A researcher samples 600 children and 500 of them like ice cream. She also samples
450 adults and 350 of them like ice cream. Construct a 95% confidence interval for the
difference of population proportions of children and adults that like ice cream.
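What we are given:
p̂1 = 500/600
p̂2 = 350/450
Since we are considering the confidence interval for the difference of proportions, we
need a value for z*. Here, z* = 1.96. The confidence interval is
$$CI: (\hat p_1 - \hat p_2) \pm z^*\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$
$$\Rightarrow \left(\frac{500}{600} - \frac{350}{450}\right) \pm 1.96\sqrt{\frac{\frac{500}{600}\left(1-\frac{500}{600}\right)}{600} + \frac{\frac{350}{450}\left(1-\frac{350}{450}\right)}{450}}$$
$$\Rightarrow \frac{1}{18} \pm 1.96\sqrt{\frac{5/36}{600} + \frac{14/81}{450}} \Rightarrow (0.0069,\ 0.1042)$$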
21. A researcher samples 1200 children and 500 of them like to exercise. She also samples
900 adults and 350 of them like to exercise. Construct a 90% confidence interval for
the difference of population proportions of children and adults that like to exercise.
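What we are given:
p̂1 = 500/1200
p̂2 = 350/900
Since we are considering the confidence interval for the difference of proportions, we
need a value for z*. Here, z* = 1.645. The confidence interval is
$$CI: (\hat p_1 - \hat p_2) \pm z^*\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$
$$\Rightarrow \left(\frac{500}{1200} - \frac{350}{900}\right) \pm 1.645\sqrt{\frac{\frac{500}{1200}\left(1-\frac{500}{1200}\right)}{1200} + \frac{\frac{350}{900}\left(1-\frac{350}{900}\right)}{900}}$$
$$\Rightarrow \frac{1}{36} \pm 1.645\sqrt{\frac{35/144}{1200} + \frac{77/324}{900}} \Rightarrow (-0.0078,\ 0.0633)$$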
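Problems 20 and 21 use the unpooled standard error, since a confidence interval does not assume the two proportions are equal. This sketch (hypothetical helper `two_prop_ci`) recomputes both intervals and, with the same helper, the gamer comparison from the later exercise set.

```python
import math

def two_prop_ci(x1, n1, x2, n2, z_star):
    """(p1_hat - p2_hat) +/- z* times the unpooled SE of the difference."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

for label, args in [("Problem 20", (500, 600, 350, 450, 1.96)),
                    ("Problem 21", (500, 1200, 350, 900, 1.645)),
                    ("Gamers, women minus men", (175, 300, 200, 400, 1.96))]:
    lo, hi = two_prop_ci(*args)
    print(f"{label}: ({lo:.4f}, {hi:.4f})")
```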

22. A scientist believes that the proportion of North American bees that are hostile is
greater than the proportion of South American bees. She samples 500 North American
bees and 200 are hostile. She samples 600 South American bees and 230 are hostile.

(a) State the hypotheses to be tested.


H0 : pNA − pSA = 0
HA : pNA − pSA > 0
(b) Compute the test statistic (z value or t value). You must show work to receive
credit.
This is the hypothesis test for the difference of proportions.
$$\hat p_{pooled} = \frac{\#\text{Success}_{Grp1} + \#\text{Success}_{Grp2}}{n_1 + n_2} = \frac{200 + 230}{500 + 600} = \frac{43}{110}$$
$$SE_{pooled}(\hat p_{NA} - \hat p_{SA}) = \sqrt{\frac{\hat p_{pooled}(1-\hat p_{pooled})}{n_1} + \frac{\hat p_{pooled}(1-\hat p_{pooled})}{n_2}} = \sqrt{\frac{\frac{43}{110}\left(1-\frac{43}{110}\right)}{500} + \frac{\frac{43}{110}\left(1-\frac{43}{110}\right)}{600}} = \sqrt{\frac{0.2381}{500} + \frac{0.2381}{600}} = 0.0295$$
$$z = \frac{\hat p_{NA} - \hat p_{SA}}{SE_{pooled}(\hat p_{NA} - \hat p_{SA})} = \frac{\frac{200}{500} - \frac{230}{600}}{0.0295} = \frac{1/60}{0.0295} = 0.565$$
(c) Give the P-value or range of P-values.
On your calculator: normalcdf(0.565, 999) = 0.2860.
(d) What decision should the scientist make at a significance level of 5%?
Since the p-value is “large” (0.2860 > 0.05), Do Not Reject H0 . The results are
not significant.
23. A scientist believes that the proportion of North American bears that are hostile is
greater than the proportion of South American bears. She samples 800 North American
bears and 200 are hostile. She samples 1200 South American bears and 240 are hostile.

(a) State the hypotheses to be tested.


H0 : pNA − pSA = 0
HA : pNA − pSA > 0
(b) Compute the test statistic (z value or t value). You must show work to receive
credit.
This is the hypothesis test for the difference of proportions.
$$\hat p_{pooled} = \frac{\#\text{Success}_{Grp1} + \#\text{Success}_{Grp2}}{n_1 + n_2} = \frac{200 + 240}{800 + 1200} = 0.22$$
$$SE_{pooled}(\hat p_{NA} - \hat p_{SA}) = \sqrt{\frac{\hat p_{pooled}(1-\hat p_{pooled})}{n_1} + \frac{\hat p_{pooled}(1-\hat p_{pooled})}{n_2}} = \sqrt{\frac{0.22(1-0.22)}{800} + \frac{0.22(1-0.22)}{1200}} = \sqrt{\frac{0.1716}{800} + \frac{0.1716}{1200}} = 0.0189$$
$$z = \frac{\hat p_{NA} - \hat p_{SA}}{SE_{pooled}(\hat p_{NA} - \hat p_{SA})} = \frac{\frac{200}{800} - \frac{240}{1200}}{0.0189} = \frac{0.05}{0.0189} = 2.646$$
(c) Give the P-value or range of P-values.
On your calculator: normalcdf(2.646, 999) = 0.0041.
(d) What decision should the scientist make at a significance level of 5%?
Since the p-value is “small” (0.0041 < 0.05), Reject H0 . The results are significant.
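For the test, the standard error is pooled because H0 says the two proportions are equal. Here is a stdlib sketch of the upper-tailed version used in Problems 22 and 23 (`pooled_two_prop_z` is a hypothetical name). Carrying full precision gives z ≈ 0.564 and z ≈ 2.644 instead of the hand-rounded 0.565 and 2.646, with the same decisions. The colorblind comparison from the later exercise set runs through the same function.

```python
import math
from statistics import NormalDist

def pooled_two_prop_z(x1, n1, x2, n2):
    """Pooled z-test of H0: p1 = p2 against HA: p1 > p2."""
    p_pool = (x1 + x2) / (n1 + n2)                # pooled proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return z, 1 - NormalDist().cdf(z)            # upper-tail P-value

for label, args in [("Problem 22 (bees)", (200, 500, 230, 600)),
                    ("Problem 23 (bears)", (200, 800, 240, 1200)),
                    ("Colorblind, men vs women", (90, 900, 45, 700))]:
    z, p = pooled_two_prop_z(*args)
    print(f"{label}: z = {z:.3f}, P-value = {p:.4f}")
```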

24. A man who moves to a new city sees that there are two routes he could take to work.
A neighbor who has lived there a long time tells him Route A will average 5 minutes
faster than Route B. The man decides to experiment; he wants to find out if the mean
difference between Route A and B is different from 5 minutes. Each day, he flips a coin
to determine which way to go, driving each route 20 days. He finds that Route A takes
an average of 40 minutes, with a standard deviation of 3 minutes, and Route B takes
an average of 43 minutes, with a standard deviation of 2 minutes. Histograms of travel
times for the routes are roughly symmetric and show no outliers. Assume α = 0.05.

(a) Find a 95% confidence interval for the difference in average commuting time for
the two routes. Use df = 33.
Since df = 33, t* = 2.0345.
$$CI: (\bar y_B - \bar y_A) \pm t^*_{df}\sqrt{\frac{s_B^2}{n_B} + \frac{s_A^2}{n_A}}$$
$$\Rightarrow (43 - 40) \pm 2.0345\sqrt{\frac{2^2}{20} + \frac{3^2}{20}} \Rightarrow 3 \pm 2.0345\sqrt{0.65} \Rightarrow (1.36,\ 4.64)$$
Note that this result means that we are 95% confident that Route B has a mean
commuting time between 1.36 and 4.64 minutes more than the mean commuting
time of Route A. Also, because 5 minutes is not within the interval, it appears
that the neighbor may be exaggerating the average difference in commuting time.
(b) State the hypotheses to be tested.
H0 : µB − µA = 5
HA : µB − µA ≠ 5
(c) Compute the value of the test score.
$$t_{33} = \frac{(\bar y_B - \bar y_A) - \Delta_0}{\sqrt{\frac{s_B^2}{n_B} + \frac{s_A^2}{n_A}}} = \frac{(43 - 40) - 5}{\sqrt{\frac{2^2}{20} + \frac{3^2}{20}}} = \frac{-2}{\sqrt{0.65}} = -2.481$$
(d) Give the P-value or range of P-values.
On your calculator: 2tcdf(−999, −2.481, 33) = 2(0.00919) = 0.0184.
On the table: go to degrees of freedom 33, find where 2.481 is in the row, and
then look at the two-tail probability values. The p-value is between 0.01 and 0.02.
(e) Do the results seem significant?
Since the p-value is “small” (0.0184 < α = 0.05), Reject H0. The results are
significant. There is evidence to conclude that the average difference in commuting
time is different from 5 minutes. The test was two-sided, so by itself it does not say
whether the difference is above or below 5 minutes; the confidence interval in part (a),
which lies entirely below 5, suggests it is smaller than the neighbor claimed.

25. Researchers randomly assigned participants either a tall, thin “highball” glass or a
short, wide “tumbler,” each of which held 355 ml. Participants were asked to pour 1.5
oz = 44.3 ml of water into their glass. Did the shape of the glass make a difference in
how much liquid they poured? In particular, test to see if they poured less water into
the “highball” glass than the “tumbler”. Assume α = 0.1. Here are the summaries:

          Highball    Tumbler
n         99          99
ȳ         42.2 ml     60.9 ml
s         16.2 ml     17.9 ml

(a) Find a 90% confidence interval for the difference in average water held for the two
glasses. Use df = 194.
Because we are looking for a 90% confidence interval with df = 194, t* = 1.6528.
$$CI: (\bar y_H - \bar y_T) \pm t^*_{df}\sqrt{\frac{s_H^2}{n_H} + \frac{s_T^2}{n_T}}$$
$$\Rightarrow (42.2 - 60.9) \pm 1.6528\sqrt{\frac{16.2^2}{99} + \frac{17.9^2}{99}} \Rightarrow -18.7 \pm 1.6528\sqrt{5.8874} \Rightarrow -18.7 \pm 1.6528(2.4264) \Rightarrow (-22.71,\ -14.69)$$
(b) State the hypotheses to be tested.
H0 : µH − µT = 0
HA : µH − µT < 0
(c) Compute the value of the test score. (Assume all conditions are met.)
$$t_{194} = \frac{(\bar y_H - \bar y_T) - 0}{\sqrt{\frac{s_H^2}{n_H} + \frac{s_T^2}{n_T}}} = \frac{42.2 - 60.9}{\sqrt{\frac{16.2^2}{99} + \frac{17.9^2}{99}}} = \frac{-18.7}{2.4264} = -7.707$$
(d) Give the P-value or range of P-values.
On your calculator: tcdf(−999, −7.707, 194) ≈ 0.
On the table (or an online table): go to degrees of freedom 194, find where 7.707
is in the row, and then look at the one-tail probability values. Since 7.707 is beyond
the largest value in the row, the p-value is less than 0.001.
(e) Do the results seem significant?
Since the p-value is “small” (≈ 0 < α = 0.1), Reject H0. The results are
significant. There is sufficient evidence to conclude that participants poured less water
into the “highball” glass than into the “tumbler”.
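Problems 24 and 25 simply say "Use df = 33" and "Use df = 194." Those values appear to come from the Welch-Satterthwaite approximation that is commonly paired with the unpooled two-sample t procedure: this sketch (hypothetical helper `welch`) gives about 33.1 and 194.1, which round down to the stated df. It also recomputes both test statistics.

```python
import math

def welch(ybar1, s1, n1, ybar2, s2, n2, delta0=0.0):
    """Two-sample t statistic (unpooled SE) and Welch-Satterthwaite df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors
    se = math.sqrt(v1 + v2)
    t = ((ybar1 - ybar2) - delta0) / se
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, se, df

# Problem 24: Route B minus Route A, H0: muB - muA = 5
t24, se24, df24 = welch(43, 2, 20, 40, 3, 20, delta0=5)
# Problem 25: Highball minus Tumbler, H0: muH - muT = 0
t25, se25, df25 = welch(42.2, 16.2, 99, 60.9, 17.9, 99)
print(f"Problem 24: t = {t24:.3f}, SE = {se24:.4f}, df = {df24:.1f}")
print(f"Problem 25: t = {t25:.3f}, SE = {se25:.4f}, df = {df25:.1f}")
```

Rounding the df down (rather than to the nearest integer) gives a slightly larger t* and is the more cautious choice.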

8.1 Various Chapters


1. We want to estimate the healing rate for a wound. A sample of size 17 is collected
and the sample mean is computed to be 24.3 micrometers per hour, with a sample
standard deviation of s = 8 micrometers per hour. What is a 95% confidence interval
for the population mean?
What we are given:
ȳ = 24.3
s = 8
n = 17
Because we want the confidence interval for the population mean, use the formula
$$CI: \bar y \pm t^*_{n-1}\frac{s}{\sqrt n} \Rightarrow 24.3 \pm t^*_{17-1}\frac{8}{\sqrt{17}} \Rightarrow 24.3 \pm 2.120\,\frac{8}{\sqrt{17}} \Rightarrow (20.187,\ 28.413)$$

2. A sample of size n=150 people is collected and the sample proportion of people who are
illiterate is computed to be .20. Compute a 95% confidence interval for the population
proportion of illiterate people.
What we are given:
n = 150
p̂ = 0.2
Because we want the confidence interval for the population proportion, use the formula
$$CI: \hat p \pm z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} \Rightarrow 0.2 \pm 1.96\sqrt{\frac{0.2(1-0.2)}{150}} \Rightarrow 0.2 \pm 1.96\sqrt{\frac{0.16}{150}} \Rightarrow (0.136,\ 0.264)$$
3. You believe that the proportion of people that like cheese is .80. You are going to
construct a 95% confidence interval and want the margin of error to be plus or minus
.03. What should the sample size be?
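What we are given:
p̂ = 0.8
MOE = 0.03
Since this is dealing with one proportion, use the formula
$$MOE = z^*\sqrt{\frac{\hat p(1-\hat p)}{n}} \Rightarrow 0.03 = 1.96\sqrt{\frac{0.8(1-0.8)}{n}} \Rightarrow \frac{0.03}{1.96} = \sqrt{\frac{0.16}{n}} \Rightarrow \left(\frac{0.03}{1.96}\right)^2 = \frac{0.16}{n}$$
$$\Rightarrow n = \frac{0.16}{(0.03/1.96)^2} = 682.95 \Rightarrow n \approx 683$$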

4. Teresa knows that appointment wait times are approximately normally distributed. She
believes the mean wait time is longer than 25 minutes. She conducts a test with α = 0.05
and the appropriate hypotheses. She selects 25 random appointments and finds a sample
mean wait time of 25.66 minutes with a sample standard deviation of 10 minutes.

(a) State the hypotheses to be tested.


H0 : µ = 25
HA : µ > 25
(b) Compute the value of the test score.
Since this asks us to test the population mean, we need the formula
$$t_{n-1} = \frac{\bar y - \mu_0}{s/\sqrt n} \Rightarrow t_{25-1} = t_{24} = \frac{25.66 - 25}{10/\sqrt{25}} = \frac{0.66}{2} = 0.33$$
(c) Give the P-value or range of P-values.
On your calculator: tcdf(0.33, 999, 24) = 0.372.
On the table: go to degrees of freedom 24, find where 0.33 is in the row, and then
look at the one-tail probability values. The probability is greater than 0.10.
(d) Do the results seem significant?
Since the p-value is “large” (0.372 > α = 0.05), Do Not Reject H0 . The results
are not significant.

5. You claim that the proportion of people who watch American Idol is greater than .50.
You sample n=200 people and compute a sample proportion of .53. Assume α = 0.05.

(a) State the hypotheses to be tested.


H0 : p = 0.5
HA : p > 0.5
(b) Compute the value of the test score. Since this asks us to test the population
proportion, we need the formula
$$z = \frac{\hat p - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.53 - 0.5}{\sqrt{\frac{0.5(1-0.5)}{200}}} = \frac{0.03}{\sqrt{\frac{0.25}{200}}} = \frac{0.03}{\sqrt{0.00125}} = 0.849$$
(c) Give the P-value or range of P-values.
On your calculator: normalcdf(0.849, 999) = 0.198.
(d) Do the results seem significant?
Since the p-value is “large” (0.198 > α = 0.05), Do Not Reject H0 . The results
are not significant.

6. You want to compare the proportion of gamers amongst women and men. You survey
300 women and 400 men. 175 of the women were gamers and 200 of the men were
gamers. Construct a 95% confidence interval for the difference of proportions.
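What we are given:
p̂W = 175/300
p̂M = 200/400
Because we want the confidence interval for the difference of proportions, use the
formula (with group 1 = women and group 2 = men)
$$CI: (\hat p_1 - \hat p_2) \pm z^*\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$
$$\Rightarrow \left(\frac{175}{300} - \frac{200}{400}\right) \pm 1.96\sqrt{\frac{\frac{175}{300}\left(1-\frac{175}{300}\right)}{300} + \frac{\frac{200}{400}\left(1-\frac{200}{400}\right)}{400}}$$
$$\Rightarrow \frac{1}{12} \pm 1.96\sqrt{\frac{35/144}{300} + \frac{0.25}{400}} \Rightarrow (0.0091,\ 0.1576)$$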

7. You believe that the proportion of men that are colorblind is greater than the
proportion of women that are colorblind. You sample 900 men and 90 of them are
colorblind. You sample 700 women and 45 of them are colorblind. Assume α = 0.05.

(a) State the hypotheses to be tested.


H0 : pM − pW = 0
HA : pM − pW > 0
(b) Compute the value of the test score.
$$\hat p_{pooled} = \frac{\text{Number of successes in Group 1} + \text{Number of successes in Group 2}}{n_1 + n_2} = \frac{90 + 45}{900 + 700} = 0.084375$$
$$SE_{pooled}(\hat p_1 - \hat p_2) = \sqrt{\frac{\hat p_{pooled}(1-\hat p_{pooled})}{n_1} + \frac{\hat p_{pooled}(1-\hat p_{pooled})}{n_2}} = \sqrt{\frac{0.084375(1-0.084375)}{900} + \frac{0.084375(1-0.084375)}{700}}$$
$$= \sqrt{8.584\times 10^{-5} + 1.104\times 10^{-4}} = \sqrt{1.962\times 10^{-4}} = 0.014$$
$$z = \frac{\hat p_1 - \hat p_2}{SE_{pooled}(\hat p_1 - \hat p_2)} = \frac{\frac{90}{900} - \frac{45}{700}}{0.014} = \frac{1/28}{0.014} = 2.55$$
(c) Did you use the pooled proportion in part (b)?
YES
(d) Compute the P-value.
On your calculator: normalcdf(2.55, 999) = 0.0054
(e) Are the results significant?
Since the p-value is “small” (0.0054 < α = 0.05), Reject H0 . The results are
significant.
