REquisition and Studies
Important Formulas and Concepts
1 Chapter 1
1.1 Definitions
1. Data
Any collection of numbers, characters, images, or other items that provide information
about something.
2. Categorical/Qualitative Variables
Name categories for grouping.
3. Quantitative Variables
A variable that contains measured numerical values with measurement units.
4. Ordinal Variable
A categorical variable with an ordering.
5. Identifier Variable
Each record has a unique value like Student ID or SSN.
6. Frequency Table
Records the totals and uses the category names to label each row.
8. Bar Chart
Displays the distribution of a categorical variable, showing counts for each category
next to each other for easy comparison.
11. Distribution
Slices up all the possible values of the variable into equal width bins and gives the
number of values (or counts) falling into each bin.
This version: September 15, 2021, by Dale Embers. May not include all things that could possibly be
tested on. To be used as an additional reference to studying all Chapters 1 - 5.
12. Histogram
Uses adjacent bars to show the distribution of a quantitative variable. Each bar shows
the frequency of values falling into each bin.
13. Unimodal
Histogram with one peak.
14. Bi-modal
Histogram with two peaks.
15. Uniform
Histogram that doesn’t appear to have any mode. Bars are approximately the same
height for each bin.
16. Symmetric
Histogram in which the two halves on either side of the center look approximately like
mirror images.
17. Skew
Histogram that is not symmetric.
2 Chapter 2
2.1 Definitions
1. 5 Number Summary: Min, Q1, Median, Q3, Max
2. Boxplot: Displays the 5 number summary as a central box with whiskers that extend
to the nonoutlier data values.
3. Use the sample mean and sample standard deviation when the data are symmetric and
have no significant outliers. Use the median and IQR when the data are skewed or have
significant outliers.
4. Outliers are values that are either above the upper fence or below the lower fence.
2.2 Formulas
1. Median = Once the data is ordered from smallest to largest, it is the middle value in
the data. Divides the histogram into 2 equal pieces.
2. Mean = Average of all of the values: $\bar{x} = \frac{\sum x}{n}$
3. Range = Max - Min
6. IQR = Q3 − Q1
7. Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
8. Standard Deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
9. Upper Fence for Boxplot = Q3 + 1.5IQR
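The Chapter 2 formulas can be checked with a short Python sketch. The data set below is hypothetical (it is not from this guide), and the quartiles come from Python's `statistics.quantiles`, whose default method may differ slightly from the textbook or a calculator. The lower fence line uses the standard companion formula Q1 − 1.5 IQR, which is an assumption here since only the upper fence is listed above.

```python
import statistics

def summarize(data):
    """Return the Chapter 2 summary statistics for a list of numbers."""
    values = sorted(data)
    # statistics.quantiles uses the "exclusive" method by default;
    # other quartile conventions can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return {
        "min": values[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": values[-1],
        "mean": statistics.mean(values),
        "range": values[-1] - values[0],
        "iqr": iqr,
        "variance": statistics.variance(values),  # divides by n - 1
        "std_dev": statistics.stdev(values),      # square root of the variance
        "upper_fence": q3 + 1.5 * iqr,
        "lower_fence": q1 - 1.5 * iqr,            # assumed companion formula
    }

# Hypothetical data set, not taken from the guide.
sample = [2, 4, 4, 5, 7, 9, 11]
summary = summarize(sample)

# Outliers are values above the upper fence or below the lower fence.
outliers = [x for x in sample
            if x > summary["upper_fence"] or x < summary["lower_fence"]]

for name, value in summary.items():
    print(f"{name}: {value}")
print("outliers:", outliers)
```

When comparing with hand calculations, expect the mean, variance, and standard deviation to match exactly, while the quartiles (and therefore the fences) may shift a little depending on the quartile rule used in class.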
3 Chapter 3
3.1 Definitions
1. Scatterplot
Shows the relationship between 2 quantitative variables.
2. Direction of Scatterplot
Positive direction means that as one variable increases, so does the other. Negative
direction means that as one variable increases, the other decreases.
3. Form of Scatterplot
Is it in a straight line or some other form?
4. Strength of Scatterplot
Strong association if there is little scatter around the underlying relationship.
5. Outlier
A point that does not fit the overall pattern seen in the scatterplot.
3.2 Formulas
1. Correlation Coefficient
$r = \frac{\sum \left[ \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right) \right]}{n - 1}$
3.3 Properties of Correlation Coefficient
1. Measures the strength and direction of the linear association between two quantitative
variables.
8. No units
9. Affected by outliers
10. Not affected by changes in scale or if the variables are standardized (changed to
z-scores)
11. Does NOT prove causation. Only provides a relationship between 2 variables.
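As a rough illustration of the correlation formula and of the "no units / not affected by scale" properties, here is a minimal Python sketch. The `energy` and `price` values are hypothetical and are not the data from the example problems.

```python
import math
import statistics

def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, not from the guide.
energy = [5, 10, 15, 20, 25]
price = [7, 11, 18, 21, 28]

r = correlation(energy, price)

# A change of scale (for example, different units) should leave r unchanged.
r_rescaled = correlation([1000 * x for x in energy],
                         [100 * y for y in price])

print(f"r = {r:.4f}, r after rescaling = {r_rescaled:.4f}")
```

A value of r near +1 here only describes a strong positive linear association in the made-up data; it says nothing about causation.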
4 Chapter 4
4.1 Definitions
1. Linear Model
Equation of the form ŷ = a + bx. ŷ means estimated values for y.
2. Predicted Values
Value of ŷ found for a given x-value in the data.
4. Residual
Differences between data values and the corresponding values predicted by the model
(observed - expected)
5. $R^2$
Gives the fraction of variability of y accounted for by the least squares linear regression
on x. It is an overall measure of how successful the regression is in linearly relating y
to x.
6. Least Squares Criterion
Specifies the unique line that minimizes the variance of the residuals or the sum of
squared residuals.
7. Extrapolation
Using the model to predict y for x-values outside the range of the data used to fit
it. In any regression situation it is unsafe. Predictions from extrapolation should
not be trusted.
8. Influential Point
A point that, if omitted from the data, results in a very different regression model.
4.2 Formulas
1. Residual = Observed value - Predicted value = y − ŷ
2. b = r (sy /sx )
3. a = ȳ − bx̄
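The three formulas above can be put together in a short Python sketch that fits the line ŷ = a + bx and computes residuals. The data are hypothetical, and the last line relies on the fact that, for a single predictor, R² equals r².

```python
import math
import statistics

def least_squares_line(xs, ys):
    """Return (a, b, r) for y-hat = a + b*x, using b = r*(s_y/s_x) and a = y-bar - b*x-bar."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum((x - x_bar) * (y - y_bar)
            for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * (s_y / s_x)      # slope
    a = y_bar - b * x_bar    # intercept
    return a, b, r

# Hypothetical data, not from the guide.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = least_squares_line(xs, ys)
predicted = [a + b * x for x in xs]                          # y-hat values
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]   # observed - predicted
r_squared = r ** 2   # fraction of variability in y accounted for by the line

print(f"y-hat = {a:.2f} + {b:.2f}x, R^2 = {r_squared:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
```

The residuals from a least squares line always sum to zero (up to rounding), which is a quick check on hand calculations.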
5 Chapter 5
5.1 Definitions
1. Contingency Table
Table which shows how the individuals are distributed along each variable.
2. Marginal Distribution
Row total or column total in contingency tables.
3. Conditional Distribution
Show distribution of one variable for just those cases that satisfy a condition on another
variable. Example: Event B given Event A occurs first.
6 Extra Information
Review any and all notes and supplementary materials. It may be the case that something
was accidentally omitted from this study guide. Also, review any problems that may have
been discussed in class as not all example problems may have been provided here.
7 Example Problems
1. Use the below table to answer the following questions.
                    Eye Color
Gender     Blue   Green   Brown   Total
Male          5       7      15      27
Female        6       2      10      18
Total        11       9      25      45
(a) Construct a frequency table for Eye Color based on the data above.
(b) Find the marginal distribution of gender.
(c) What percentage of females have blue eyes?
(d) What percentage of green eyed people are male?
(e) What percentage of people are females and have green eyes?
(f) What percentage of blue eyed people are female?
(g) What percentage of males have brown eyes?
(h) What percentage of people have brown eyes?
(i) What percentage of people are males and have blue eyes?
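One way to check answers to this kind of question is to compute marginal and conditional percentages directly from the counts. The sketch below uses the counts from the table in this problem; the helper name `percent` is just an illustration. It covers parts (b) through (e) and is meant for checking hand work, not replacing it.

```python
# Counts copied from the Gender by Eye Color table in problem 1.
counts = {
    "Male":   {"Blue": 5, "Green": 7, "Brown": 15},
    "Female": {"Blue": 6, "Green": 2, "Brown": 10},
}

row_totals = {gender: sum(row.values()) for gender, row in counts.items()}
col_totals = {}
for row in counts.values():
    for color, count in row.items():
        col_totals[color] = col_totals.get(color, 0) + count
grand_total = sum(row_totals.values())

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

# Marginal distribution of gender: row totals over the grand total.
marginal_gender = {g: percent(t, grand_total) for g, t in row_totals.items()}

# Conditional: of the females, what percent have blue eyes?
females_with_blue = percent(counts["Female"]["Blue"], row_totals["Female"])

# Conditional: of the green-eyed people, what percent are male?
green_who_are_male = percent(counts["Male"]["Green"], col_totals["Green"])

# Joint: of all people, what percent are female AND have green eyes?
female_and_green = percent(counts["Female"]["Green"], grand_total)

print(marginal_gender)
print(females_with_blue, green_who_are_male, female_and_green)
```

The key step in each part is choosing the denominator: a row total, a column total, or the grand total, depending on which group the question restricts attention to.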
3. In a histogram that is skewed to the right, which is larger, the mean or the median?
4. In a histogram that is skewed or has outliers, which should be reported, the mean or
the median?
5. In a histogram that is skewed or has outliers, which should be reported, the IQR or
the Standard Deviation?
(a) Mean
(b) Median and Quartiles
(c) Range and IQR
(d) What are the values of the upper fence and the lower fence?
(e) Are there any outliers in this data? Why?
(f) Find the variance and standard deviation.
8. Shown below are the histogram and summary statistics for the number of camp sites
at public parks in Vermont.
(a) Which statistics would you use to identify the center and spread of this distribu-
tion? Why?
(b) How many parks would you classify as outliers? Explain.
(c) Create a boxplot for this data.
9. Use the given data to answer the following questions.
(a) Draw a scatterplot of the above data to study the association between the energy
used and the price it cost.
(b) What is the direction of the association?
(c) What is the form of the relationship?
(d) What is the strength of the relationship?
(e) Are there any outliers?
(f) What is the correlation coefficient?
10. For the below residual plots, decide if a linear model is appropriate. In the case that the
linear model is not appropriate, decide which condition is violated (linearity, outlier,
or equal spread).
Figure 1: Residual Plots. Panels: (a) Plot 1, (b) Plot 2, (c) Plot 4. [Figure not reproduced.]
[Scatterplot not reproduced. Axes: Price ($) versus Energy Used (KWH).]
1 Chapter 6
1.1 Definitions
1. Population
The entire group of individuals or instances about whom we hope to learn.
2. Sample
A (representative) subset of a population, examined in the hope of learning about the
population.
3. Sample Survey
A study that asks questions of a sample drawn from some population in the hope of
learning something about the entire population.
4. Randomization
The best defense against bias is randomization, in which each individual is given a fair,
random chance of selection.
5. Population Parameter
A numerically valued attribute of a model for a population. Example: mean income
of all employed people in the USA
6. Sample statistic
Statistics or sample statistics are values that are calculated for sample data. Example:
mean income of employed people in a representative sample
7. Sampling Frame
A list of individuals from whom the sample is drawn. Individuals who may be in the
population of interest, but who are not in the sampling frame cannot be included in
any sample.
17. Studies
2 Chapter 7
1. Experiments
(a) Factor
Variable whose levels are manipulated by the experimenter.
(b) Response Variable
Variable whose values are compared across different treatments.
(c) Experiment
Manipulates factor levels to create treatments, randomly assigns subjects to these
treatment levels, and then compares the responses of the subject groups across
treatment levels. Tries to assess effects of treatments.
(d) Levels
Specific values that the experimenter chooses for a factor.
(e) Treatment
Process, intervention, or other controlled circumstance applied to randomly as-
signed experimental units.
(f) Block
When groups of experimental units are similar in a way that is not a factor
under study, it is often a good idea to gather them together into blocks and then
randomize the assignment of treatments within each block.
(a) Control
Control aspects of the experiment that we know may have an effect on the re-
sponse, but that are not the factors being studied.
(b) Randomize
Randomize subjects to treatments to even out effects that we cannot control.
(c) Replicate
Replicate over as many subjects as possible.
(d) Block
Reduce the effects of identifiable attributes of the subjects that cannot be con-
trolled.
4. Statistically Significant
When an observed difference is too large for us to believe that it is likely to have
occurred naturally, we consider the difference to be statistically significant.
5. Types of Experiments
6. Control Treatment
Baseline treatment.
7. Control Group
Experimental units assigned to a baseline treatment level, typically either the default
treatment or a placebo treatment. Their responses provide a basis for comparison.
8. Blinding
Any individual associated with an experiment who is not aware of how subjects have
been allocated to treatment groups is said to be blinded.
9. Single/Double Blind
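There are two main classes of individuals who can affect the outcome of an experiment:
those who could influence the results, and those who evaluate the results.
Single Blind: when everyone in either one of the two classes above is blinded. Double
Blind: when everyone in both of the two classes above is blinded.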
10. Placebo
A treatment known to have no effect.
11. Placebo Effect
The tendency of human subjects to show a response even when administered a placebo.
12. Potential Problems
(a) Confounding
When the levels of one factor are associated with the levels of another factor in
such a way that their effects cannot be separated, we say that these two factors
are confounded.
(b) Lurking Variable
A variable associated with both y and x that makes it appear that x may be
causing y.
13. In summary, the best experiments are usually 1) Randomized, 2) Comparative, 3)
Double-blind, and 4) Placebo-controlled.
3 Chapter 9 and 10
3.1 Definitions
1. Random Phenomenon
A phenomenon is random if we know what outcomes could happen, but not which
particular values will happen.
2. Trial
A single attempt or realization of a random phenomenon.
3. Outcome
The value measured, observed, or reported for an individual instance of a trial.
4. Event
A collection of outcomes. Usually, we identify events so that we can attach probabilities
to them. Denote events with bold capital letters like A, B, etc.
5. Sample Space
The collection of all possible outcome values. The collection of values in the sample
space has a probability of 1. Denote by S or Ω.
8. Probability
A number between 0 and 1 that reports the likelihood of that event’s occurrence. Write
P(A) for the probability of event A.
9. Empirical Probability
When the probability comes from the long-run relative frequency of the event’s occur-
rence.
P(S) = 1
The set of all possible outcomes of a trial must have probability = 1.
3. Complement Rule
The set of outcomes that are not in the event A is the complement of A, written $A^C$.
$P(A^C) = 1 - P(A)$
The probability of an event not occurring is 1 minus the probability that it occurs.
4. Addition Rule
For 2 mutually exclusive events A and B, the probability that one or the other
occurs is the sum of the probability of the two events.
P (A or B) = P (A) + P (B) where A and B are mutually exclusive.
Disjoint also means mutually exclusive: the events have no outcomes in common.
5. Multiplication Rule
For two independent events A and B, the probability that both A and B occur
is the product of the probabilities of the two events.
P (A and B) = P (A)P (B) where A and B are independent
7. Conditional Probability
The conditional probability of the event B given the event A has occurred is
$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$.
9. Independent
Events A and B are independent when P (B | A) = P (B). Note: independent is not
the same as disjoint.
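A small Python sketch can make the conditional probability and independence definitions concrete. It uses the counts from the animal shelter example problem later in this guide (24 dogs, 18 cats, 8 male dogs, 6 male cats) and exact fractions so that the equality check is not affected by rounding. Treat it as one way to organize the work, not the only one.

```python
from fractions import Fraction

# Counts from the animal shelter example problem in this guide.
dogs, cats = 24, 18
male_dogs, male_cats = 8, 6
total = dogs + cats

p_dog = Fraction(dogs, total)
p_male = Fraction(male_dogs + male_cats, total)
p_male_and_dog = Fraction(male_dogs, total)

# Complement Rule: P(not male) = 1 - P(male)
p_not_male = 1 - p_male

# Conditional probability: P(male | dog) = P(male and dog) / P(dog)
p_male_given_dog = p_male_and_dog / p_dog

# Independence: P(male | dog) equals P(male)
independent = (p_male_given_dog == p_male)

# Equivalent check with the Multiplication Rule for independent events.
product_rule_holds = (p_male_and_dog == p_male * p_dog)

print("P(male) =", p_male)
print("P(male | dog) =", p_male_given_dog)
print("independent:", independent)
```

Note that "male" and "dog" are not disjoint here (there are male dogs), which is a reminder that independent and disjoint are different ideas.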
4 Chapter 11
4.1 Definitions
1. z-score
Tells how many standard deviations a value is from the mean. Regardless of direction,
the farther a data value is from the mean the more unusual it is.
2. Standard Normal Model
A Normal Model N(µ, σ) with mean 0 and standard deviation 1.
3. 68-95-99.7 Rule
In a normal model, about 68% of values fall within 1 standard deviation of the mean,
about 95% fall within 2 standard deviations of the mean; about 99.7% of values fall
within 3 standard deviations of the mean.
4. Normal Percentile
The normal percentile corresponding to a z-score gives the percentage of values in a
standard normal distribution found at that z-score or below. This corresponds to the
area under the curve to the left of that z-score. See normal table in the textbook.
4.2 Formulas
1. z-score: $z = \frac{x - \mu}{\sigma}$
4. When a normal curve is split in half from the mean, each side contains 50% of the area
6. If a normal curve is split with 30% of the area on one side, the other side of the curve
is 70% of the area
7. If a normal curve has 60% of the area in the middle, the remaining portions are a total
of 40%. This 40% is allocated half to each side. So the far left has 20% of the area,
the middle is 60% of the area, and the far right side has 20% of the area.
Textbook Normal Table Note: These tables give the percentage to the left of the z value.
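For readers who want to check table lookups with software, here is a minimal Python sketch using `statistics.NormalDist` from the standard library. It computes a z-score, the area to the left of a z-score (what the textbook table reports), and the percentages behind the 68-95-99.7 Rule. The helper names are illustrative only.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

standard_normal = NormalDist(mu=0, sigma=1)

def percent_below(z):
    """Area to the left of z under the standard Normal curve, as a percent."""
    return 100 * standard_normal.cdf(z)

# Percent of values within k standard deviations of the mean, k = 1, 2, 3.
within = {k: percent_below(k) - percent_below(-k) for k in (1, 2, 3)}

z = z_score(62, mu=50, sigma=5)   # same numbers as the N(50, 5) examples
print("z =", z)
print("percent below z:", round(percent_below(z), 2))
for k, pct in within.items():
    print(f"within {k} SD: {pct:.2f}%")
```

The computed percentages are slightly different from 68, 95, and 99.7 because the rule uses rounded values; that is why rule-based answers and table-based answers can differ a little.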
5 Example Problems
1. A healthcare system wants to determine if its patients are being treated with a sufficient
level of care, so they consider a number of sampling methods. Identify each of the
following sample types as Simple Random Sample, Stratified Sample, Cluster Sample,
Multistage Sample, or Convenience Sample.
(a) They randomly select 15 patients from each of the hospitals in the system and
survey them.
(b) They randomly select 10 physicians and survey every patient belonging to that
physician.
(c) They make a list of all the patients in the system and randomly select 200.
2. I decide that I want to know students’ opinions on a variety of issues. Decide which of
the following best describes the issue as Voluntary Response Bias, Nonresponse Bias,
Response Bias, or Undercoverage (choose one for each situation).
(a) I survey students (not anonymously) about whether or not they use illegal drugs.
(b) I ask students to fill out an online survey regarding the use of Blackboard in the
classroom.
(c) I randomly select 300 students from a list of those receiving Pell grants and survey
them regarding financial aid.
3. To test the effect of a medication, 100 volunteers were randomly divided into two
groups. Each person was given a month’s supply of pills. For one group, the pill
contained the medicine, whereas for the other group, the pills contained only inert
ingredients. Participants were not told which type of pill they had. At the end of the
month, a researcher evaluated them to determine if they had improved. The researcher
did not know which of the subjects had the pill with the medicine added. Identify which
of these statements is true.
(a) The group receiving the pill with inert ingredients will not experience the placebo
effect.
(b) This experiment includes blocking.
(c) The number of factors in the experiment is two.
(d) This study is single blind.
(e) This study is double blind.
(f) The group receiving the medicine is the control group.
4. A veterinarian is studying the effect that a diet high in alfalfa may have on horses.
The veterinarian decides to use the following design: identify 30 horses, and divide
them into 3 groups of 10 horses. One group consists of horses in barn stalls, one group
consists of horses in outdoor paddocks, and one group consists of horses in pastures.
Within each group, one horse is randomly assigned to a high alfalfa diet and one is
fed a low alfalfa diet. The study is:
(a) a randomized block design
(b) a matched pairs design
(c) a completely randomized design
5. A study attempts to compare two sunscreens. Each of 50 subjects with varying skin
complexions will use both sunscreens—Screen A on one side of the body and Screen B
on the other side. For each subject, a coin is tossed to determine which side receives
Screen A and which receives Screen B. Researchers measure the amount of ultraviolet
light exposure over both treated areas for each subject. This is an example of:
6. For his Statistics class experiment, researcher J. Gilbert decided to study how parents’
income affects children’s performance on standardized tests like the SAT. He proposed
to collect information from a random sample of test takers and examine the relationship
between parental income and SAT score.
7. In 2002, the journal Science reported that a study of women in Finland indicated that
having sons shortened the life spans of mothers by about 34 weeks per son, but that
daughters helped to lengthen the mothers’ lives. The data came from church records
from the period 1640 to 1870.
8. Some people claim they can get relief from migraine headache pain by drinking a large
glass of ice water. Researchers plan to enlist several people who suffer from migraines
in a test. When a participant experiences a migraine headache, he or she will take a
pill that may be a standard pain reliever or a placebo. Half of each group will also
drink ice water. Participants will then report the level of pain relief they experience.
10. In a large Introductory Statistics lecture hall, the professor reports that 55% of the
students enrolled have never taken a Calculus course, 32% have taken only one semester
of Calculus, and the rest have taken two or more semesters of Calculus. The professor
randomly assigns students to groups of three to work on a project of the course. What
is the probability that the first group-mate you meet has studied
11. Continuation. What is the probability that of your other two group-mates,
12. A certain bowler can bowl a strike 70% of the time. If the bowls are independent,
what’s the probability that she
13. A check of dorms revealed that 38% had refrigerators, 52% had TV’s and 21% had
both a TV and a refrigerator. What’s the probability that a randomly selected dorm
room has:
(a) P(US)
(b) Probability that a person completed education before college? Do not include
those who did not answer.
(c) Probability that a person is from France or did post graduate study.
(d) Probability that a person is from France and finished primary school.
15. An animal shelter states that it currently has 24 dogs and 18 cats available for adoption.
8 of the dogs and 6 of the cats are male. Find the conditional probability of:
16. Followup. The local animal shelter in reported that it currently has 24 dogs and 18
cats available for adoption; 8 of the dogs and 6 of the cats are male. Are being male
and being a dog independent events? Briefly justify your answer.
17. Police set up checkpoints to catch drunk drivers. Based on the initial stop, trained
officers can make the right decision 80% of the time. Suppose a checkpoint is set up at
a time when it is estimated that about 12% of people have been drinking. Questions
to answer:
(a) Suppose a person is stopped and is not drinking. What is the probability that he
is detained for further testing?
(b) What’s the probability that any given driver will be detained?
(c) What’s the probability that a driver who is detained has actually been drinking?
(d) What’s the probability that a driver who was released had actually been drinking?
18. A company’s records indicate that on any given day about 1% of their day-shift employ-
ees and 2% of the night-shift employees will miss work. Sixty percent of the employees
work the day shift. What percent of employees are absent on any given day?
19. We are given the following distribution for X.
X 3 5 6 8 10
P(X = x) 0.2 0.1 0.3 0.3
(a) What is the value of the missing probability in the table above?
(a) Standardize x = 9
(b) Standardize x = 21
(c) Which of the two above is most unusual?
(a) Using the model described above, draw the model showing what the 68-95-99.7
Rule predicts.
(b) In what interval would you expect the central 99.7% of values to be found?
(c) What percent of values are above 50?
(d) What percent of values are between 40 and 60?
(e) What percent of values are between 40 and 50?
(f) What percent of values are between 50 and 60?
(g) What percent of values are between 45 and 50?
(h) What percent of values are between 50 and 65?
(i) What percent of values are above 60?
(j) What percent of values are below 45?
(k) What percent of values are between 40 and 65?
(l) What percent of values are between 45 and 60?
22. Based on the Normal Model with a mean of 50 and standard deviation of 5, answer
the following questions.
2. (a) Response bias if the students answer and lie. Nonresponse Bias if they do not
respond at all.
(b) Voluntary Response Bias
(c) Undercoverage. This method leaves out a lot of students.
3. (a) The group receiving the pill with inert ingredients will not experience the placebo
effect. This is false. They are given the placebo so that any placebo effect they
experience can be compared with the response of the group receiving the medicine.
(b) This experiment includes blocking. This is false. The individuals were not
grouped first by some property or condition.
(c) The number of factors in the experiment is two. This is false. There is one factor,
the medicine, with two levels.
(d) This study is single blind. This is false. Both the participants and the evaluating
researcher were blinded.
(e) This study is double blind. This is true. Both the participants and the evaluating
researcher were blinded.
(f) The group receiving the medicine is the control group. This is false, the group
receiving the placebo is the control group.
4. This is a block design, as the horses were separated into groups before the treatments
were applied.
5. This is a matched pairs design. The pairs consist of the two sides of the subjects’
bodies.
8. (a) Experiment
(b) There are 2 factors - pain reliever and water temp. The pain reliever has 2 levels -
pain reliever or placebo. The water temperature has 2 levels - ice water or regular
water. Total, there are 4 treatments.
(c) Explanatory variable: pain reliever and water temp. Response variable: level of
pain relief.
9. (a) Experiment
(b) There is 1 factor - type of exercise. This factor has 2 levels - static stretching and
trunk stabilization exercises. In total, there are 2 treatments.
(c) Explanatory variable: type of exercise. Response variable: time before the ath-
letes were able to return to sports.
10. We are given that
P(no calculus) = 0.55,
P(1 semester) = 0.32.
P(TV) = 0.52
P(Refrigerator) = 0.38
P(both) = P(TV and Refrigerator) = 0.21
Answers to questions:
18. Before we answer any questions, it may be useful to create a tree diagram.
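Day (0.6)
    Absent (0.01): P(Day and Absent) = (0.6)(0.01) = 0.006
    Not Absent (0.99): P(Day and Not Absent) = (0.6)(0.99) = 0.594
Night (0.4)
    Absent (0.02): P(Night and Absent) = (0.4)(0.02) = 0.008
    Not Absent (0.98): P(Night and Not Absent) = (0.4)(0.98) = 0.392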
Question to answer: What percent of employees are absent on any given day? Need
to calculate P(Absent). This is the denominator of Bayes Rule.
P (Absent) = P (Absent | Day) P (Day) + P (Absent | Night) P (Night)
= (0.01)(0.6) + (0.02)(0.4)
= 0.014
= 1.4%.
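The tree diagram calculation can be mirrored in a few lines of Python. The sketch below multiplies along each branch and adds the "absent" branches to get P(Absent). The final Bayes' Rule step, P(Night | Absent), is not asked in the problem; it is included only to show how P(Absent) serves as the denominator.

```python
import math

# Probabilities given in the problem.
p_day = 0.6
p_night = 1 - p_day
p_absent_given_day = 0.01
p_absent_given_night = 0.02

# Multiply along each branch of the tree.
p_day_and_absent = p_day * p_absent_given_day
p_night_and_absent = p_night * p_absent_given_night

# Add the branches that end in "Absent".
p_absent = p_day_and_absent + p_night_and_absent

# Illustration only (not asked in the problem): Bayes' Rule.
p_night_given_absent = p_night_and_absent / p_absent

print(f"P(Absent) = {p_absent:.3f} = {100 * p_absent:.1f}%")
print(f"P(Night | Absent) = {p_night_given_absent:.4f}")
```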
19. (a) What is the value of the missing probability in the table above? The total proba-
bility must equal 1. Therefore, the missing value is then 1 − 0.2 − 0.1 − 0.3 − 0.3 =
0.1.
(a) $z = \frac{9 - 16}{3} = \frac{-7}{3} = -2.33$.
(b) $z = \frac{21 - 16}{3} = \frac{5}{3} = 1.67$.
(c) x = 9 is more unusual.
21. Use a normal model with a mean of 50 and standard deviation of 5.
(a) For the 68-95-99.7 Rule, we will have the following points on the graph (not shown
here):
µ − 3σ = 50 − 3(5) = 35
µ − 2σ = 50 − 2(5) = 40
µ − σ = 50 − 5 = 45
µ = 50
µ + σ = 50 + 5 = 55
µ + 2σ = 50 + 2(5) = 60
µ + 3σ = 50 + 3(5) = 65
(b) Between 35 and 65.
(c) 50%.
(d) 95%.
(e) 47.5%.
(f) 47.5%.
(g) 34%.
(h) 49.85%.
(i) 2.5%.
(j) 16%.
(k) Between 40 and 50 is 47.5%. Between 50 and 65 is 49.85%. Between 40 and 65 is
97.35%.
(l) Between 45 and 50 is 34%. Between 50 and 60 is 47.5%. Between 45 and 60 is
81.5%.
22. Based on the Normal Model N(50, 5), answer the following questions. Draw pictures
to help you see what is going on.
(a) 50%.
(b) Step 1: Standardize. $z = \frac{62 - 50}{5} = \frac{12}{5} = 2.4$. Step 2: Calculate value from a
calculator or a table. From calculator: normalcdf(2.4, 999) = 0.0082. Solution
= 0.82%. From table: We want area(z > 2.4). The table gives area(z < 2.4) =
0.9918. Our answer is 1 − 0.9918 = 0.0082, which is 0.82%.
(c) Step 1: Standardize. $z = \frac{39 - 50}{5} = \frac{-11}{5} = -2.2$. Step 2: Calculate value from a
calculator or a table. From calculator: normalcdf(−999, −2.2) = 0.0139. Solution
= 1.39%. From table: We want area(z < −2.2). The table gives us this directly
and the value is 0.0139. Solution = 1.39%.
(d) Step 1: Standardize. $z = \frac{43 - 50}{5} = \frac{-7}{5} = -1.4$. Step 2: Calculate value from a
calculator or a table. From calculator: normalcdf(−1.4, 999) = 0.9192. Solution
= 91.92%. From table: We want area(z > −1.4). The table gives area(z <
−1.4) = 0.0808. Our answer is 1 − 0.0808 = 0.9192, which is 91.92%.
(e) Step 1: Standardize. $z = \frac{58 - 50}{5} = \frac{8}{5} = 1.6$. Step 2: Calculate value from a
calculator or a table. From calculator: normalcdf(−999, 1.6) = 0.9452. Solution
= 94.52%. From table: We want area(z < 1.6). The table gives us this directly
and the value is 0.9452. Solution = 94.52%.
(f) Step 1: Standardize both values. $z_1 = \frac{37 - 50}{5} = -2.6$. $z_2 = \frac{52 - 50}{5} = 0.4$. Step 2:
Calculate value from a calculator or a table. From calculator: normalcdf(−2.6, 0.4) =
0.6508. Solution = 65.08%. From table: We want area(−2.6 < z < 0.4). The
table gives area(z < −2.6) = 0.0047 and area(z < 0.4) = 0.6554. Our answer is
0.6554 − 0.0047 = 0.6507, which is 65.07%. The difference between the calculator
answer and the table answer is because of rounding. Show your work!
(g) Step 1: Standardize both values. $z_1 = \frac{57.25 - 50}{5} = 1.45$. $z_2 = \frac{66 - 50}{5} = 3.2$. Step 2:
Calculate value from a calculator or a table. From calculator: normalcdf(1.45, 3.2) =
0.0728. Solution = 7.28%. From table: We want area(1.45 < z < 3.2). The
table gives area(z < 1.45) = 0.9265 and area(z < 3.2) = 0.9993. Our answer is
0.9993 − 0.9265 = 0.0728, which is 7.28%.
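The normalcdf calculations above can be reproduced without a graphing calculator. The following Python sketch is one possible stand-in, assuming the N(50, 5) model from this problem; `normal_area` is a hypothetical helper name, not a calculator or library function. It works on the original x-values, so the standardizing step happens inside `NormalDist`.

```python
from statistics import NormalDist

model = NormalDist(mu=50, sigma=5)   # the N(50, 5) model from the problem

def normal_area(lower=None, upper=None, dist=model):
    """Area under the Normal curve between lower and upper.

    None means the interval is unbounded on that side, which plays the
    role of the -999 / 999 placeholders used with normalcdf.
    """
    left = 0.0 if lower is None else dist.cdf(lower)
    right = 1.0 if upper is None else dist.cdf(upper)
    return right - left

above_62 = normal_area(lower=62)          # compare with normalcdf(2.4, 999)
below_39 = normal_area(upper=39)          # compare with normalcdf(-999, -2.2)
between_37_52 = normal_area(37, 52)       # compare with normalcdf(-2.6, 0.4)
between_5725_66 = normal_area(57.25, 66)  # compare with normalcdf(1.45, 3.2)

for label, area in [("above 62", above_62), ("below 39", below_39),
                    ("37 to 52", between_37_52),
                    ("57.25 to 66", between_5725_66)]:
    print(f"{label}: {area:.4f} ({100 * area:.2f}%)")
```

Small differences from the table-based answers are expected, since the table rounds each area to four decimal places before subtracting.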
23. Based on the Normal Model N(50, 5), answer the following questions. Draw pictures
to help you see what is going on.
(a) Highest 5% of values corresponds to a z-value of 1.645. Find this using invnorm(0.95)
on your calculator, or looking for the value on a table. Use the z-score formula to
solve for the value you are looking for. $z = 1.645 = \frac{x - 50}{5}$ ⇒ 8.225 = x − 50
⇒ x = 58.225.
(b) Lower 25% of values corresponds to a z-value of −0.67. Find this using invnorm(0.25)
on your calculator, or looking for the value on a table. Use the z-score formula to
solve for the value you are looking for. $z = -0.67 = \frac{x - 50}{5}$ ⇒ −3.35 = x − 50
⇒ x = 46.65.
(c) We need 2 values for z in this case. Let the lower value of z be $z_L$ and the upper
value of z be $z_R$. Find these values by typing invnorm(0.15) and invnorm(0.85)
on your calculator. We have $z_L = -1.04$ and $z_R = 1.04$, respectively. Solve for the
two values of x.
$z_L = -1.04 = \frac{x - 50}{5}$ ⇒ −5.2 = x − 50 ⇒ $x_L = 44.8$,
$z_R = 1.04 = \frac{x - 50}{5}$ ⇒ 5.2 = x − 50 ⇒ $x_R = 55.2$.
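The invnorm steps above can also be checked in Python. This sketch assumes the same N(50, 5) model and uses `NormalDist.inv_cdf`, which takes the area to the left and returns the cutoff value directly on the x scale. The variable names are illustrative.

```python
from statistics import NormalDist

model = NormalDist(mu=50, sigma=5)   # the N(50, 5) model from the problem

# (a) Cutoff for the highest 5%: 95% of the area lies to the left.
cutoff_top_5 = model.inv_cdf(0.95)

# (b) Cutoff for the lowest 25%.
cutoff_low_25 = model.inv_cdf(0.25)

# (c) Bounds of the middle 70%: 15% in each tail.
middle_70 = (model.inv_cdf(0.15), model.inv_cdf(0.85))

# The same cutoffs via z-scores: x = mu + z * sigma.
z_top_5 = NormalDist().inv_cdf(0.95)
x_from_z = 50 + z_top_5 * 5

print(round(cutoff_top_5, 3), round(cutoff_low_25, 3))
print(tuple(round(x, 3) for x in middle_70))
```

These values differ slightly from the hand answers because the hand work rounds z to two or three decimal places before solving for x.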
Stat 130 Exam 3:
Important Formulas and Concepts
1 Chapter 12
1.1 Definitions
1. Binomial Distribution
A sequence of trials has a binomial distribution if
2. Success/Failure Condition
A Binomial Model is approximately Normal if we expect at least 10 successes and 10
failures, i.e. np ≥ 10 and n(1 − p) ≥ 10.
3. Poisson Distribution
A variable has a Poisson distribution if
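Binomial model:
$\mu = np$
$\sigma = \sqrt{np(1 - p)}$
where $\binom{n}{k} = \frac{n!}{k!(n - k)!}$
Poisson model (with parameter µ):
mean $= \mu$
$\sigma = \sqrt{\mu}$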
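As a hedged illustration of the Binomial formulas, the sketch below uses the numbers from the printing company example problem in this guide (n = 30 reams per box, p = 0.02 defective). The probability formula P(X = k) = C(n, k) p^k (1 − p)^(n − k) is the standard Binomial probability; it is assumed here because it did not survive in the formula list above.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a Binomial(n, p) model (standard formula, assumed here)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

# Numbers from the printing company example problem.
n, p = 30, 0.02

mean = n * p                        # mu = np
sd = math.sqrt(n * p * (1 - p))     # sigma = sqrt(np(1 - p))
p_exactly_4 = binomial_pmf(4, n, p)

# Success/Failure Condition for using a Normal approximation.
normal_ok = (n * p >= 10) and (n * (1 - p) >= 10)

print(f"mean = {mean:.2f}, sd = {sd:.4f}")
print(f"P(X = 4) = {p_exactly_4:.6f}")
print("Normal approximation appropriate:", normal_ok)
```

Here np is far below 10, so the Success/Failure Condition fails and the exact Binomial calculation is the safer choice.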
4. Sampling Error
Sample-to-sample variation
9. Critical Value
The number of standard errors to move away from the mean of the sampling distribu-
tion to correspond to the specified level of confidence. The critical value, for a normal
sampling distribution, denoted z ∗ , is usually found from a table or technology. The
critical value, for a t-distribution, denoted t∗ , is also found from a table or technology.
4. P-value
The probability of observing a value for a test statistic at least as far from the hy-
pothesized value as the statistic value actually observed if the null hypothesis is true.
A small p-value indicates either that the observation is improbable or that the proba-
bility calculation was based on incorrect assumptions. The assumed truth of the null
hypothesis is the assumption under suspicion.
8. One rejects H0 and accepts Ha and calls the results statistically significant if the P -
value is sufficiently small (less than α).
5 Chapter 15
1. Statistically significant
When the p-value falls below the alpha level, we say that the test is “statistically
significant” at that alpha level.
2. Alpha level
The threshold p-value that determines when we reject a null hypothesis. If we observe
a statistic whose p-value based on the null hypothesis is less than α, we reject that
null hypothesis.
3. Significance level
The alpha level is also called the significance level, most often in a phrase such as a
conclusion that a particular test is “significant at the 5% significance level”
4. Type I Error
The error of rejecting a null hypothesis when in fact it is true (also called a false
positive). The probability of a Type I Error is α.
5. Type II Error
The error of failing to reject a null hypothesis when in fact it is false (also called a false
negative). The probability of a Type II Error is β.
6. β
The probability of a Type II Error is commonly denoted β and depends on the effect
size.
7. Power
The probability that a hypothesis test will correctly reject a false null hypothesis is the
power of the test. For any specific value in the alternative, the power is 1 − β.
6 Chapter 17
1. Student’s t distribution
A family of distributions indexed by its degrees of freedom. The t-models are unimodal,
symmetric, and bell shaped, but have fatter tails and a narrower center than the
Normal model. As the degrees of freedom increase, t-distributions approach the Normal
distribution.
Hypothesis Testing
Step 1: Write down your hypothesis
H0 : µ = µ0
HA: µ < µ0, µ > µ0, or µ ≠ µ0
Step 2: Calculate your test statistic
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
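A minimal Python sketch of the z-test steps follows, assuming the population standard deviation σ is known. The numbers are hypothetical (not from any problem in this guide), and the function name `z_test` and its `alternative` argument are illustrative choices.

```python
import math
from statistics import NormalDist

def z_test(x_bar, mu_0, sigma, n, alternative="two-sided"):
    """Return (z, p_value) for a one-sample z-test with known sigma."""
    z = (x_bar - mu_0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "less":         # HA: mu < mu_0
        p_value = std_normal.cdf(z)
    elif alternative == "greater":    # HA: mu > mu_0
        p_value = 1 - std_normal.cdf(z)
    else:                             # HA: mu != mu_0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value

# Hypothetical numbers for illustration only.
z, p_value = z_test(x_bar=52, mu_0=50, sigma=5, n=25)

alpha = 0.05
decision = "Reject H0" if p_value < alpha else "Do Not Reject H0"
print(f"z = {z:.2f}, p-value = {p_value:.4f}, decision: {decision}")
```

The p-value is the area in the tail (or both tails) beyond the observed z, so the choice of alternative hypothesis has to be made before looking at the data.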
Hypothesis Testing
Step 1: Write down your hypothesis
H0 : µ = µ0
HA: µ < µ0, µ > µ0, or µ ≠ µ0
Step 2: Calculate your test statistic
$t_{n-1} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value < α, (usually α = 0.05), then Reject H0 . If
p-value is ≥ α, then Do Not Reject H0 .
Hypothesis Testing
Step 1: Write down your hypothesis
H0 : µd = 0
HA: µd < 0, µd > 0, or µd ≠ 0
Step 2: Calculate your test statistic
$t_{n-1} = \frac{\bar{d}}{s_d / \sqrt{n}}$
$\bar{d}$ = average of the differences
$s_d$ = standard deviation of the differences
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value < α, (usually α = 0.05), then Reject H0 . If
p-value is ≥ α, then Do Not Reject H0 .
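The t statistics can be computed with a short Python sketch. The one-sample calls use the summary numbers from the voting-age and prairie dog example problems in this guide; the paired data are hypothetical. Python's standard library has no t-distribution, so the p-value in Step 3 is left to a table or calculator, as in class.

```python
import math
import statistics

def one_sample_t(x_bar, mu_0, s, n):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    t = (x_bar - mu_0) / (s / math.sqrt(n))
    return t, n - 1

def paired_t(first, second):
    """Return (t, degrees of freedom) for a paired t-test of H0: mu_d = 0."""
    diffs = [b - a for a, b in zip(first, second)]
    d_bar = statistics.mean(diffs)    # average of the differences
    s_d = statistics.stdev(diffs)     # standard deviation of the differences
    n = len(diffs)
    return d_bar / (s_d / math.sqrt(n)), n - 1

# Summary numbers from the example problems in this guide.
t_voting, df_voting = one_sample_t(x_bar=24.3, mu_0=22, s=8, n=27)
t_prairie, df_prairie = one_sample_t(x_bar=15.8, mu_0=14, s=3.6, n=31)

# Hypothetical paired measurements, for illustration only.
before = [10, 12, 9, 11]
after = [12, 13, 11, 12]
t_paired, df_paired = paired_t(before, after)

print(f"voting age:  t = {t_voting:.3f}, df = {df_voting}")
print(f"prairie dog: t = {t_prairie:.3f}, df = {df_prairie}")
print(f"paired:      t = {t_paired:.3f}, df = {df_paired}")
```

With t and the degrees of freedom in hand, Step 3 is a lookup: find the tail area beyond t in the row of the t-table for n − 1 degrees of freedom.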
8 Extra Information
Review any and all notes and supplementary materials. It may be the case that something
was accidentally omitted from this study guide. Also, review any problems that may have
been discussed in class as not all example problems may have been provided here.
9 Example Problems
1. A printing company ships boxes of paper to office stores. In each box, there are 30
reams of paper. However, in every box, they estimate that 2% of the reams of paper
are defective in some way. What is the probability that in a box, there will be exactly
4 reams of paper that need to be shipped back to the printing company? What is the
mean number of reams of paper that need to be shipped back? What is the standard
deviation?
2. Several factors are involved in the creation of a confidence interval. Among them are
the sample size, the level of confidence, and the margin of error. Which statements are
true?
(a) For a given sample size, higher confidence means a smaller margin of error.
(b) For a given confidence level, halving the margin of error requires a sample twice
as large.
(c) For a certain confidence level, you can get a smaller margin of error by selecting
a bigger sample.
(d) For a fixed margin of error, larger samples provide greater confidence.
3. A butcher wants to estimate the mean weight of a ham. She samples 33 hams and
computes a sample mean weight of 8.2 pounds. She knows the population standard
deviation is 3.3 pounds. What is a 90% confidence interval for the population mean
weight of ham? Please indicate the value you used for z ∗ or t∗ .
5. A computer professional wants to know the mean number of emails people receive each
day. She is going to compute a 95% confidence interval and wants a margin of error of
±2 emails. She believes the standard deviation to be 18 emails. How large should the
sample size be to ensure this margin of error?
6. A researcher believes that the mean age at which a person first votes is greater than 22
years. He samples 27 people and computes a sample mean of 24.3 years and a sample
standard deviation of 8 years.
8. A researcher believes that the mean height of a prairie dog is different than 14 inches.
She samples 31 prairie dogs and computes a sample mean of 15.8 inches and a sample
standard deviation of 3.6 inches.
(a) A very low P-value provides evidence against the null hypothesis.
(b) A high P-value is strong evidence in favor of the null hypothesis.
(c) A P-value above 0.10 shows that the null hypothesis is true.
(d) If the null hypothesis is true, you can’t get a p-value below 0.01.
10. Which of the following statements are true? If false, explain briefly.
(a) Using an alpha level of 0.05, a p-value of 0.04 results in rejecting the null hypoth-
esis.
(b) The alpha level depends on the sample size.
(c) With an alpha level of 0.01, a p-value of 0.10 results in rejecting the null hypoth-
esis.
(d) Using an alpha level of 0.05, a p-value of 0.06 means the null hypothesis is true.
11. For each of the following situations, state whether a Type I or Type II, or neither error
has been made. Explain briefly.
(a) A bank wants to know if the enrollment on their website is above 30% based on
a small sample of customers. They test $H_0: p = 0.3$ versus $H_A: p > 0.3$ and
reject the null hypothesis. Later they find out that actually 28% of all customers
enrolled.
(b) A student tests 100 students to determine whether other students on her campus
prefer Coke or Pepsi and finds no evidence that preference for Coke is not 0.5.
Later, a marketing company tests all students on campus and finds no difference.
(c) A human resource analyst wants to know if the applicants this year score, on
average, higher on their placement exam than the 52.5 points the candidates
averaged last year. She samples 50 recent tests and finds the average to be 54.1
points. She fails to reject the null hypothesis that the mean is 52.5 points. At
the end of the year, they find that the candidates this year had a mean of 55.3
points.
(d) A pharmaceutical company tests whether a drug lifts the headache relief rate
from the 25% achieved by the placebo. They fail to reject the null hypothesis
because the p-value is 0.465. Further testing shows that the drug actually relieves
headaches in 38% of people.
12. We want to estimate the healing rate for a wound. A sample of size 17 is collected
and the sample mean is computed to be 24.3 micrometers per hour, with a sample
standard deviation of s= 8 micrometers per hour. What is a 95% confidence interval
for the population mean?
13. Teresa knows that appointment times are approximately normally distributed. She
believes the mean wait time is longer than 25 minutes. She conducts a test with α
= 0.05 and the appropriate hypotheses. She selects 25 random appointments and the
sample mean was found to be 25.66 minutes and a sample standard deviation of 10
minutes.
$P(X = 4) = \binom{30}{4}(0.02)^4(0.98)^{26}$
$= 27405\,(0.02^4)(0.98^{26})$
$= 27405\,(1.6 \times 10^{-7})(0.5914)$
$= 0.0026$
You can also calculate this probability on your calculator as binompdf(30, 0.02, 4) =
0.0026.
The mean is $\mu = np = 30(0.02) = 0.6$. The standard deviation is
$\sigma = \sqrt{np(1 - p)} = \sqrt{30(0.02)(0.98)} = 0.7668$.
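The binomial calculation above can be checked with Python's standard library. This is a minimal sketch; `binomial_pmf` is an illustrative helper name playing the role of the calculator's binompdf.

```python
import math

def binomial_pmf(n, p, k):
    """P(X = k) for a Binomial(n, p) count: C(n, k) * p^k * (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 30, 0.02                   # 30 reams per box, 2% defective
prob_4 = binomial_pmf(n, p, 4)    # probability of exactly 4 defective reams
mean = n * p                      # mu = np
sd = math.sqrt(n * p * (1 - p))   # sigma = sqrt(np(1 - p))

print(f"P(X = 4) = {prob_4:.4f}, mean = {mean:.1f}, sd = {sd:.4f}")
```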
3. A butcher wants to estimate the mean weight of a ham. She samples 33 hams and
computes a sample mean weight of 8.2 pounds. She knows the population standard
deviation is 3.3 pounds. What is a 90% confidence interval for the population mean
weight of ham? Please indicate the value you used for z ∗ or t∗ .
Summary of what is given:
n = 33
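$\bar{x} = 8.2$
$\sigma = 3.3$
Because the population standard deviation is known, we use $z^*$ for the confidence
interval for the mean. For 90% confidence, $z^* = 1.645$.
CI: $\bar{x} \pm z^* \dfrac{\sigma}{\sqrt{n}}$
⇒ $8.2 \pm 1.645 \cdot \dfrac{3.3}{\sqrt{33}}$
⇒ $(7.255, 9.145)$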
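For readers who prefer to verify the interval numerically, here is a small sketch using Python's standard library. `z_interval` is an illustrative name; the critical value is obtained from `statistics.NormalDist` instead of a table, so it is 1.6449 rather than the rounded 1.645 and the endpoints agree to three decimals.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    margin = z_star * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = z_interval(x_bar=8.2, sigma=3.3, n=33, confidence=0.90)
print(f"({low:.3f}, {high:.3f})")
```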
5. A computer professional wants to know the mean number of emails people receive each
day. She is going to compute a 95% confidence interval and wants a margin of error of
±2 emails. She believes the standard deviation to be 18 emails. How large should the
sample size be to ensure this margin of error?
Summary of what is given:
$MOE = 2$
$s = 18$
For sample size calculation, since this is based on the mean, use $t^*$ with $n - 1$ degrees
of freedom. Note that as n becomes really large, the t-distribution becomes more like
the normal distribution. Therefore, use the 95% confidence interval critical value from
the normal distribution instead: $z^* = 1.96$. Sample size can be calculated as follows:
$MOE = t^*_{n-1} \dfrac{s}{\sqrt{n}} \Rightarrow 2 = 1.96 \cdot \dfrac{18}{\sqrt{n}}$
⇒ $\dfrac{2}{1.96 \times 18} = \dfrac{1}{\sqrt{n}}$
⇒ $\sqrt{n} = \dfrac{1.96 \times 18}{2}$
⇒ $n = \left(\dfrac{1.96 \times 18}{2}\right)^2$
⇒ $n = 311.1696$
⇒ $n \approx 312$
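The same algebra can be written as a one-line function. In this sketch the name `sample_size_for_mean` is illustrative; the result is always rounded up, because rounding down would give a margin of error slightly larger than requested.

```python
import math

def sample_size_for_mean(moe, sd, z_star):
    """Smallest n with z_star * sd / sqrt(n) <= moe, i.e. n >= (z_star * sd / moe)^2."""
    return math.ceil((z_star * sd / moe) ** 2)

n = sample_size_for_mean(moe=2, sd=18, z_star=1.96)
print(n)
```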
6. A researcher believes that the mean age at which a person first votes is greater than 22
years. He samples 27 people and computes a sample mean of 24.3 years and a sample
standard deviation of 8 years.
8. A researcher believes that the mean height of a prairie dog is different than 14 inches.
She samples 31 prairie dogs and computes a sample mean of 15.8 inches and a sample
standard deviation of 3.6 inches.
(a) A very low P-value provides evidence against the null hypothesis.
True.
(b) A high P-value is strong evidence in favor of the null hypothesis.
False. A high p-value shows that the data are consistent with the null hypothesis
but does not prove that the null hypothesis is true.
(c) A P-value above 0.10 shows that the null hypothesis is true.
False. No p-value ever shows that the null hypothesis is true (or false).
(d) If the null hypothesis is true, you can’t get a p-value below 0.01.
False. If the null hypothesis is true, you will get a p-value below 0.01 about once
in a hundred hypothesis tests.
10. Which of the following statements are true? If false, explain briefly.
(a) Using an alpha level of 0.05, a p-value of 0.04 results in rejecting the null hypoth-
esis.
True.
(b) The alpha level depends on the sample size.
False. The alpha level is set independently and does not depend on the sample
size.
(c) With an alpha level of 0.01, a p-value of 0.10 results in rejecting the null hypoth-
esis.
False. The p-value would have to be less than 0.01 to reject the null hypothesis.
(d) Using an alpha level of 0.05, a p-value of 0.06 means the null hypothesis is true.
False. It means that we do not have enough evidence at that alpha level to reject
the null hypothesis.
11. For each of the following situations, state whether a Type I or Type II, or neither error
has been made. Explain briefly.
(a) A bank wants to know if the enrollment on their website is above 30% based on
a small sample of customers. They test $H_0: p = 0.3$ versus $H_A: p > 0.3$ and
reject the null hypothesis. Later they find out that actually 28% of all customers
enrolled.
Type I Error. The actual value is not greater than 0.3, but they rejected the null
hypothesis.
(b) A student tests 100 students to determine whether other students on her campus
prefer Coke or Pepsi and finds no evidence that preference for Coke is not 0.5.
Later, a marketing company tests all students on campus and finds no difference.
No error. The actual value is 0.5 which was not rejected.
(c) A human resource analyst wants to know if the applicants this year score, on
average, higher on their placement exam than the 52.5 points the candidates
averaged last year. She samples 50 recent tests and finds the average to be 54.1
points. She fails to reject the null hypothesis that the mean is 52.5 points. At
the end of the year, they find that the candidates this year had a mean of 55.3
points.
Type II Error. The true mean was 55.3 points, which is greater than 52.5, so the
null hypothesis was false but was not rejected.
(d) A pharmaceutical company tests whether a drug lifts the headache relief rate
from the 25% achieved by the placebo. They fail to reject the null hypothesis
because the p-value is 0.465. Further testing shows that the drug actually relieves
headaches in 38% of people.
Type II Error. The null hypothesis was not rejected, but it was false. The true
relief rate was greater than 0.25.
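The reasoning in these four answers follows one pattern: compare the decision that was made with what later turned out to be true. A small sketch of that pattern is below; `classify_error` and its parameter names are illustrative only.

```python
def classify_error(rejected_h0, alternative_is_true):
    """Compare the test decision with what was later found to be true."""
    if rejected_h0 and not alternative_is_true:
        return "Type I"      # rejected H0 although the alternative was false
    if not rejected_h0 and alternative_is_true:
        return "Type II"     # failed to reject H0 although the alternative was true
    return "No error"

# (a) bank: rejected H0, but true enrollment (28%) is not above 30%
print(classify_error(rejected_h0=True, alternative_is_true=False))
# (b) Coke vs. Pepsi: did not reject, and there really is no difference
print(classify_error(rejected_h0=False, alternative_is_true=False))
# (c) placement exam: did not reject, but the true mean (55.3) is above 52.5
print(classify_error(rejected_h0=False, alternative_is_true=True))
# (d) headache drug: did not reject, but the true relief rate (38%) is above 25%
print(classify_error(rejected_h0=False, alternative_is_true=True))
```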
12. We want to estimate the healing rate for a wound. A sample of size 17 is collected
and the sample mean is computed to be 24.3 micrometers per hour, with a sample
standard deviation of s= 8 micrometers per hour. What is a 95% confidence interval
for the population mean?
What we are given:
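$\bar{y} = 24.3$
$s = 8$
$n = 17$
Because we want the confidence interval for the population mean, use the formula
CI: $\bar{y} \pm t^*_{n-1} \dfrac{s}{\sqrt{n}}$
⇒ $24.3 \pm t^*_{17-1} \cdot \dfrac{8}{\sqrt{17}}$
⇒ $24.3 \pm 2.120 \cdot \dfrac{8}{\sqrt{17}}$
⇒ $(20.187, 28.413)$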
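A short sketch of the same t-interval calculation follows. Python's standard library has no t-distribution quantile function, so the critical value 2.120 (16 degrees of freedom, 95% confidence) is passed in as read from a t-table; `t_interval` is an illustrative helper name.

```python
import math

def t_interval(y_bar, s, n, t_star):
    """Confidence interval y_bar +/- t_star * s / sqrt(n); t_star comes from a t-table."""
    margin = t_star * s / math.sqrt(n)
    return y_bar - margin, y_bar + margin

# t_star = 2.120 is the table value for df = 16 at 95% confidence.
low, high = t_interval(y_bar=24.3, s=8, n=17, t_star=2.120)
print(f"({low:.3f}, {high:.3f})")
```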
13. Teresa knows that appointment times are approximately normally distributed. She
believes the mean wait time is longer than 25 minutes. She conducts a test with α
= 0.05 and the appropriate hypotheses. She selects 25 random appointments and the
sample mean was found to be 25.66 minutes and a sample standard deviation of 10
minutes.
1 Sampling Distributions
1.1 Definitions
1. Sampling Distribution
Different random samples give different values of a statistic. The distribution of the
statistic over all possible samples is called the sampling distribution. The sampling
distribution model shows the behavior of the statistic over all possible samples of the
same size n.
5. Sampling Error
Sample-to-sample variation
2 Chapter 18
1. Two-sample t-interval for the difference between means
A confidence interval for the difference between the means of two independent groups
is found as $(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2)$. Here,
$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$,
and the number of degrees of freedom is given by a special formula or we use the
conservative method.
3 Chapter 19
1. Hypothesis
A model or proposition that we adopt in order to test.
4. P-value
The probability of observing a value for a test statistic at least as far from the hy-
pothesized value as the statistic value actually observed if the null hypothesis is true.
A small p-value indicates either that the observation is improbable or that the proba-
bility calculation was based on incorrect assumptions. The assumed truth of the null
hypothesis is the assumption under suspicion.
5. One-proportion Z-test
A test of the null hypothesis that the proportion of a single sample equals a specified
value ($H_0: p = p_0$) by referring the statistic $z = (\hat{p} - p_0)/SD(\hat{p})$ to a
standard Normal model.
6. Two-sided (Tailed) Alternative
An alternative hypothesis is two-sided (for example, $H_A: p \neq p_0$) when we are
interested in deviations in either direction away from the hypothesized parameter value.
4 Chapter 20
1. Sampling distribution of the difference between two proportions
The sampling distribution of $\hat{p}_1 - \hat{p}_2$ is, under appropriate assumptions, modeled by
a Normal model with mean $\mu = p_1 - p_2$ and standard deviation
$SD(\hat{p}_1 - \hat{p}_2) = \sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}$.
2. Two-proportion z-interval
This is the confidence interval. A two-proportion z-interval gives a confidence interval
for the true difference in proportions, $p_1 - p_2$, in two independent groups. The confidence
interval is $(\hat{p}_1 - \hat{p}_2) \pm z^* \times SE(\hat{p}_1 - \hat{p}_2)$, where $z^*$ is the critical value from the
standard Normal model corresponding to the specified confidence level.
3. Two-proportion z-test
This is the hypothesis test. Test the null hypothesis $H_0: p_1 - p_2 = 0$ by comparing
the statistic $z = (\hat{p}_1 - \hat{p}_2)/SE_{pooled}(\hat{p}_1 - \hat{p}_2)$ to the standard Normal model.
Hypothesis Testing
Step 1: Write down your hypothesis
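$H_0: \mu_1 - \mu_2 = \Delta_0$
$H_A: \mu_1 - \mu_2 < \Delta_0$, $\mu_1 - \mu_2 > \Delta_0$, or $\mu_1 - \mu_2 \neq \Delta_0$
Step 2: Calculate your test statistic
$t_{df} = \dfrac{(\bar{y}_1 - \bar{y}_2) - \Delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$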
Hypothesis Testing
Step 1: Write down your hypothesis
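$H_0: \mu_d = \Delta_0$
$H_A: \mu_d < \Delta_0$, $\mu_d > \Delta_0$, or $\mu_d \neq \Delta_0$
Step 2: Calculate your test statistic
$t_{n-1} = \dfrac{\bar{d} - \Delta_0}{s_d / \sqrt{n}}$
$\bar{d}$ = average of the differences
$s_d$ = standard deviation of the differences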
Step 3: Calculate the p-value
Step 4: State your conclusion. If p-value ≤ α, (usually α = 0.05), then Reject H0 . If
p-value is > α, then Do Not Reject H0 .
6 Extra Information
Review any and all notes and supplementary materials. It may be the case that something
was accidentally omitted from this study guide. Also, review any problems that may have
been discussed in class as not all example problems may have been provided here.
7 Example Problems
1. The 95% confidence interval for the number of teens who reported that they had
misrepresented their age online is from 45.6% to 52.5%. There were 799 teens in this
study.
2. A study found that 16 of 40 peanut candy bars in fact did not contain peanuts.
3. Several factors are involved in the creation of a confidence interval. Among them are
the sample size, the level of confidence, and the margin of error. Which statements are
true?
(a) For a given sample size, higher confidence means a smaller margin of error.
(b) For a given confidence level, halving the margin of error requires a sample twice
as large.
(c) For a certain confidence level, you can get a smaller margin of error by selecting
a bigger sample.
(d) For a fixed margin of error, larger samples provide greater confidence.
4. I sample 600 people and 432 of them like cats. Construct a 95% confidence interval
for the population proportion.
5. I think the proportion of people that eat candy is around 0.75. I am going to construct
a 90% confidence interval and want the margin of error to be ±0.025. How large should
the sample size be?
6. Jimmy samples 930 people and 234 took public transportation. Construct a 99%
confidence interval for the population proportion.
7. I am going to construct a 95% confidence interval for the proportion of people that
wear eyeglasses and want the margin of error to be ±0.2. I have no idea what to
estimate for the population proportion. How large should the sample size be?
8. A researcher believes that more than 50% of all people voted in the last election. She
samples 800 people and 420 of them voted. Test her claim at a significance level of
0.05 (i.e. compare the P-value to 0.05).
9. A researcher believes that fewer than 75% of all mollusks are tasty. He samples 1200
mollusks and 865 of them are tasty. Test his claim at a significance level of 0.05 (i.e.
compare the P-value to 0.05).
(a) A very low P-value provides evidence against the null hypothesis.
(b) A high P-value is strong evidence in favor of the null hypothesis.
(c) A P-value above 0.10 shows that the null hypothesis is true.
(d) If the null hypothesis is true, you can’t get a p-value below 0.01.
18. Which of the following statements are true? If false, explain briefly.
(a) Using an alpha level of 0.05, a p-value of 0.04 results in rejecting the null hypoth-
esis.
(b) The alpha level depends on the sample size.
(c) With an alpha level of 0.01, a p-value of 0.10 results in rejecting the null hypoth-
esis.
(d) Using an alpha level of 0.05, a p-value of 0.06 means the null hypothesis is true.
19. For each of the following situations, state whether a Type I or Type II, or neither error
has been made. Explain briefly.
(a) A bank wants to know if the enrollment on their website is above 30% based on
a small sample of customers. They test $H_0: p = 0.3$ versus $H_A: p > 0.3$ and
reject the null hypothesis. Later they find out that actually 28% of all customers
enrolled.
(b) A student tests 100 students to determine whether other students on her campus
prefer Coke or Pepsi and finds no evidence that preference for Coke is not 0.5.
Later, a marketing company tests all students on campus and finds no difference.
(c) A human resource analyst wants to know if the applicants this year score, on
average, higher on their placement exam than the 52.5 points the candidates
averaged last year. She samples 50 recent tests and finds the average to be 54.1
points. She fails to reject the null hypothesis that the mean is 52.5 points. At
the end of the year, they find that the candidates this year had a mean of 55.3
points.
(d) A pharmaceutical company tests whether a drug lifts the headache relief rate
from the 25% achieved by the placebo. They fail to reject the null hypothesis
because the p-value is 0.465. Further testing shows that the drug actually relieves
headaches in 38% of people.
20. A researcher samples 600 children and 500 of them like ice cream. She also samples
450 adults and 350 of them like ice cream. Construct a 95% confidence interval for the
difference of population proportions of children and adults that like ice cream.
21. A researcher samples 1200 children and 500 of them like to exercise. She also samples
900 adults and 350 of them like to exercise. Construct a 90% confidence interval for
the difference of population proportions of children and adults that like to exercise.
22. A scientist believes that the proportion of North American bees that are hostile is
greater than the proportion of South American bees. She samples 500 North American
bees and 200 are hostile. She samples 600 South American bees and 230 are hostile.
23. A scientist believes that the proportion of North American bears that are hostile is
greater than the proportion of South American bears. She samples 800 North American
bears and 200 are hostile. She samples 1200 South American bears and 240 are hostile.
(a) Find a 95% confidence interval for the difference in average commuting time for
the two routes. Use df= 33.
(b) State the hypotheses to be tested.
(c) Compute the value of the test score.
(d) Give the P-value or range of P-values.
(e) Do the results seem significant?
25. Researchers randomly assigned participants either a tall, thin “highball” glass or a
short, wide “tumbler,” each of which held 355 ml. Participants were asked to pour 1.5
oz = 44.3 ml of water into their glass. Did the shape of the glass make a difference in
how much liquid they poured? In particular, test to see if they poured less water into
the “highball” glass than the “tumbler”. Assume α = 0.1. Here are the summaries:
        Highball    Tumbler
n       99          99
ȳ       42.2 ml     60.9 ml
s       16.2 ml     17.9 ml
(a) Find a 90% confidence interval for the difference in average water held for the two
glasses. Use df = 194.
(b) State the hypotheses to be tested.
(c) Compute the value of the test score. (Assume all conditions are met.)
(d) Give the P-value or range of P-values.
(e) Do the results seem significant?
7.1 Various Chapters
1. We want to estimate the healing rate for a wound. A sample of size 17 is collected
and the sample mean is computed to be 24.3 micrometers per hour, with a sample
standard deviation of s= 8 micrometers per hour. What is a 95% confidence interval
for the population mean?
2. A sample of size n=150 people is collected and the sample proportion of people who are
illiterate is computed to be .20. Compute a 95% confidence interval for the population
proportion of illiterate people.
3. You believe that the proportion of people that like cheese is .80. You are going to
construct a 95% confidence interval and want the margin of error to be plus or minus
.03. What should the sample size be?
4. Teresa knows that appointment times are approximately normally distributed. She
believes the mean wait time is longer than 25 minutes. She conducts a test with α
= 0.05 and the appropriate hypotheses. She selects 25 random appointments and the
sample mean was found to be 25.66 minutes and a sample standard deviation of 10
minutes.
5. You claim that the proportion of people who watch American Idol is greater than .50.
You sample n=200 people and compute a sample proportion of .53. Assume α = 0.05.
6. You want to compare the proportion of gamers amongst women and men. You survey
300 women and 400 men. 175 of the women were gamers and 200 of the men were
gamers. Construct a 95% confidence interval for the difference of proportions.
7. You believe that the proportion of men that are colorblind is greater than the pro-
portion of women that are color blind. You sample 900 men and 90 of them are color
blind. You sample 700 women and 45 of them are colorblind. Assume α = 0.05.
(a) For a 90% confidence interval, $z^* = 1.645$. The 90% confidence interval would
then be
$\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
$= \dfrac{16}{40} \pm 1.645 \sqrt{\dfrac{\left(\frac{16}{40}\right)\left(1 - \frac{16}{40}\right)}{40}}$
$= \dfrac{16}{40} \pm 1.645 \sqrt{0.006}$
$= (0.2726, 0.5274)$
(b) We are 90% confident that between 27% and 53% of all peanut candy bars did
not contain peanuts.
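(c) For a 95% confidence interval, $z^* = 1.96$. The 95% confidence interval would then
be
$\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
$= \dfrac{16}{40} \pm 1.96 \sqrt{\dfrac{\left(\frac{16}{40}\right)\left(1 - \frac{16}{40}\right)}{40}}$
$= \dfrac{16}{40} \pm 1.96 \sqrt{0.006}$
$= (0.2482, 0.5518)$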
(d) We are 95% confident that between 25% and 55% of all peanut candy bars did
not contain peanuts.
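Both intervals above use the same one-proportion formula with a different critical value, which the following sketch makes explicit. `proportion_interval` is an illustrative helper name, and the critical values 1.645 and 1.96 are the table values used in the worked solution.

```python
import math

def proportion_interval(successes, n, z_star):
    """One-proportion z-interval: p_hat +/- z_star * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 16 of 40 candy bars contained no peanuts.
ci_90 = proportion_interval(16, 40, z_star=1.645)
ci_95 = proportion_interval(16, 40, z_star=1.96)
print(f"90%: ({ci_90[0]:.4f}, {ci_90[1]:.4f})")
print(f"95%: ({ci_95[0]:.4f}, {ci_95[1]:.4f})")
```

Raising the confidence level from 90% to 95% widens the interval, since only the critical value changes while the standard error stays the same.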
4. I sample 600 people and 432 of them like cats. Construct a 95% confidence interval
for the population proportion.
$\hat{p} = \dfrac{432}{600} = 0.72$
$z^* = 1.96$
$n = 600$
CI: $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
⇒ $0.72 \pm 1.96 \sqrt{\dfrac{0.72(1 - 0.72)}{600}}$
⇒ $(0.684, 0.756)$
5. I think the proportion of people that eat candy is around 0.75. I am going to construct
a 90% confidence interval and want the margin of error to be ±0.025. How large should
the sample size be?
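$\hat{p} = 0.75$
$z^* = 1.645$
$MOE = 0.025$
$MOE = z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
⇒ $0.025 = 1.645 \sqrt{\dfrac{0.75(1 - 0.75)}{n}}$
⇒ $\dfrac{0.025}{1.645} = \sqrt{\dfrac{0.1875}{n}}$
⇒ $\left(\dfrac{0.025}{1.645}\right)^2 = \dfrac{0.1875}{n}$
⇒ $n = \dfrac{0.1875}{\left(\frac{0.025}{1.645}\right)^2}$
⇒ $n = 811.8075$
⇒ $n \approx 812$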
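Solving the margin-of-error formula for n gives $n = \hat{p}(1 - \hat{p})(z^*/MOE)^2$, which is easy to wrap in a function. In this sketch `sample_size_for_proportion` is an illustrative name, and the default guess of 0.5 reflects the guide's advice for the case where no estimate of the proportion is available.

```python
import math

def sample_size_for_proportion(moe, z_star, p_guess=0.5):
    """Smallest n with z_star * sqrt(p(1 - p) / n) <= moe, rounded up."""
    return math.ceil(p_guess * (1 - p_guess) * (z_star / moe) ** 2)

# Candy problem: guess p = 0.75, 90% confidence, margin of error 0.025.
print(sample_size_for_proportion(moe=0.025, z_star=1.645, p_guess=0.75))
# Eyeglasses problem: no guess available, 95% confidence, margin of error 0.2.
print(sample_size_for_proportion(moe=0.2, z_star=1.96))
```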
6. Jimmy samples 930 people and 234 took public transportation. Construct a 99%
confidence interval for the population proportion.
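$\hat{p} = \dfrac{234}{930}$
$z^* = 2.576$
$n = 930$
CI: $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
⇒ $\dfrac{234}{930} \pm 2.576 \sqrt{\dfrac{(234/930)(1 - 234/930)}{930}}$
⇒ $0.252 \pm 2.576 \sqrt{\dfrac{0.188}{930}}$
⇒ $(0.215, 0.289)$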
7. I am going to construct a 95% confidence interval for the proportion of people that
wear eyeglasses and want the margin of error to be ±0.2. I have no idea what to
estimate for the population proportion. How large should the sample size be?
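$\hat{p} = 0.5$ when we don't have any idea for the population proportion
$z^* = 1.96$
$MOE = 0.2$
$MOE = z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
⇒ $0.2 = 1.96 \sqrt{\dfrac{0.5(1 - 0.5)}{n}}$
⇒ $\dfrac{0.2}{1.96} = \sqrt{\dfrac{0.25}{n}}$
⇒ $\left(\dfrac{0.2}{1.96}\right)^2 = \dfrac{0.25}{n}$
⇒ $n = \dfrac{0.25}{\left(\frac{0.2}{1.96}\right)^2}$
⇒ $n = 24.01$
⇒ $n \approx 25$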
8. A researcher believes that more than 50% of all people voted in the last election. She
samples 800 people and 420 of them voted. Test her claim at a significance level of
0.05 (i.e. compare the P-value to 0.05).
9. A researcher believes that fewer than 75% of all mollusks are tasty. He samples 1200
mollusks and 865 of them are tasty. Test his claim at a significance level of 0.05 (i.e.
compare the P-value to 0.05).
10. A researcher believes that the percentage of people that watch Game of Thrones is
different than 27%. He samples 900 people and 220 of them watch. Test his claim at
a significance level of 0.05 (i.e. compare the P-value to 0.05).
11. A butcher wants to estimate the mean weight of a ham. She samples 33 hams and
computes a sample mean weight of 8.2 pounds and a sample standard deviation of 3.3
pounds. What is a 90% confidence interval for the population mean weight of ham?
Please indicate the value you used for z ∗ or t∗ .
Summary of what is given:
n = 33
$\bar{y} = 8.2$
$s = 3.3$
For confidence intervals for the mean, we use $t^*$ with $n - 1$ degrees of freedom and
90% confidence (for this case). Thus, $t^*_{32} = 1.694$.
CI: $\bar{y} \pm t^*_{n-1} \dfrac{s}{\sqrt{n}}$
⇒ $8.2 \pm 1.694 \cdot \dfrac{3.3}{\sqrt{33}}$
⇒ $(7.227, 9.173)$
14. A researcher believes that the mean age at which a person first votes is greater than 22
years. He samples 27 people and computes a sample mean of 24.3 years and a sample
standard deviation of 8 years.
16. A researcher believes that the mean height of a prairie dog is different than 14 inches.
She samples 31 prairie dogs and computes a sample mean of 15.8 inches and a sample
standard deviation of 3.6 inches.
(a) A very low P-value provides evidence against the null hypothesis.
True.
(b) A high P-value is strong evidence in favor of the null hypothesis.
False. A high p-value shows that the data are consistent with the null hypothesis
but does not prove that the null hypothesis is true.
(c) A P-value above 0.10 shows that the null hypothesis is true.
False. No p-value ever shows that the null hypothesis is true (or false).
(d) If the null hypothesis is true, you can’t get a p-value below 0.01.
False. If the null hypothesis is true, you will get a p-value below 0.01 about once
in a hundred hypothesis tests.
18. Which of the following statements are true? If false, explain briefly.
(a) Using an alpha level of 0.05, a p-value of 0.04 results in rejecting the null hypoth-
esis.
True.
(b) The alpha level depends on the sample size.
False. The alpha level is set independently and does not depend on the sample
size.
(c) With an alpha level of 0.01, a p-value of 0.10 results in rejecting the null hypoth-
esis.
False. The p-value would have to be less than 0.01 to reject the null hypothesis.
(d) Using an alpha level of 0.05, a p-value of 0.06 means the null hypothesis is true.
False. It means that we do not have enough evidence at that alpha level to reject
the null hypothesis.
19. For each of the following situations, state whether a Type I or Type II, or neither error
has been made. Explain briefly.
(a) A bank wants to know if the enrollment on their website is above 30% based on
a small sample of customers. They test $H_0: p = 0.3$ versus $H_A: p > 0.3$ and
reject the null hypothesis. Later they find out that actually 28% of all customers
enrolled.
Type I Error. The actual value is not greater than 0.3, but they rejected the null
hypothesis.
(b) A student tests 100 students to determine whether other students on her campus
prefer Coke or Pepsi and finds no evidence that preference for Coke is not 0.5.
Later, a marketing company tests all students on campus and finds no difference.
No error. The actual value is 0.5 which was not rejected.
(c) A human resource analyst wants to know if the applicants this year score, on
average, higher on their placement exam than the 52.5 points the candidates
averaged last year. She samples 50 recent tests and finds the average to be 54.1
points. She fails to reject the null hypothesis that the mean is 52.5 points. At
the end of the year, they find that the candidates this year had a mean of 55.3
points.
Type II Error. The true mean was 55.3 points, which is greater than 52.5, so the
null hypothesis was false but was not rejected.
(d) A pharmaceutical company tests whether a drug lifts the headache relief rate
from the 25% achieved by the placebo. They fail to reject the null hypothesis
because the p-value is 0.465. Further testing shows that the drug actually relieves
headaches in 38% of people.
Type II Error. The null hypothesis was not rejected, but it was false. The true
relief rate was greater than 0.25.
20. A researcher samples 600 children and 500 of them like ice cream. She also samples
450 adults and 350 of them like ice cream. Construct a 95% confidence interval for the
difference of population proportions of children and adults that like ice cream.
What we are given:
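$\hat{p}_1 = \dfrac{500}{600}$
$\hat{p}_2 = \dfrac{350}{450}$
Since we are considering the confidence interval for the difference of proportions, we
need a value for $z^*$. Here, $z^* = 1.96$. The confidence interval is
CI: $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
⇒ $\dfrac{1}{18} \pm 1.96 \sqrt{\dfrac{5/36}{600} + \dfrac{14/81}{450}}$
⇒ $(0.0069, 0.1042)$
21. A researcher samples 1200 children and 500 of them like to exercise. She also samples
900 adults and 350 of them like to exercise. Construct a 90% confidence interval for
the difference of population proportions of children and adults that like to exercise.
Here $\hat{p}_1 = \dfrac{500}{1200}$, $\hat{p}_2 = \dfrac{350}{900}$, and $z^* = 1.645$. Using the same formula,
⇒ $\dfrac{1}{36} \pm 1.645 \sqrt{\dfrac{35/144}{1200} + \dfrac{77/324}{900}}$
⇒ $(-0.0078, 0.0633)$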
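The two-proportion z-interval can be checked with a few lines of Python. This is a sketch under the usual assumptions (independent groups, large enough samples); `two_proportion_interval` is an illustrative helper name, and the second call uses the exercise data (500 of 1200 children, 350 of 900 adults, 90% confidence) that produce the interval (−0.0078, 0.0633).

```python
import math

def two_proportion_interval(x1, n1, x2, n2, z_star):
    """CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Ice cream data: 500 of 600 children, 350 of 450 adults, 95% confidence.
ice_cream = two_proportion_interval(500, 600, 350, 450, z_star=1.96)
# Exercise data: 500 of 1200 children, 350 of 900 adults, 90% confidence.
exercise = two_proportion_interval(500, 1200, 350, 900, z_star=1.645)

print(f"ice cream: ({ice_cream[0]:.4f}, {ice_cream[1]:.4f})")
print(f"exercise:  ({exercise[0]:.4f}, {exercise[1]:.4f})")
```

An interval that contains 0, as the exercise interval does, means the data do not rule out equal proportions at that confidence level.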
22. A scientist believes that the proportion of North American bees that are hostile is
greater than the proportion of South American bees. She samples 500 North American
bees and 200 are hostile. She samples 600 South American bees and 230 are hostile.
24. A man who moves to a new city sees that there are two routes he could take to work.
A neighbor who has lived there a long time tells him Route A will average 5 minutes
faster than Route B. The man decides to experiment; he wants to find out if the mean
difference between Route A and B is different from 5 minutes. Each day, he flips a coin
to determine which way to go, driving each route 20 days. He finds that Route A takes
an average of 40 minutes, with a standard deviation of 3 minutes, and Route B takes
an average of 43 minutes, with a standard deviation of 2 minutes. Histograms of travel
times for the routes are roughly symmetric and show no outliers. Assume α = 0.05.
(a) Find a 95% confidence interval for the difference in average commuting time for
the two routes. Use df= 33.
Since df = 33, $t^*_{33} = 2.0345$.
CI: $(\bar{y}_B - \bar{y}_A) \pm t^*_{df} \sqrt{\dfrac{s_B^2}{n_B} + \dfrac{s_A^2}{n_A}}$
⇒ $(43 - 40) \pm 2.0345 \sqrt{\dfrac{2^2}{20} + \dfrac{3^2}{20}}$
⇒ $3 \pm 2.0345 \sqrt{0.65}$
⇒ $(1.36, 4.64)$
Note that this result means that we are 95% confident that Route B has a mean
commuting time between 1.36 and 4.64 minutes more than the mean commuting
time of Route A. Also, because 5 minutes is not within the interval, it appears
that the neighbor may be exaggerating the average difference in commuting time.
(b) State the hypotheses to be tested.
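$H_0: \mu_B - \mu_A = 5$
$H_A: \mu_B - \mu_A \neq 5$
(c) Compute the value of the test score.
$t_{33} = \dfrac{(\bar{y}_B - \bar{y}_A) - \Delta_0}{\sqrt{\dfrac{s_B^2}{n_B} + \dfrac{s_A^2}{n_A}}} = \dfrac{(43 - 40) - 5}{\sqrt{\dfrac{2^2}{20} + \dfrac{3^2}{20}}} = \dfrac{-2}{\sqrt{0.65}} = -2.481$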
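The two-sample calculations for the commuting problem can be reproduced as follows. This is a sketch, not the course's official method: `welch_t` is an illustrative name, and the degrees-of-freedom line assumes that the "special formula" mentioned in the definitions is the Welch-Satterthwaite approximation, which for these data gives about 33.1, consistent with the df = 33 stated in the problem. The critical value 2.0345 is the one used in the worked solution.

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2, delta0=0.0):
    """Two-sample t statistic, its standard error, and approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    se = math.sqrt(v1 + v2)
    t_stat = ((mean1 - mean2) - delta0) / se
    # Assumed "special formula": Welch-Satterthwaite approximation.
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t_stat, se, df

# Route B (mean 43, s = 2) minus Route A (mean 40, s = 3), 20 days each, Delta_0 = 5.
t_stat, se, df = welch_t(43, 2, 20, 40, 3, 20, delta0=5)

t_star = 2.0345                      # critical value for df = 33, from the solution
low = (43 - 40) - t_star * se
high = (43 - 40) + t_star * se
print(f"t = {t_stat:.3f}, df = {df:.1f}, CI = ({low:.2f}, {high:.2f})")
```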
25. Researchers randomly assigned participants either a tall, thin “highball” glass or a
short, wide “tumbler,” each of which held 355 ml. Participants were asked to pour 1.5
oz = 44.3 ml of water into their glass. Did the shape of the glass make a difference in
how much liquid they poured? In particular, test to see if they poured less water into
the “highball” glass than the “tumbler”. Assume α = 0.1. Here are the summaries:
        Highball    Tumbler
n       99          99
ȳ       42.2 ml     60.9 ml
s       16.2 ml     17.9 ml
(a) Find a 90% confidence interval for the difference in average water held for the two
glasses. Use df = 194.
Because we are looking for a 90% confidence interval with df = 194, $t^*_{194} = 1.6528$.
CI: $(\bar{y}_H - \bar{y}_T) \pm t^*_{df} \sqrt{\dfrac{s_H^2}{n_H} + \dfrac{s_T^2}{n_T}}$
⇒ $(42.2 - 60.9) \pm 1.6528 \sqrt{\dfrac{16.2^2}{99} + \dfrac{17.9^2}{99}}$
⇒ $-18.7 \pm 1.6528 \sqrt{5.8874}$
⇒ $-18.7 \pm 1.6528(2.4264)$
⇒ $(-22.71, -14.69)$
(b) State the hypotheses to be tested.
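$H_0: \mu_H - \mu_T = 0$
$H_A: \mu_H - \mu_T < 0$
(c) Compute the value of the test score. (Assume all conditions are met.)
$t_{194} = \dfrac{(\bar{y}_H - \bar{y}_T) - 0}{\sqrt{\dfrac{s_H^2}{n_H} + \dfrac{s_T^2}{n_T}}}$
$= \dfrac{42.2 - 60.9}{\sqrt{\dfrac{16.2^2}{99} + \dfrac{17.9^2}{99}}}$
$= \dfrac{-18.7}{2.4264}$
$= -7.707$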
(d) Give the P-value or range of P-values.
On your calculator: tcdf(−999, −7.707, 194) ≈ 0.
On the table (or an online table): go to degrees of freedom 194, find where 7.707
is in the row, and then look at the one-tail probability values. Compare 7.707 to
the values on the table. The p-value is less than 0.001.
(e) Do the results seem significant?
Since the p-value is “small” (p-value ≈ 0 < α = 0.1), Reject H0. The results are
significant. There is sufficient evidence to conclude that they poured less water into the
“highball” glass than the “tumbler”.
2. A sample of size n=150 people is collected and the sample proportion of people who are
illiterate is computed to be .20. Compute a 95% confidence interval for the population
proportion of illiterate people.
What we are given:
n = 150
p̂ = 0.2
Because we want the confidence interval for the population proportion, use the formula
CI: $\hat{p} \pm z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
⇒ $0.2 \pm 1.96 \sqrt{\dfrac{0.2(1 - 0.2)}{150}}$
⇒ $0.2 \pm 1.96 \sqrt{\dfrac{0.16}{150}}$
⇒ $(0.136, 0.264)$
3. You believe that the proportion of people that like cheese is .80. You are going to
construct a 95% confidence interval and want the margin of error to be plus or minus
.03. What should the sample size be?
What we are given:
p̂ = 0.8
M OE = 0.03
Since this is dealing with one proportion, use the formula
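$MOE = z^* \sqrt{\dfrac{\hat{p}(1 - \hat{p})}{n}}$
⇒ $0.03 = 1.96 \sqrt{\dfrac{0.8(1 - 0.8)}{n}}$
⇒ $\dfrac{0.03}{1.96} = \sqrt{\dfrac{0.16}{n}}$
⇒ $\left(\dfrac{0.03}{1.96}\right)^2 = \dfrac{0.16}{n}$
⇒ $n = \dfrac{0.16}{\left(\frac{0.03}{1.96}\right)^2}$
⇒ $n = 682.95$
⇒ $n \approx 683$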
4. Teresa knows that appointment times are approximately normally distributed. She
believes the mean wait time is longer than 25 minutes. She conducts a test with α
= 0.05 and the appropriate hypotheses. She selects 25 random appointments and the
sample mean was found to be 25.66 minutes and a sample standard deviation of 10
minutes.
5. You claim that the proportion of people who watch American Idol is greater than .50.
You sample n=200 people and compute a sample proportion of .53. Assume α = 0.05.
6. You want to compare the proportion of gamers amongst women and men. You survey
300 women and 400 men. 175 of the women were gamers and 200 of the men were
gamers. Construct a 95% confidence interval for the difference of proportions.
What we are given:
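$\hat{p}_W = \dfrac{175}{300}$
$\hat{p}_M = \dfrac{200}{400}$
Because we want the confidence interval for the difference of proportions, use the
formula
CI: $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
⇒ $\left(\dfrac{175}{300} - \dfrac{200}{400}\right) \pm 1.96 \sqrt{\dfrac{\left(\frac{175}{300}\right)\left(1 - \frac{175}{300}\right)}{300} + \dfrac{\left(\frac{200}{400}\right)\left(1 - \frac{200}{400}\right)}{400}}$
⇒ $\dfrac{1}{12} \pm 1.96 \sqrt{\dfrac{35/144}{300} + \dfrac{0.25}{400}}$
⇒ $(0.0091, 0.1576)$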
7. You believe that the proportion of men that are colorblind is greater than the pro-
portion of women that are color blind. You sample 900 men and 90 of them are color
blind. You sample 700 women and 45 of them are colorblind. Assume α = 0.05.
$= \sqrt{1.9574 \times 10^{-4}}$
$= 0.014$
$z = \dfrac{\hat{p}_1 - \hat{p}_2}{SE_{pooled}(\hat{p}_1 - \hat{p}_2)} = \dfrac{\frac{90}{900} - \frac{45}{700}}{0.014} = \dfrac{1/28}{0.014} = 2.55$
(c) Did you use the pooled proportion in part b.?
YES
(d) Compute the P-value.
On your calculator: normalcdf (2.55, 999) = 0.0054
(e) Are the results significant?
Since the p-value is “small” (0.0054 < α = 0.05), Reject H0 . The results are
significant.
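The pooled two-proportion z-test used in this problem can be sketched with Python's standard library. `two_proportion_z_test` is an illustrative helper name; it assumes the one-sided alternative $H_A: p_1 > p_2$ from this problem, and `NormalDist().cdf` plays the role of the calculator's normalcdf. Because the code does not round the standard error to 0.014, its z and p-value may differ from the hand calculation in the third decimal place.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z statistic and upper-tail p-value for H0: p1 - p2 = 0 vs HA: p1 > p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)     # combine both samples under H0
    se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = 1 - NormalDist().cdf(z)    # area to the right of z
    return z, p_value

# 90 of 900 men and 45 of 700 women are colorblind.
z, p_value = two_proportion_z_test(90, 900, 45, 700)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Do Not Reject H0")
```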