QSCI 381 Lecture 7

Lecture 7. Inference for numerical data:
one-sample t-tests and paired t-tests

Textbook Sections:
7.1 One-sample means with the t-distribution
7.2 Paired data

Lecture 7 Practice Problems


Depending on the properties of Ha, we will use a:
one-tailed (right or left tail) or two-tailed test.

Process of testing a statistical hypothesis:


(1) Calculate a test statistic based on the data
(2) Compare the test statistic to a critical value of the
test statistic from a theoretical distribution
(3) The critical value of the test statistic is based upon
the level of significance (α) and the degrees of freedom
(which are based on sample size, n)
(4) The p-value (probability value) associated with the
test statistic determines the statistical decision,
which we then interpret in the context of the claim
Statistical Decisions:

Reject the null hypothesis


OR
Fail to reject the null hypothesis

p-values ≤ α indicate a significant difference and a conclusion
to reject the null hypothesis.

p-values > α indicate there is no significant difference and a
conclusion to fail to reject the null hypothesis.
Example: A coffee stand in Seattle does an average of $15,000 in
net sales per month, with a standard deviation of $2,000. Assume
the monthly sales are normally distributed. Last month, the coffee
stand netted $10,500. A claim is made that the sales from last
month are statistically lower than the monthly average. Test the
claim at α = 0.05.
Ho: Last month, sales were ≥ $15,000
Ha: Last month, sales were < $15,000

[Figure: Distribution of monthly sales with a mean of $15,000 and a SD of $2,000 (assuming a normal distribution). Using a left-tailed test, observations in the left tail would be statistically significant.]
Using the concept of the z-score and pnorm() in R:
What is the probability of monthly sales of $10,500 or less,
given a mean = $15,000 and SD = $2,000?
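
The calculation can be sketched in R as follows. This is a minimal example that assumes the quantity of interest is the left-tail probability P(sales ≤ $10,500) under the stated normal model; the variable names are illustrative and not from the lecture.

```r
# Values from the coffee stand example
mu    <- 15000   # mean monthly net sales
sigma <- 2000    # standard deviation of monthly net sales
x     <- 10500   # last month's net sales

# z-score: how many SDs last month's sales fall from the mean
z <- (x - mu) / sigma

# Left-tail probability P(X <= x) under the normal model
p_left <- pnorm(x, mean = mu, sd = sigma)

# Applying pnorm() to the z-score gives the same probability
p_from_z <- pnorm(z)

# Decision at the stated level of significance
alpha     <- 0.05
reject_h0 <- p_left <= alpha

print(z)
print(p_left)
print(reject_h0)
```

Here z works out to -2.25 and the left-tail probability is roughly 0.012, which is below α = 0.05, so the null hypothesis would be rejected.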
[Figure: two histograms of weight at birth (grams) against number of babies: the weight of ~3.95 million babies born in the U.S. (2018), and the weight of 120 babies born in a local hospital (2018).]



One sample hypothesis testing with the t-statistic

test statistic t = (sample mean – hypothesized mean) / (SD / √n)

Where: SD / √n = Standard Error (SE)

Then, compare test statistic t with critical t:

critical t = From t distribution probability table for
a given α and df based on sample
size
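
Instead of a printed probability table, the critical t can be looked up with R's qt() function. The sketch below assumes α = 0.05 and a sample size of n = 20 purely for illustration, and uses df = n - 1 for a one-sample test.

```r
alpha <- 0.05
n     <- 20       # assumed sample size, for illustration only
df    <- n - 1    # degrees of freedom for a one-sample t-test

# Left-tailed test: reject Ho when t falls below this value
t_crit_left <- qt(alpha, df)

# Right-tailed test: reject Ho when t falls above this value
t_crit_right <- qt(1 - alpha, df)

# Two-tailed test: reject Ho when |t| exceeds this value
t_crit_two <- qt(1 - alpha / 2, df)

print(c(t_crit_left, t_crit_right, t_crit_two))
```

With 19 degrees of freedom the one-tailed critical values are about -1.73 and 1.73, and the two-tailed critical value is about 2.09.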
Depending on the properties of Ha, we will use a:
one-tailed (right or left tail) or two-tailed test.

Based on the data, we estimate a test statistic t, and compare
the test statistic t to a critical value of t based upon our stated
level of significance (α).

The resulting p-value (probability value) informs us if we should
reject or fail to reject Ho at a given α. For example, if α = 0.05:

p-values ≤ 0.05 indicate a significant difference and a
conclusion to reject the null hypothesis.

p-values > 0.05 indicate no significant difference and a
conclusion to fail to reject the null hypothesis.
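
The decision rule above can be sketched in R with pt(), which converts a t statistic into a p-value. The t statistic and degrees of freedom below are hypothetical values chosen for illustration, and decide() is an illustrative helper, not part of the lecture.

```r
# Hypothetical values for illustration only (not from the lecture)
t_stat <- -2.5
df     <- 19
alpha  <- 0.05

# p-value depends on the direction of Ha
p_left  <- pt(t_stat, df)                      # Ha contains <
p_right <- pt(t_stat, df, lower.tail = FALSE)  # Ha contains >
p_two   <- 2 * pt(-abs(t_stat), df)            # Ha contains ≠

# Illustrative helper that applies the decision rule
decide <- function(p, alpha) {
  if (p <= alpha) "reject Ho" else "fail to reject Ho"
}

print(decide(p_left, alpha))
print(decide(p_right, alpha))
print(decide(p_two, alpha))
```

The same t statistic can lead to different decisions depending on which tail Ha points to, which is why the direction of the test has to be settled before looking at the p-value.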
Problem 1
An inventor has developed a new, energy-efficient electric lawn
mower engine. She claims that the engine will run continuously
for at least 300 minutes on a single battery charge.

1) What are Ho and Ha? Is this a one-tailed or two-tailed test? If
one-tailed, is it left or right tailed?
2) What is our test statistic t and df?
3) When testing the claim at alpha = 0.05, what is the statistical
conclusion and interpretation?
Use R/RStudio

A trial of 20 tests is conducted. Here are the data (in minutes):

runtime <- c(275.2, 321.2, 307.2, 292.4, 295.1, 296.3, 297.3,
             298.5, 283.4, 316.8, 270.2, 301.5, 287.5, 312.4, 250.9, 300.6,
             329.2, 304.5, 261.2, 305.6)
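
As a sketch, the test statistic t and df for these data can be computed by hand in R from the one-sample formula t = (sample mean – hypothesized mean) / (SD / √n). The hypothesized mean of 300 comes from the inventor's claim; the variable names other than runtime are illustrative.

```r
runtime <- c(275.2, 321.2, 307.2, 292.4, 295.1, 296.3, 297.3,
             298.5, 283.4, 316.8, 270.2, 301.5, 287.5, 312.4, 250.9, 300.6,
             329.2, 304.5, 261.2, 305.6)

mu0   <- 300               # hypothesized mean (minutes), from the claim
n     <- length(runtime)   # sample size
x_bar <- mean(runtime)     # sample mean
s     <- sd(runtime)       # sample standard deviation
se    <- s / sqrt(n)       # standard error of the mean

t_stat <- (x_bar - mu0) / se   # test statistic t
df     <- n - 1                # degrees of freedom

print(x_bar)
print(t_stat)
print(df)
```

For these data the sample mean is 295.35 minutes, giving a t of about -1.06 on 19 degrees of freedom.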

t.test() in R/RStudio

t.test(dataset, mu = hypothesized mean, alternative=c("test"))

"test" is:

"less" (for a left-tailed test; Ha contains <)
"greater" (for a right-tailed test; Ha contains >)
"two.sided" (for a two-tailed test; Ha contains ≠)

2) What is our test statistic t and df?


3) When testing the claim at alpha = 0.05, what is the statistical
conclusion and interpretation?
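
One way to answer these questions is sketched below with t.test(). It assumes the claim "at least 300 minutes" is placed in the null hypothesis (Ho: mean ≥ 300, Ha: mean < 300), which makes this a left-tailed test; that framing is an assumption of this sketch.

```r
runtime <- c(275.2, 321.2, 307.2, 292.4, 295.1, 296.3, 297.3,
             298.5, 283.4, 316.8, 270.2, 301.5, 287.5, 312.4, 250.9, 300.6,
             329.2, 304.5, 261.2, 305.6)

alpha <- 0.05

# Left-tailed one-sample t-test against a hypothesized mean of 300
res <- t.test(runtime, mu = 300, alternative = "less")

t_stat  <- unname(res$statistic)   # test statistic t
df      <- unname(res$parameter)   # degrees of freedom
p_value <- res$p.value             # p-value for the left tail

print(res)
print(p_value <= alpha)   # TRUE would mean "reject Ho"
```

Under this framing the p-value is larger than 0.05, so we fail to reject Ho: the trial does not provide significant evidence that the mean runtime is below 300 minutes.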
Independent vs Dependent Samples

To date in the course, we have largely considered INDEPENDENT
samples that are collected at random and without bias. This is the
desired situation in most statistical analyses.

However, there are cases, depending on the study or claim, where
samples are DEPENDENT.

Teaching Method Study (circa 1960s)

- Divided elementary age students into two groups
- One group was taught using a conventional method
- One group was taught using a new method
- At the end of the study, the students were tested to see
  if the new method resulted in better scores

WHERE IS THE FLAW IN THIS DESIGN?
Examples of Dependent Samples
(also known as matched or paired data)

Teaching Method Example: pre-test vs post-test score for each student

Resting heart rate before and after starting a new fitness regime

Cholesterol level before and after taking a new drug

Height of a tree last year vs. this year

Because the second measurement (e.g., post-test score) can
be influenced by, or depend on, the first measurement
(e.g., pre-test score), the samples are considered to be
dependent and must be analyzed differently than samples
that are independent.
A fitness coach claims that a strength development
program will increase performance on the bench press (in
terms of the maximum lift in pounds).

What is Ho and Ha? What kind of test is this?

In a paired t-test, we construct Ho and Ha around μd, where

μd = the mean difference before and after, or

μd = mean of (measurement before class – measurement after class),
i.e., (pre – post)

Claim: bench press weight after taking the class > bench
press weight before taking the class; the claim suggests the
“after” weight is larger.

Thus, with differences taken as pre – post, the claim is that the
mean difference μd < 0

Ho: μd ≥ 0
Ha: μd < 0

This is a left-tailed test



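What is the test statistic t in a paired t-test?

test statistic t = (mean of the differences – hypothesized mean difference) / (SD of the differences / √n)

where n is the number of pairs.

Let's use R/RStudio instead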



t.test(pre dataset, post dataset, alternative=c("test"), paired=TRUE)

"test" is:

"less" (for a left-tailed test; Ha contains <)
"greater" (for a right-tailed test; Ha contains >)
"two.sided" (for a two-tailed test; Ha contains ≠)
Problem 2
A dietician claims eating a new cereal will lower total blood cholesterol. A
random sample of 7 adults is used to measure total blood cholesterol (in
milligrams per deciliter of blood) before eating the new cereal and after
eating the new cereal for one year. Here are the data (in order by subject):

Total blood cholesterol before eating the new cereal:


before <- c(210, 225, 240, 250, 255, 270, 235)
Total blood cholesterol after eating the new cereal:
after <- c(200, 220, 245, 248, 252, 268, 232)

1) What are Ho and Ha, and what kind of test is this?
2) At α = 0.1, what is the statistical decision?
3) What is the interpretation?
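
A possible way to work through Problem 2 in R is sketched below. It assumes differences are taken as before – after, so that the claim "the cereal lowers cholesterol" becomes Ha: mean difference > 0 (with Ho: mean difference ≤ 0), a right-tailed test. The sketch also shows that a paired t-test is the same as a one-sample t-test on the differences.

```r
before <- c(210, 225, 240, 250, 255, 270, 235)
after  <- c(200, 220, 245, 248, 252, 268, 232)

alpha <- 0.1

# Paired t-test; differences are computed as before - after
res_paired <- t.test(before, after, alternative = "greater", paired = TRUE)

# Equivalent one-sample t-test on the differences against 0
d        <- before - after
res_diff <- t.test(d, mu = 0, alternative = "greater")

t_stat  <- unname(res_paired$statistic)
df      <- unname(res_paired$parameter)
p_value <- res_paired$p.value

print(res_paired)
print(p_value <= alpha)   # TRUE means "reject Ho"
```

Under these assumptions t is about 1.70 on 6 degrees of freedom and the p-value falls between 0.05 and 0.10, so at α = 0.1 we reject Ho: the data support the claim that cholesterol was lower after a year on the cereal. At α = 0.05 the same data would not be significant.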
If we considered post – pre, what is Ho, Ha, and what kind of test is this?
How would the output from R/RStudio change?
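
The effect of switching the order can be sketched in R as follows, assuming the claim is still that the cereal lowers cholesterol. With differences taken as post – pre, the claim becomes Ha: mean difference < 0 (Ho: mean difference ≥ 0), a left-tailed test.

```r
before <- c(210, 225, 240, 250, 255, 270, 235)
after  <- c(200, 220, 245, 248, 252, 268, 232)

# pre - post: right-tailed test
res_pre_post <- t.test(before, after, alternative = "greater", paired = TRUE)

# post - pre: left-tailed test
res_post_pre <- t.test(after, before, alternative = "less", paired = TRUE)

# Compare the test statistics and p-values of the two versions
t_pre_post <- unname(res_pre_post$statistic)
t_post_pre <- unname(res_post_pre$statistic)

print(c(t_pre_post, t_post_pre))
print(c(res_pre_post$p.value, res_post_pre$p.value))
```

The test statistic and the mean difference change sign, while the df and the p-value stay the same, so the statistical decision and interpretation do not change.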
