Bacs HW4


HW4

112078516

2024-03-15

Question 1)

The following problem’s data is real, but the scenario is strictly imaginary. The large American phone
company Verizon had a monopoly on phone services in many areas of the US. The New York Public Utilities
Commission (PUC) regularly monitors repair times with customers in New York to verify the quality of
Verizon’s services. The file verizon.csv has a recent sample of repair times collected by the PUC.

verizon <- read.csv("verizon.csv")


sample_size <- length(verizon$Time)
sample_mean <- mean(verizon$Time)
sample_sd <- sd(verizon$Time)
se <- sample_sd / sqrt(sample_size)

a. Imagine that Verizon claims that they take 7.6 minutes to repair phone services for its
customers on average. The PUC seeks to verify this claim at 99% confidence (i.e., significance
𝛼 = 1%) using traditional statistical methods.

i. Visualize the distribution of Verizon’s repair times, marking the mean with a vertical line
Plot the density function of Verizon’s repair times

plot(density(verizon$Time), col="blue", lwd=2,
     main = "Verizon's repair times")
# Add a vertical line marking the mean
abline(v=mean(verizon$Time), col="orange")

[Figure: density plot "Verizon's repair times" with a vertical line at the mean; N = 1687, Bandwidth = 1.003]

ii. Given what the PUC wishes to test, how would you write the hypothesis? (not graded)

Hnull: The average time for Verizon to repair phone services for its customers is 7.6 minutes
(𝜇 = 7.6).
Halt: The average time for Verizon to repair phone services for its customers is not 7.6 minutes
(𝜇 ≠ 7.6).

iii. Estimate the population mean, and the 99% confidence interval (CI) of this estimate.

# 99% CI (using z = 2.58 as the critical value)
c(sample_mean-2.58*se, sample_mean+2.58*se)

## [1] 7.593073 9.450946
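As an aside, the interval above uses z ≈ 2.58 as the critical value; the exact t critical value for df = 1686 is slightly larger, which is why t.test()'s interval later differs in the third decimal. A quick base-R check:

```r
# Exact two-sided 99% critical value of the t-distribution, df = 1687 - 1
qt(0.995, df = 1686)  # slightly above qnorm(0.995), which is about 2.576
```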

iv. Find the t-statistic and p-value of the test

# t-statistic
t <- (sample_mean - 7.6) / se
t

## [1] 2.560762

# degrees of freedom
df <- sample_size - 1
# one-tailed p-value (compare to alpha/2 = 0.005 for a two-sided test)
p <- 1 - pt(t, df)
p

## [1] 0.005265342

The value of t-statistic is 2.5607623.


The p-value is 0.0052653.

v. Briefly describe how these values relate to the Null distribution of t (not graded)
Under H0, the t-statistic follows a t-distribution with df = n − 1, and the p-value is the tail area of that null distribution beyond the observed t. As df increases, the shape of the t-distribution becomes more similar to a normal distribution, though it only reaches the normal in the limit of infinite df.
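A quick base-R illustration of this convergence (not part of the required output): the two-sided 99% critical value of t shrinks toward the normal critical value as df grows.

```r
# t critical values approach the normal critical value as df increases
sapply(c(5, 30, 1686), function(df) qt(0.995, df))
qnorm(0.995)  # the limiting normal value, about 2.576
```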

vi. What is your conclusion about the company’s claim from this t-statistic, and why?
Because this is a two-tailed test at 𝛼 = 1%, the threshold for each tail is 0.5%. The one-tailed p-value 0.0052653 (0.52653%) is bigger than 0.5%, so we cannot reject H0 at 99% confidence.
We can also run t.test() as a check.

t.test(verizon$Time,mu=7.6,alternative="two.sided",conf.level = 0.99)

##
## One Sample t-test
##
## data: verizon$Time
## t = 2.5608, df = 1686, p-value = 0.01053
## alternative hypothesis: true mean is not equal to 7.6
## 99 percent confidence interval:
## 7.593524 9.450495
## sample estimates:
## mean of x
## 8.522009

The two-sided p-value from t.test() is 0.01053 > 0.01, so we cannot reject H0 at 99% confidence.
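The one-tailed p-value computed earlier and t.test()'s two-sided p-value are consistent: doubling the tail area beyond |t| reproduces the reported value (a quick check, using the t and df printed above).

```r
t_stat <- 2.560762
df <- 1686
2 * (1 - pt(abs(t_stat), df))  # about 0.0105, twice the one-tailed 0.0053
```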

b. Let’s re-examine Verizon’s claim that they take no more than 7.6 minutes on average, but
this time using bootstrapped testing:

i. Bootstrapped Percentile: Estimate the bootstrapped 99% CI of the population mean

set.seed(23)
compute_sample_mean <- function(sample0) {
  resample <- sample(sample0, length(sample0), replace=TRUE)
  mean(resample)
}
bootmean <- replicate(2000, compute_sample_mean(verizon$Time))
mean(bootmean)

## [1] 8.521639

quantile(bootmean, probs=c(0.005, 0.995))

## 0.5% 99.5%
## 7.663299 9.437621

The bootstrapped mean is 8.521639, and the 99% CI is 7.663299, 9.437621.

ii. Bootstrapped Difference of Means: What is the 99% CI of the bootstrapped difference
between the sample mean and the hypothesized mean?

set.seed(23)
bootsmeandiff <- replicate(2000, compute_sample_mean(verizon$Time-7.6))
mean(bootsmeandiff)

## [1] 0.9216395

quantile(bootsmeandiff, probs=c(0.005, 0.995))

## 0.5% 99.5%
## 0.06329926 1.83762072

The mean of the bootstrapped differences is 0.9216395, and the 99% CI is 0.06329926 to 1.83762072.

iii. Bootstrapped t-statistic: What is the 99% CI of the bootstrapped t-statistic of the sample
mean versus the hypothesized mean?

set.seed(23)
# Function to compute bootstrapped t-statistic
compute_t_statistic <- function(sample_arg, hypothesized_mean){
  reexamine_sample <- sample(sample_arg, length(sample_arg), replace = TRUE)
  t_statistic <- (mean(reexamine_sample) - hypothesized_mean) /
    (sd(reexamine_sample) / sqrt(length(reexamine_sample)))
  t_statistic
}

# Generate 2000 bootstrapped t-statistics based on the original dataset
bootstrap_t_stats <- replicate(2000, compute_t_statistic(verizon$Time, 7.6))

# Compute the mean of the bootstrapped t-statistics
# (centers near the observed t-statistic, not at 0)
mean(bootstrap_t_stats)

## [1] 2.531093

# Calculate the 99% confidence interval for bootstrapped t-statistics
quantile(bootstrap_t_stats, probs = c(0.005, 0.995))

## 0.5% 99.5%
## 0.2151776 4.6382164

The mean of the bootstrapped t-statistics is 2.531093.

The 99% confidence interval (CI) of the bootstrapped t-statistic of the sample mean versus the hypothesized mean is 0.2151776 to 4.6382164.

iv. Plot the distribution of the three bootstraps above on separate plots; draw vertical lines
showing the lower/upper bounds of their respective 99% confidence intervals.

Plot three distributions in density plot with lower/upper bounds of their respective 99% confidence intervals.

plot(density(bootmean), main = "bootstrapped mean")
abline(v=quantile(bootmean, probs=c(0.005, 0.995)), lty="dashed")

[Figure: density plot of the bootstrapped means with dashed lines at the 99% CI bounds; N = 2000, Bandwidth = 0.06954]

plot(density(bootsmeandiff), main = "bootstrapped difference of means")
abline(v=quantile(bootsmeandiff, probs = c(0.005, 0.995)), lty="dashed")

[Figure: density plot of the bootstrapped differences of means with dashed lines at the 99% CI bounds; N = 2000, Bandwidth = 0.06954]

plot(density(bootstrap_t_stats), main = "bootstrapped t-statistic of means")
abline(v=quantile(bootstrap_t_stats, probs = c(0.005, 0.995)), lty="dashed")

[Figure: density plot of the bootstrapped t-statistics with dashed lines at the 99% CI bounds; N = 2000, Bandwidth = 0.1671]

v. Does the bootstrapped approach agree with the traditional t-test in part [a]?

set.seed(23)
bootmean <- replicate(2000,compute_sample_mean(verizon$Time))
quantile(bootmean, probs=c(0.005, 0.995))

## 0.5% 99.5%
## 7.663299 9.437621

bootmeandiff <- replicate(2000, compute_sample_mean(verizon$Time-7.6))
quantile(bootmeandiff, probs=c(0.005, 0.995))

## 0.5% 99.5%
## 0.04212762 1.92928278

Both 7.6 and 0 fall outside their respective 99% confidence intervals in the bootstrapped approach, so the bootstrap rejects H0 even though the traditional t-test in part (a) could not (p = 0.01053 > 0.01). The two approaches therefore disagree here, though only narrowly. When testing right at a strict cutoff like 99% confidence, results this close to the boundary can differ between methods, and bootstrap intervals in particular can shift with the random seed, so it is possible for bootstrapping and a traditional t-test to reach different conclusions, as this borderline case demonstrates.
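The seed-sensitivity can be illustrated with simulated data (hypothetical: fake_times stands in for verizon$Time, and the seeds are arbitrary); near a borderline case, the percentile CI's bounds shift slightly from run to run.

```r
# Bootstrapped percentile CI under two different seeds (simulated skewed data)
set.seed(1)
fake_times <- rexp(1687, rate = 1/8.5)  # skewed, hypothetical repair times

boot_ci <- function(x, seed) {
  set.seed(seed)
  boot_means <- replicate(2000, mean(sample(x, length(x), replace = TRUE)))
  quantile(boot_means, probs = c(0.005, 0.995))
}

rbind(boot_ci(fake_times, 10), boot_ci(fake_times, 20))  # bounds differ slightly
```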

c. Finally, imagine that Verizon notes that the distribution of repair times is highly skewed by
outliers, and feels that testing the mean is not fair because the mean is sensitive to outliers.
They argue that the median is a fairer test, and claim that the median repair time is no
more than 3.5 minutes at 99% confidence (i.e., significance 𝛼 = 1%).

i. Bootstrapped Percentile: Estimate the bootstrapped 99% CI of the population median

set.seed(12)
compute_sample_median <- function(sample0) {
  resample <- sample(sample0, length(sample0), replace=TRUE)
  median(resample)
}
bootmedian <- replicate(2000,compute_sample_median(verizon$Time))
mean(bootmedian)

## [1] 3.615185

quantile(bootmedian, probs=c(0.005, 0.995))

## 0.5% 99.5%
## 3.15 3.95

ii. Bootstrapped Difference of Medians:

What is the 99% CI of the bootstrapped difference between the sample median and the
hypothesized median?

set.seed(12)
bootmediandiff <- replicate(2000,compute_sample_median(verizon$Time-3.5))
mean(bootmediandiff)

## [1] 0.115185

quantile(bootmediandiff, probs=c(0.005, 0.995))

## 0.5% 99.5%
## -0.35 0.45

iii. Plot the distribution of the two bootstraps above on two separate plots (note that we are
not using a test statistic for the median).

plot(density(bootmedian), main = "bootstrapped median")
abline(v=quantile(bootmedian, probs=c(0.005, 0.995)), lty="dashed")

[Figure: density plot of the bootstrapped medians with dashed lines at the 99% CI bounds; N = 2000, Bandwidth = 0.02791]

plot(density(bootmediandiff), main = "bootstrapped difference of medians")
abline(v=quantile(bootmediandiff, probs=c(0.005, 0.995)), lty="dashed")

[Figure: density plot of the bootstrapped median differences with dashed lines at the 99% CI bounds; N = 2000, Bandwidth = 0.02791]

iv. What is your conclusion about Verizon’s claim about the median, and why?

The hypothesized median of 3.5 falls inside the bootstrapped 99% CI of the median (3.15, 3.95), and 0 falls inside the 99% CI of the bootstrapped difference (−0.35, 0.45), so we fail to reject H0 at the 99% confidence level: the data are consistent with Verizon’s claim about the median.

Question 2) Load the compstatslib package and run interactive_t_test(). You
will see a simulation of null and alternative distributions of the t-statistic, along
with significance and power, and can activate an error matrix.

If you do not see interactive controls (slider bars), press the gears icon on the top-left of
the visualization.

library(compstatslib)
# interactive_t_test()

Recall the form of t-tests, where 𝑡 = (𝑥̄ − 𝜇0) / (𝑠 / √𝑛).
Let’s see how hypothesis tests are affected by various factors:
diff: the difference we wish to test (𝑥̄ − 𝜇0)
sd: the standard deviation of our sample data (𝑠)
n: the number of cases in our sample data
alpha: the significance level of our test (e.g., alpha is 5% for a 95% confidence level)
Your colleague, a data analyst in your organization, is working on a hypothesis test where he has sampled
product usage information from customers who are using a new smartwatch. He wishes to test whether the
mean (𝑥̄) usage time is higher than the usage time of the company’s previous smartwatch released two years
ago (𝜇0):
Hnull : The mean usage time of the new smartwatch is the same or less than for the previous smartwatch.
Halt : The mean usage time is greater than that of our previous smartwatch.
After collecting data from just n=50 customers, he informs you that he has found diff=0.3 and sd=2.9.
Your colleague believes that we cannot reject the null hypothesis at alpha of 5%.
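Before reaching for the simulation, the colleague's claim can be verified directly (a sketch; assumes a one-sided, one-sample test with df = n − 1):

```r
n <- 50; diff <- 0.3; s <- 2.9; alpha <- 0.05
t_stat <- diff / (s / sqrt(n))        # about 0.73
t_crit <- qt(1 - alpha, df = n - 1)   # critical value, about 1.68
t_stat > t_crit                       # FALSE: cannot reject the null at alpha = 5%
```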

Use the slider bars of the simulation to set the values your colleague found and confirm from
the visualization that we cannot reject the null hypothesis. Consider the scenarios (a – d)
independently using the simulation tool. For each scenario, start with the initial parameters
above, then adjust them to answer the following questions:

i. Would this scenario create systematic or random error (or both or neither)?

ii. Which part of the t-statistic or significance (diff, sd, n, alpha) would be affected?

iii. Will it increase or decrease our power to reject the null hypothesis?

iv. Which kind of error (Type I or Type II) becomes more likely because of this scenario?

a. You discover that your colleague wanted to target the general population of Taiwanese
users of the product. However, he only collected data from a pool of young consumers, and
missed many older customers who you suspect might use the product much less every day.

i.
A: This scenario would create systematic errors because the sample does not reflect the entire population
of Taiwanese product users.
ii.
A: If we include data from older customers in our sample, we anticipate that diff (the difference) will
likely decrease, on the assumption that older customers use the product less each day. We also expect
sd (the standard deviation) to increase when older customers are included. n (the sample size) is also
affected, as the biased sample lacks representation from older users.
iii.
A: Decreasing the sample’s diversity would decrease the power to reject the null hypothesis.
iv.
A: Power is 1 − 𝛽, so a decrease in power implies an increase in 𝛽: the error that becomes more likely is
a Type II error, because the biased sample may not accurately capture differences in usage time between
age groups. The Type I error rate remains unchanged in this scenario.

# interactive_t_test()
# n = 50, diff = 0.3, sd = 2.9, alpha = 0.05
knitr::include_graphics("a.png")

b. You find that 20 of the respondents are reporting data from the wrong wearable device,
so they should be removed from the data. These 20 people are just like the others in every
other respect.

i.
A: random error
ii.
A: The part of the t-statistic affected would be n (sample size), since only 30 people remain after
removing the 20 individuals.
iii.
A: The removal of respondents decreases the sample size (n), leading to decreased power to reject the null
hypothesis.
iv.
A: Power is 1 − 𝛽, so the decrease in power caused by the smaller sample size implies an increase in 𝛽:
the error that becomes more likely is a Type II error. The Type I error rate remains unchanged in this scenario.
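The power loss from shrinking n can be quantified with stats::power.t.test (a sketch; assumes a one-sample, one-sided test with the scenario's diff, sd, and alpha):

```r
power_at_n <- function(n) {
  power.t.test(n = n, delta = 0.3, sd = 2.9, sig.level = 0.05,
               type = "one.sample", alternative = "one.sided")$power
}
c(n50 = power_at_n(50), n30 = power_at_n(30))  # power drops as n falls
```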

# interactive_t_test()
# n = 30, diff = 0.3, sd = 2.9, alpha = 0.05
knitr::include_graphics("b.png")

c. A very annoying professor visiting your company has criticized your colleague’s “95% con-
fidence” criteria, and has suggested relaxing it to just 90%.

i.
A: neither
ii.
A: 𝛼 increases, because it changes from 5% to 10%.
iii.
A: It will increase our power to reject the null hypothesis.
iv.
A: Increasing 𝛼 directly raises the probability of a Type I error. Because the power of the test
increases, the probability of a Type II error decreases.
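The effect of relaxing 𝛼 can likewise be quantified with stats::power.t.test (a sketch; assumes a one-sample, one-sided test with the scenario's n = 30):

```r
power_at_alpha <- function(a) {
  power.t.test(n = 30, delta = 0.3, sd = 2.9, sig.level = a,
               type = "one.sample", alternative = "one.sided")$power
}
c(alpha05 = power_at_alpha(0.05), alpha10 = power_at_alpha(0.10))  # power rises with alpha
```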

# n = 30, diff = 0.3, sd = 2.9, alpha = 0.1


# interactive_t_test()
knitr::include_graphics("c.png")

d. Your colleague has measured usage times on five weekdays and taken a daily average. But
you feel this will underreport usage for younger people who are very active on weekends,
whereas it over-reports usage of older users.

i.
A: Both random error and systematic error.
ii.
A: We lack sufficient information to predict the direction of change (increase or decrease) for diff and sd.
Both n and 𝛼 remain constant.
iii.
A: It depends on how diff (the difference) changes, because the measurement method changes.
iv.
A: Type I error remains unchanged, while Type II error may either increase or decrease depending on the
result of the change in diff.
