
Statistics II Project

Subject of project: Comparison of means


Matched samples
Prepared by: Idi Xhengo, Klaiv Domi

Binf II-C
Abstract

This project deals with inferences about means. It shows in detail how to perform statistical
tests and draw conclusions by using hypothesis testing to compare the means of two
different populations, or of the same population measured twice. Based on sample data we can conduct
the test and reach conclusions supported by the evidence collected, although never with full
certainty. Whenever the population standard deviations are
unknown, we base our hypothesis testing on the t-distribution.

This project has four parts. The first part is the abstract itself, which describes the
whole project and what it consists of. The second part is an introduction to the cases of inference about
means: the hypothesis tests conducted on them and the interval estimation of the difference between two
means, described with the use of some important concepts and formulas. The third part is the
presentation of the problem in question, its analysis, and its solution both by the regular hypothesis
testing steps and in Excel. The last part is a conclusion that states the results and
sums up the project.

The problem we chose as an example is a two-tailed test on two
matched samples whose population parameters are unknown. We perform the testing
procedure with the t-distribution: since we do not know the population means and standard
deviations, we rely on the given samples to estimate them.

Introduction to Inferences between two population means.


The comparison of two independent population means is fairly common, and it allows us to
assess whether the two groups are different.
Is the night shift less productive than the day shift? Are fixed asset investment rates of return
different from common stock investment rates? Is a particular program more efficient than
another?
Whether an observed difference between two sample means is meaningful depends on the sample standard
deviations as well as on the sample means.
If there is a lot of variability within the individual samples, very different means can occur by
chance.
This variability has to be incorporated into the test statistic.
The t-test compares two independent population means with unknown and perhaps unequal
population standard deviations, using a degrees of freedom formula.
The question is whether the two samples could have been drawn from the same population.
We have two sample means, one from each set of data, and therefore two
random variables originating from two unknown distributions.
To solve the problem we establish a new random variable, the difference between the sample means.
This new random variable has a distribution as well, and the Central Limit Theorem tells us that,
regardless of the underlying distributions of the original data, it is approximately normally
distributed when the samples are large enough.
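The behaviour of the difference between two sample means can be checked with a small simulation. The sketch below is illustrative only: the two populations (an exponential with mean 5 and a uniform on [0, 6]), the sample sizes, and the function name simulate_mean_differences are assumptions chosen for the example and are not part of the project data.

```python
import random
import statistics


def simulate_mean_differences(n1, n2, trials, seed=0):
    """Repeatedly sample two non-normal populations and return the
    differences between the two sample means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(trials):
        # Assumed population 1: exponential with mean 5 (strongly skewed)
        sample1 = [rng.expovariate(1 / 5) for _ in range(n1)]
        # Assumed population 2: uniform on [0, 6], so its mean is 3
        sample2 = [rng.uniform(0, 6) for _ in range(n2)]
        diffs.append(statistics.mean(sample1) - statistics.mean(sample2))
    return diffs


diffs = simulate_mean_differences(n1=40, n2=40, trials=2000)
center = statistics.mean(diffs)   # expected to be close to 5 - 3 = 2
spread = statistics.stdev(diffs)  # expected to be close to sqrt(25/40 + 3/40)
print(f"mean of differences = {center:.3f}, standard deviation = {spread:.3f}")
```

Under these assumed populations the differences should centre near 2 with a spread near √(25/40 + 3/40) ≈ 0.84, even though neither population is normal.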
This concept is also explained graphically:

Inferences about two population Means (σ1 and σ2 known)


When we compare two population means from independent populations, the interest is in the
difference of the two means. In other words, if μ1 is the population mean from population 1
and μ2 is the population mean from population 2, then the difference is μ1−μ2. If μ1−μ2=0 then
there is no difference between the two population parameters.
If each population is normal, then the sampling distribution of x̄i is normal with mean μi,
standard error σi/√ni, and estimated standard error si/√ni.
By the Central Limit Theorem, if a population is not normal but the sample is large, the
sampling distribution is approximately normal.
The theorem presented in the lesson says that if either of the above is true, then x̄1−x̄2 is
approximately normal with mean μ1−μ2 and standard error √(σ1²/n1 + σ2²/n2).


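The null and alternative hypotheses take one of the three forms below, where D0 is the hypothesized difference (often 0):

H0: μ1−μ2 ≥ D0, Ha: μ1−μ2 < D0 (lower-tailed)
H0: μ1−μ2 ≤ D0, Ha: μ1−μ2 > D0 (upper-tailed)
H0: μ1−μ2 = D0, Ha: μ1−μ2 ≠ D0 (two-tailed)

Once the hypotheses are stated, we compute the test statistic, which tells us
whether or not to reject the null hypothesis:

z = ((x̄1−x̄2) − D0) / √(σ1²/n1 + σ2²/n2)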
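As a rough illustration of the known-σ case, the sketch below computes the standard error of x̄1−x̄2 and the corresponding z statistic. The function name two_sample_z and all the numbers passed to it are made up for the example.

```python
import math


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, d0=0.0):
    """Return (standard error, z) for the difference of two sample means
    when both population standard deviations are known."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    z = ((mean1 - mean2) - d0) / standard_error
    return standard_error, z


# Hypothetical inputs: sample means 82 and 78, sigma 10 in both populations
se, z = two_sample_z(mean1=82, mean2=78, sigma1=10, sigma2=10, n1=30, n2=40)
print(f"standard error = {se:.4f}, z = {z:.3f}")
```

The z value is then compared with the standard normal critical value for the chosen level of significance.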

However, in most cases, σ1 and σ2 are unknown, and they have to be estimated. It seems
natural to estimate σ1 by s1 and σ2 by s2. When the sample sizes are small, these estimates may
not be very accurate, and one may get a better estimate of a common standard deviation by
pooling the data from both populations, provided the standard deviations of the two populations are
not too different.

Matched Pairs: Inferences about Two Means with Dependent Samples

When there is a relationship between the samples, they are referred to as dependent samples.
The data consist of matched pairs drawn from random samples.
When the values chosen for one sample determine the values in the second sample,
the sampling method is said to be dependent.
A typical case is before-and-after measurement: the objects in
the sample are measured twice, once at one point in time and then again at a subsequent
point in time.
Dependency can also arise when the objects in the two samples are connected to each other.

We use the differences between the pairs of data in our analysis. For each pair, we subtract the
values:

 d = value of var1 − value of var2

We are creating a new random variable d (differences), and it is important to keep the sign,
whether positive or negative. We can compute d̄, the sample mean of the differences, and sd,
the sample standard deviation of the differences, as follows:

 d̄ = Σdi / n
 sd = √( Σ(di − d̄)² / (n − 1) )

We'll use the same three pairs of null and alternative hypotheses, now stated for the mean
difference μd, with the test statistic

 t = (d̄ − μd) / (sd / √n)

The critical value comes from the Student's t-distribution table with n − 1 degrees of
freedom, where n = number of matched pairs. The test statistic follows the Student's
t-distribution.
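The matched-pairs calculation can be sketched in a few lines. The function name paired_t and the six pairs of values are made up for illustration; they are not the data of the problem solved later.

```python
import math
import statistics


def paired_t(var1, var2, mu_d=0.0):
    """Return (d_bar, s_d, t, df) for two matched samples."""
    if len(var1) != len(var2):
        raise ValueError("matched samples must have the same length")
    # One difference per pair, keeping the sign
    diffs = [a - b for a, b in zip(var1, var2)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    t = (d_bar - mu_d) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


# Hypothetical matched measurements on six individuals
var1 = [6.0, 5.0, 7.0, 6.2, 6.0, 6.4]
var2 = [5.4, 5.2, 6.5, 5.9, 6.0, 5.8]
d_bar, s_d, t, df = paired_t(var1, var2)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {df}")
```

The resulting t is compared with the critical value for n − 1 degrees of freedom, here 5.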

Inferences about Two Means with Independent Samples (Assuming Unequal Variances)

Using independent samples means that there is no relationship between the groups. The values
in one sample have no association with the values in the other sample. 

With a two-sample t-test, we compare the population means to each other and again look at
the difference. We expect that x̄1 − x̄2 would be close to μ1 − μ2. The test statistic will use
both sample means, sample standard deviations, and sample sizes for the test.

• For a one-sample t-test we used s/√n as a measure of the standard deviation (the
standard error).
• We can rewrite s/√n → √(s²/n).
• The numerator of the test statistic will be (x̄1 − x̄2) − (μ1 − μ2).
• This has a standard deviation of √(s1²/n1 + s2²/n2).

A two-sample t-test follows the steps below:

• Write the null and alternative hypotheses.
• State the level of significance and find the critical value. The critical value, from the
Student's t-distribution, has the lesser of n1 − 1 and n2 − 1 degrees of freedom.
• Compute the test statistic.
• Compare the test statistic to the critical value and state a conclusion.

Both samples must be independent random samples. The populations must be normally
distributed, or both sample sizes must be large enough (n1 and n2 ≥ 30). We will also use the same
three pairs of null and alternative hypotheses.

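The test statistic is Welch's approximation, which does not assume that the independent
population variances are equal:

 t = ((x̄1 − x̄2) − (μ1 − μ2)) / √(s1²/n1 + s2²/n2)

This test statistic follows the Student's t-distribution with degrees of freedom given by the formula
below:

 df = (s1²/n1 + s2²/n2)² / [ (s1²/n1)²/(n1 − 1) + (s2²/n2)²/(n2 − 1) ]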

When working a problem by hand, a simpler alternative for finding the degrees of freedom is to use
the lesser of n1 − 1 or n2 − 1.
This strategy yields a smaller degrees of freedom value and, as a result, a larger critical value.
This makes the test more conservative, as rejecting the null hypothesis requires more evidence.
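To compare the two ways of choosing the degrees of freedom, the sketch below computes Welch's t statistic together with both the Welch degrees of freedom and the conservative value min(n1, n2) − 1. The function name welch_t and the summary statistics are hypothetical.

```python
import math


def welch_t(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """Return (t, Welch degrees of freedom, conservative degrees of freedom)."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    t = ((mean1 - mean2) - d0) / math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    df_conservative = min(n1, n2) - 1
    return t, df_welch, df_conservative


# Hypothetical summary statistics for two independent samples
t, df_welch, df_conservative = welch_t(20.5, 4.0, 16, 17.0, 6.0, 25)
print(f"t = {t:.3f}, Welch df = {df_welch:.1f}, conservative df = {df_conservative}")
```

For these made-up values the Welch degrees of freedom come out near 38.9, while the conservative choice is 15, which leads to a larger critical value.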

Pooled Two-sample t-test (Assuming Equal Variances)

In the previous section we allowed our two populations to have unequal variances.
Welch's t-test statistic does not presume that the population variances are equal and can
be used whether or not they are.
The pooled t-test is a statistical test that assumes equal population variances.
Pooling refers to finding a weighted average of the two independent sample variances.

The pooled test statistic uses a weighted average of the two sample variances:

 sp² = ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)
 t = ((x̄1 − x̄2) − (μ1 − μ2)) / (sp √(1/n1 + 1/n2))

The advantage of this test statistic is that, when its assumptions hold, it exactly follows the Student's t-distribution with
n1 + n2 − 2 degrees of freedom.
On the basis of sample data, it may be difficult to establish that two population variances are
equal.
The F-test is a popular way to compare variances, although it is not very robust.
Small departures from normality have a significant impact on the outcome, making the F-test
results untrustworthy.
It can be difficult to tell whether a significant result from an F-test is due to non-normality or to differences
in variances.
As a result, when comparing two means, many researchers prefer Welch's t.
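The pooled calculation can be sketched as follows. The function name pooled_t and the summary statistics are hypothetical, and the equal-variance assumption is simply taken for granted in this example.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2, d0=0.0):
    """Return (pooled variance, t, df) under the equal-variance assumption."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    standard_error = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = ((mean1 - mean2) - d0) / standard_error
    return sp2, t, df


# Hypothetical summary statistics for two independent samples
sp2, t, df = pooled_t(20.0, 4.0, 10, 17.0, 5.0, 12)
print(f"pooled variance = {sp2:.2f}, t = {t:.3f}, df = {df}")
```

The weights are the degrees of freedom of each sample, so the larger sample contributes more to the pooled variance.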

P-value approach
There is another way of deciding whether or not to reject the null hypothesis. After the
test statistic is computed, we calculate the p-value: the probability, assuming the null hypothesis
is true, of obtaining a test statistic at least as extreme as the one observed in the sample. In other
words, it measures how incompatible the sample data are with the null hypothesis. If the p-value is
less than or equal to the level of significance, we reject the null hypothesis.

Confidence Interval about the Difference of Two Independent Means

A hypothesis test will answer the question about the difference of the means. But we can
answer the same question by constructing a confidence interval about the difference of the
means. This process is just like the confidence intervals from Chapter 2.
1. Find the critical value.
2. Compute the margin of error.
3. Point estimate ± margin of error.
Because we are working with two samples, we must modify the components of the confidence
interval to incorporate the information from the two populations.

Population standard deviations unknown:

The point estimate is x̄1 − x̄2.

The standard error comes from the test statistic: √(s1²/n1 + s2²/n2).

We will use the same three steps to construct a confidence interval about the difference of the
means.

Critical value: tα/2

E = tα/2 · √(s1²/n1 + s2²/n2)

The confidence interval takes the form of the point estimate plus or minus the margin of error:

 x̄1 − x̄2 ± tα/2 · √(s1²/n1 + s2²/n2)
OR
 x̄1 − x̄2 ± E

When the standard deviations of the populations are known, we use the z-statistic to compute the
interval estimate:

 x̄1 − x̄2 ± zα/2 · √(σ1²/n1 + σ2²/n2)

Matched samples formula:

 d̄ ± tα/2 · sd/√n, with n − 1 degrees of freedom

Interval estimation interpretation


Based on the confidence interval we can say that we are (1 − α)·100% confident that the
difference between the mean values of the two populations lies between the boundaries of
this interval. For example, α = 5% means that if we drew 100 samples and built an interval from
each, we would expect about 95 of those intervals to contain the true mean difference between the populations.
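The three steps (critical value, margin of error, point estimate ± margin of error) can be sketched as below for two independent samples with unknown standard deviations. The function name confidence_interval and the summary statistics are hypothetical; the critical value 2.131 is the table value t0.025 for 15 degrees of freedom, using the conservative choice min(n1, n2) − 1.

```python
import math


def confidence_interval(point_estimate, standard_error, critical_value):
    """Return (lower, upper) = point estimate -/+ margin of error."""
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin


# Hypothetical summary statistics for two independent samples
mean1, s1, n1 = 20.5, 4.0, 16
mean2, s2, n2 = 17.0, 6.0, 25

point_estimate = mean1 - mean2
standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
t_crit = 2.131  # t_0.025 with min(n1, n2) - 1 = 15 df, read from a t table

lower, upper = confidence_interval(point_estimate, standard_error, t_crit)
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

Because this interval does not contain 0, the corresponding two-tailed test at α = 0.05 would reject the hypothesis of equal means for these made-up values. The same function covers the matched samples case by passing d̄ as the point estimate and sd/√n as the standard error.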

Problem 25, pg. 415, from chapter 10.3

Problem Analysis
As we can see, we have the case of matched samples, and the population size, mean, and
standard deviation are unknown. The key to solving this problem is the sample mean of the
differences, on which the whole problem is based. The calculations done by hand are
approximate, since the t-distribution table does not give exact p-values; it only lets us
find a range for the p-value. Computer software allows us to
determine the exact p-value.

Step 1. Develop the null and alternative hypotheses

H0: μd = 0
Ha: μd ≠ 0

Step 2. Specify the level of significance

α = 0.05

Step 3. Compute the value of the test statistic

d̄ = −1.2, sd = 1.971, n = 15, μd = 0

t = (d̄ − μd)/(sd/√n) = −1.2/(1.971/√15) ≈ −1.2/0.51 ≈ −2.35

Critical value approach

Step 4. Determine the critical value and the rejection rule

For α = 0.05 and df =14, the critical value is t0.025 = 2.145.


We reject H0 if t ≥ 2.145 or t ≤ -2.145.

Step 5. Determine whether to reject H0

Since -2.35 ≤ -2.145, we reject H0.
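The arithmetic of Steps 3 to 5 can be re-checked with a short script. It uses only the summary values stated above (mean difference −1.2, sd = 1.971, n = 15) and the table critical value 2.145; the variable names are our own.

```python
import math

# Summary values from Step 3 and the critical value from Step 4
d_bar, s_d, n, mu_d = -1.2, 1.971, 15, 0.0
t_crit = 2.145  # t_0.025 with n - 1 = 14 degrees of freedom

standard_error = s_d / math.sqrt(n)
t = (d_bar - mu_d) / standard_error

# Two-tailed rejection rule: reject H0 if t >= 2.145 or t <= -2.145
reject_h0 = abs(t) >= t_crit

print(f"standard error = {standard_error:.4f}")
print(f"t = {t:.3f}")
print(f"reject H0: {reject_h0}")
```

Carried at full precision the statistic is about −2.36 rather than −2.35, because the hand calculation rounds the standard error to 0.51. The decision is the same either way.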

P-value approach

Step 4. Compute the p-value


p-value = 0.034

Step 5. Determine whether to reject H0

Since the p-value is smaller than α, we reject H0.
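The t table only brackets the p-value, so software is needed for a more exact figure. The sketch below is one possible way to approximate it with the Python standard library alone, by numerically integrating the Student's t density with Simpson's rule. The function names t_pdf and two_tailed_p are our own, and this is not the method Excel uses internally.

```python
import math


def t_pdf(x, df):
    """Density of the Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value: 1 minus the area between -|t| and |t|.
    The area from 0 to |t| is found with Simpson's rule (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # the density is symmetric about 0
    return 1 - central_area


alpha = 0.05
t = -1.2 / (1.971 / math.sqrt(15))  # test statistic from Step 3
p_value = two_tailed_p(t, df=14)
reject_h0 = p_value <= alpha
print(f"p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

The result should land close to the 0.034 reported above; a small difference in the third decimal can come from rounding t to −2.35 in the hand calculation.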

Excel spreadsheet

1. First, we enter all our data into uniform rows.
2. Second, calculate the difference between the two values of each individual using
Excel functions.
3. Go to Options and add the Analysis ToolPak.
4. Go to the Data menu and click Data Analysis.
5. From there choose t-Test: Paired Two Sample for Means (matched samples).
6. In the Variable 1 range we select all rows of the first column (Television).
7. In the Variable 2 range we select all rows of the second column (Radio).
8. Check that the level of significance (Alpha) is 0.05.
9. For the output range we select any empty cell in the Excel spreadsheet.
10. Finally, click OK.
11. Use the output values to draw conclusions.
Conclusions
A matched samples design frequently results in a lower sampling error
than an independent samples design,
because variation between the sampled items is eliminated as a source of error.
The test conducted in this case is a t-test, so it is necessary to use the t-distribution table.
The test is also two-tailed, because the alternative hypothesis allows the mean difference to be
either positive or negative, so there are two rejection regions.
We have also calculated the p-value, which is the probability of observing data at least this
extreme if the null hypothesis were true.
In the end we conclude, at the 0.05 level of significance, that there is a difference between the
mean usage of cable television and radio.

References
Essentials of Statistics for Business and Economics.
Lesson slides
courses.lumenlearning.com
