
Lab 6 Joey Martinez

Math 311
06/01/16
In this lab we explore two inference methods: the matched-pairs t procedure and the two-sample t
procedure. The first part applies the matched-pairs t test and confidence interval to the starting
salaries of men and women with similar GPAs and college majors. Our main question is whether there
is a significant difference in starting salaries between men and women who have graduated from
college. The second part examines calorie consumption among 4th and 5th graders according to
whether recess falls before or after lunch. These data come from two independent samples, so we use
two-sample t inference methods. The goal of that section is to determine whether holding recess
before or after lunch influences the number of calories these children consume. We address this
question by testing hypotheses and by constructing a 95% confidence interval for the difference in
population means.

Part I)

This section of our lab uses a sample of 10 pairs of college graduate starting salaries, where
the man and woman in each pair were matched on major and college GPA. A historic characteristic of
the labor market has been the wage gap between men and women in most professions, which still
persists today. We wish to investigate whether attending college has had any success in reducing
the wage gap between genders. An initial analysis of the data reveals some differences in starting
salaries between the men and the women, as can be seen from a side-by-side boxplot of the data and
some basic descriptive statistics.

[Figure: Boxplot of Male, Female Salaries. Vertical axis: Salary in Dollars (30000 to 70000); groups: Male, Female.]
Descriptive Statistics: Male, Female

Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
Male      10   0  43930     3689  11665    29300  38250   40800  47425    69500
Female    10   0  43370     3694  11680    28800  38150   39150  46575    69200

Our initial examination of the data shows a difference in median salaries: $40,800 for men versus
$39,150 for women. At least half of the women's starting salaries therefore fall below the men's
median. We also note that the interquartile range for women is shifted slightly downward relative
to that for men. The sample means are fairly similar, with x̅1 = $43,930 for men and x̅2 = $43,370
for women. Each mean lies above the corresponding median, which indicates a right skew in both
distributions, most likely due to a high outlier present in both groups. We also note that the
distributions of salaries for men and women are very similar in shape, as indicated by the
following graphs:

[Figure: Histogram of Female Salaries and Histogram of Male Salaries. Horizontal axis: Salary in Dollars (30000 to 70000); vertical axis: Frequency (0 to 6).]

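The matched-pairs t test is applied to the ten within-pair differences (male minus female), so its
normality condition concerns those differences rather than the two salary distributions taken
separately. Because each man is paired with a woman of similar major and GPA, and supposing the
differences come from an approximately normal population, the matched-pairs t test applies to our
samples, as desired.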

We now consider a matched-pairs t test to test the claim that women college graduates have a
lower starting salary than men, given similar qualifications and experience. We first estimate the mean
of the differences between starting salaries of men and women by constructing a 95% confidence
interval as follows.

Paired T CI: Male, Female

Paired T for Male - Female

             N   Mean  StDev  SE Mean
Male        10  43930  11665     3689
Female      10  43370  11680     3694
Difference  10    560    833      263

95% CI for mean difference: (-36, 1156)
Thus we are 95% confident that the population mean difference in starting salaries (male minus
female) lies between -$36 and $1,156; in repeated sampling, 95% of intervals constructed this way
would capture the true mean difference. We note that 0 is contained in this interval and that the
width of the interval is fairly small.
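
The interval reported by Minitab can be checked by hand from the summary statistics of the differences. The following is a minimal sketch, not the Minitab procedure itself: the function name `paired_t_interval` is made up for illustration, and the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is hard-coded from a standard t table as an assumption.

```python
import math

# Summary statistics for the paired differences (Male - Female),
# copied from the Minitab output.
n = 10
mean_diff = 560.0
sd_diff = 833.0

# Two-sided 95% critical value for a t distribution with n - 1 = 9 degrees
# of freedom, taken from a standard t table (hard-coded assumption).
T_CRIT_DF9 = 2.262


def paired_t_interval(mean_diff, sd_diff, n, t_crit):
    """Return (lower, upper) for the mean of the paired differences."""
    se = sd_diff / math.sqrt(n)   # standard error of the mean difference
    margin = t_crit * se          # margin of error
    return mean_diff - margin, mean_diff + margin


lower, upper = paired_t_interval(mean_diff, sd_diff, n, T_CRIT_DF9)
print(f"95% CI for mean difference: ({lower:.0f}, {upper:.0f})")
```

Because the inputs are the rounded values printed by Minitab, the endpoints agree with (-36, 1156) only after rounding to the nearest dollar.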
We now consider significance testing of the claim from above. We begin by setting our level of
significance at α = .05. Let the null and alternative hypotheses be H0: µd = 0 and Ha: µd > 0,
where µd is the population mean of the differences (male minus female) in starting salary. Minitab
gives the following results:

Paired T-Test: Male, Female

Paired T for Male - Female

             N   Mean  StDev  SE Mean
Male        10  43930  11665     3689
Female      10  43370  11680     3694
Difference  10    560    833      263

T-Test of mean difference = 0 (vs > 0): T-Value = 2.13  P-Value = 0.031

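Minitab reports a p-value of p = 0.031 < .05, and so we reject the null hypothesis. Hence we have
moderate evidence to support the claim that the mean starting salary for women with similar
backgrounds and experience is lower than that for men. This does not contradict the 95% confidence
interval containing 0: that interval is two-sided, whereas the test is one-sided, and the
corresponding two-sided p-value is about 0.062.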
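
The test statistic and one-sided p-value can also be reproduced from the summary statistics. The sketch below uses only the Python standard library and obtains the tail area by integrating the t density numerically with Simpson's rule. This is an illustrative choice, not how Minitab computes it, and the helper names `t_pdf` and `upper_tail_p` are made up for this example.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def upper_tail_p(t, df, steps=2000):
    """P(T > t) for t >= 0: Simpson's rule on [0, t], then symmetry about 0."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Summary statistics of the differences (Male - Female) from the Minitab output.
n, mean_diff, sd_diff = 10, 560.0, 833.0

t_stat = mean_diff / (sd_diff / math.sqrt(n))
p_value = upper_tail_p(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.3f}")
```

Since the inputs are rounded summary values, small discrepancies from Minitab in the last digit are possible; doubling `p_value` gives the two-sided p-value.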

Part II)
This next section of our lab explores how placing recess before or after lunch influences calorie
consumption; we code recess before lunch as 1 and recess after lunch as 2. Data on 4th and 5th
graders were collected from four schools for 10 days each. The plate waste from each child on a
given day was recorded and then used to compute the nutrients that the child consumed for the day.
The sample sizes for recess before and after lunch were n1 = 1209 and n2 = 1514, respectively. The
data in one sample have no influence on the other, so the samples can be considered independent.
Suppose we wish to test whether placement of recess with respect to lunch affects the number of
calories consumed by students. Before we begin, we first summarize the data by recess period and
determine the type of test appropriate for our investigation. Consider the following descriptive
statistics and histograms of the data.

Descriptive Statistics: Calories

Recess (1 = before lunch, 2 = after)

Variable  Recess     N  N*    Mean  SE Mean   StDev  Minimum      Q1  Median      Q3  Maximum
Calories  1       1209   0  492.88     4.63  161.15  -192.69  385.54  509.31  610.86   856.62
          2       1514   0  429.77     3.74  145.43  -197.11  326.82  439.70  536.40   783.81
[Figure: Histogram of Calories, paneled by Recess (1 = before lunch, 2 = after). Horizontal axis: Calories (-140 to 840); vertical axis: Frequency (0 to 90).]

The most apparent feature of the sample data is the similarity in shape of the two distributions,
each of which is roughly normal. Both show a slight left skew, given that each sample mean falls to
the left of its median. The sample standard deviations are s1 = 161.15 and s2 = 145.43, and the
middle 50% of each sample lies in I1 = (385.54, 610.86) and I2 = (326.82, 536.40), respectively.
With samples this large, the Central Limit Theorem implies that the sampling distributions of the
two sample means are approximately normal even with this mild skew. Thus, since the samples are
large, of similar shape, and independent of each other, we will use two-sample t inference methods
for the remainder of this section.

We begin by testing the claim that placement of recess with respect to lunch has an effect on the
number of calories consumed by children during lunch. Let the level of significance be α = .05 and
let the null and alternative hypotheses be H0: µ1 - µ2 = 0 and Ha: µ1 - µ2 ≠ 0, where µ1 is the
population mean calorie consumption for recess before lunch and µ2 is the population mean calorie
consumption for recess after lunch. The alternative is two-sided because the claim is only that
placement has some effect. Minitab gives the following results:

Two-Sample T-Test and CI: Calories, Recess 1=before lunch 2=after

Two-sample T for Calories

Recess (1 = before lunch, 2 = after)

Recess     N  Mean  StDev  SE Mean
1       1209   493    161      4.6
2       1514   430    145      3.7

Difference = μ (1) - μ (2)
Estimate for difference: 63.11
95% CI for difference: (51.44, 74.79)
T-Test of difference = 0 (vs ≠): T-Value = 10.60  P-Value = 0.000  DF = 2459
Therefore, since Minitab reports a p-value of 0.000 (that is, p < 0.0005), we reject the null
hypothesis and have strong evidence to support the claim that the placement of recess before or
after lunch has an impact on the number of calories consumed by children during lunch. The
significance of our result can be demonstrated by a graph of our calculated t-value, 10.60, and its
position relative to the critical values of the rejection region.

We now wish to construct a 95% confidence interval estimating the difference in population mean
calorie consumption between recess before lunch and recess after lunch. Fortunately, this interval
was calculated when conducting our hypothesis test and is given by (51.44, 74.79). The estimated
difference of about 63 calories is fairly small, comparable to eating an extra apple, egg, or
oatmeal cookie. Although the difference is not large, the interval lies entirely above 0, which
supports the conclusion that calories consumed differ with respect to recess placement.
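
The two-sample results can be reproduced from the descriptive statistics alone. The sketch below assumes the unpooled (Welch) form of the two-sample t procedure, which is consistent with the DF of 2459 in the Minitab output. The function name `welch_summary` is made up for illustration, and the critical value 1.961 is a hard-coded assumption for roughly 2459 degrees of freedom.

```python
import math


def welch_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Return (difference, standard error, t statistic, Welch df)."""
    v1 = sd1 ** 2 / n1            # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2            # estimated variance of the second sample mean
    diff = mean1 - mean2
    se = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return diff, se, diff / se, df


# Recess before lunch (group 1) versus after lunch (group 2),
# values copied from the descriptive statistics.
diff, se, t_stat, df = welch_summary(492.88, 161.15, 1209, 429.77, 145.43, 1514)

# Two-sided 95% critical value for about 2459 degrees of freedom
# (hard-coded assumption, very close to the normal value 1.96).
T_CRIT = 1.961
lower, upper = diff - T_CRIT * se, diff + T_CRIT * se

print(f"difference = {diff:.2f}, t = {t_stat:.2f}, df = {math.floor(df)}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

The degrees of freedom are rounded down to match the way the Minitab output reports them. Because the inputs are rounded to two decimals, the interval endpoints may differ from Minitab's by a few hundredths.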

Bonus Question)

Suppose we are now curious whether other variables have an effect on calorie consumption. We now
wish to determine whether there is a difference in the number of calories consumed based on
gender. Let the level of significance be α = .05 and let the null and alternative hypotheses be
H0: µ1 - µ2 = 0 and Ha: µ1 - µ2 ≠ 0, where µ1 is the population mean calorie consumption for boys
and µ2 is the population mean calorie consumption for girls. Then we have the following results:
Two-Sample T-Test and CI: Calories, Gender

Two-sample T for Calories

Gender     N  Mean  StDev  SE Mean
F       1465   441    155      4.0
M       1258   477    155      4.4

Difference = μ (F) - μ (M)
Estimate for difference: -36.35
95% CI for difference: (-48.01, -24.69)
T-Test of difference = 0 (vs ≠): T-Value = -6.11  P-Value = 0.000  DF = 2659

Minitab computes the difference as female minus male, so the negative values represent a deficit in
calories consumed by girls relative to boys. Thus, given that the reported p-value is 0.000 (that
is, p < 0.0005), we reject the null hypothesis and have strong evidence to support the claim that
gender has an impact on the number of calories consumed by children and that a difference does
exist. We further demonstrate our results by displaying our t-test value with respect to the
critical values of the rejection region.
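
The comparison against the rejection region can be sketched numerically for both large-sample tests. Minitab ran each test as two-sided (vs ≠), so the sketch uses a two-sided rule at α = .05. As a stated assumption, it substitutes the standard normal critical value for the t critical value, which is reasonable only because the degrees of freedom are in the thousands. The helper name `in_rejection_region` is made up for illustration.

```python
from statistics import NormalDist

ALPHA = 0.05

# With degrees of freedom in the thousands, the t critical value is very
# close to the standard normal one; this substitution is an assumption.
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)


def in_rejection_region(t_value, critical):
    """Two-sided rule: reject H0 when |t| exceeds the critical value."""
    return abs(t_value) > critical


# t-values as reported in the two Minitab two-sample outputs.
t_values = {"recess (1 - 2)": 10.60, "gender (F - M)": -6.11}

for label, t_value in t_values.items():
    decision = in_rejection_region(t_value, z_crit)
    print(f"{label}: t = {t_value:+.2f}, critical = +/-{z_crit:.3f}, reject H0: {decision}")
```

Both t-values lie far beyond the critical values, which is why Minitab reports each p-value as 0.000.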
