Reference Paper 15th August

User name: Lab___-r__- PC___

Rules to follow:
1. Log on with user name lab__- r- __PC- __ and logon identity CBML. (No
password required.)
2. Create a folder with your name and ID# on the Desktop. Import the data file
into this folder. Save the SPSS data file and the output separately for each
question.
After completing and saving all your work, copy your folder. Go to My
Computer, open the exam drive, and save your folder there. It is your
responsibility to save your data on the exam drive.
3. Please answer all questions with the proper question number. Save a
separate output file for each question, or rename the output file with
the question number. All answers must be given in the proper order.

Question 1 8 marks
A factory that emits airborne pollutants is testing two different brands of filters
for its smokestacks. The factory has two smokestacks. One brand of filter (Filter
I) is placed on one smokestack, and the other brand (Filter II) is placed on the
second smokestack. Random samples of air released from the smokestacks are
taken at different times throughout the day. Pollutant concentrations are measured
from both stacks at the same time. The data represent the pollutant concentrations
(in parts per million) for samples taken at 20 different times after passing through
the filters.
a) Perform the normality test to check if the differences in concentration
levels at all times are normally distributed.
b) Make a 95% confidence interval for the mean of the population of paired
differences, where a paired difference is equal to the pollutant
concentration passing through Filter I minus the pollutant concentration
passing through Filter II.
c) Using a 5% significance level, can you conclude that the average paired
difference for concentration levels is different from zero?

Step 1: Perform Normality Test for Paired Differences

1. Open SPSS and load the data from the provided dataset.
2. The question asks about the paired differences, so run the test on the difference
variable (Filter I minus Filter II) computed in Step 2 rather than on the two filter
variables separately.
3. Go to the "Analyze" menu, select "Descriptive Statistics," and then choose
"Explore."
4. Move the difference variable to the "Dependent List" box.
5. Click the "Plots" button and check "Normality plots with tests" (optionally also
"Histogram" under "Descriptive").
6. Click "Continue" and then click "OK" to run the analysis.

SPSS will provide the Tests of Normality table (Kolmogorov-Smirnov and Shapiro-Wilk)
and normal Q-Q plots for the differences. If the Shapiro-Wilk p-value is greater than
0.05, the assumption of normally distributed differences is not rejected.

Step 2: Calculate Paired Differences

1. Each paired difference is defined as: Paired Difference = Concentration through
Filter I - Concentration through Filter II.
2. In the same data file, go to the "Transform" menu and choose "Compute Variable."
3. Enter a name for the new variable in "Target Variable," enter the expression
(Filter I variable minus Filter II variable) in "Numeric Expression," and click "OK"
to save the differences in the new variable.

Step 3: Calculate Confidence Interval

1. Go to the "Analyze" menu, select "Descriptive Statistics," and then choose
"Explore."
2. Move the variable representing the paired differences to the "Dependent List"
box.
3. Click the "Statistics" button, select "Descriptives," and leave "Confidence
Interval for Mean" at 95%.
4. Click "Continue" and then click "OK" to run the analysis.

SPSS will provide you with the mean of the paired differences and the 95% confidence
interval for the mean.

Step 4: Perform Hypothesis Test

1. To test whether the average paired difference for concentration levels is
different from zero, you can perform a one-sample t-test on the paired
differences:
• Go to the "Analyze" menu, select "Compare Means," and then choose
"One-Sample T Test."
• Move the variable representing the paired differences to the "Test
Variable(s)" box.
• In the "Test Value" field, enter "0" (since you want to test if the mean
difference is different from zero).
• Click "OK" to run the analysis.

SPSS will provide you with the results of the one-sample t-test, including the t-value,
degrees of freedom, and p-value.

Step 5: Interpretation

Examine the p-value provided in the output of the one-sample t-test. If the p-value is
less than 0.05 (5% significance level), you can reject the null hypothesis and conclude
that the average paired difference for concentration levels is different from zero.
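
The arithmetic behind Steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration: the exam data file is not reproduced here, so the readings below are made-up values, the function name `paired_t_summary` is hypothetical, and the critical value 2.776 is the two-sided 5% t value for 4 degrees of freedom (for the 20 pairs in the exam data the corresponding value for 19 degrees of freedom is 2.093).

```python
import math
import statistics


def paired_t_summary(filter1, filter2, t_crit):
    """Summarise paired differences (filter1 - filter2).

    t_crit is the two-sided critical t value for n - 1 degrees of freedom;
    it must be looked up by the caller (assumption of this sketch).
    """
    diffs = [a - b for a, b in zip(filter1, filter2)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)          # sample standard deviation
    se = sd_d / math.sqrt(n)                # standard error of the mean difference
    t_stat = mean_d / se                    # test value is 0
    ci = (mean_d - t_crit * se, mean_d + t_crit * se)
    return {"n": n, "df": n - 1, "mean": mean_d, "se": se, "t": t_stat, "ci": ci}


# Hypothetical readings (parts per million), NOT the exam data.
filter1 = [24, 31, 35, 32, 25]
filter2 = [26, 30, 33, 28, 23]

result = paired_t_summary(filter1, filter2, t_crit=2.776)
print(result)
```

If the confidence interval contains zero, the two-sided test at the matching significance level does not reject the null hypothesis, which is why parts b) and c) always agree.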

Question 2 8 marks
In a Harris poll conducted by Harris Interactive between April 10 and April 17,
2012, U.S. adults aged 18 years and older were asked whether they agreed with the
statement, “In general, people on Wall Street are as honest and moral as other
people.” The accompanying chart shows the percentage distribution of the responses
of these adults.
Twenty-eight percent of the adults polled said that they agree with this statement,
68% disagreed, and 4% were not sure or refused to answer. Assume that these
percentages were true for the population of U.S. adults in 2012. Perform the
required hypothesis test to check whether these percentages with respect to the
foregoing statement still hold for the current distribution of the 2000 people who
participated in this study.

Category

                            Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Agree                     667      33.4            33.4                 33.4
        Disagree                  686      34.3            34.3                 67.7
        Not Sure /Refused         647      32.4            32.4                100.0
        Total                    2000     100.0           100.0

Ho: The current percentage distribution of public opinion is the same as in the
2012 Harris poll (28% agree, 68% disagree, 4% not sure or refused)
Ha: The current percentage distribution of public opinion is different from the
one reported in the 2012 Harris poll

Respondent's opinion

                    Observed N   Expected N   Residual
agree                      667        560.0      107.0
disagree                   686       1360.0     -674.0
Not Sure /Refused          647         80.0      567.0
Total                     2000

Test Statistics

              Respondent's opinion
Chi-Square    4373.084a
df            2
Asymp. Sig.   .000

a. 0 cells (0.0%) have expected frequencies less than 5. The minimum expected cell
frequency is 80.0.

P-value < level of significance
0.000 < 0.05, so reject the null hypothesis.
At the 5% significance level, the data provide evidence that the current
percentage distribution of public opinion is different from the one reported in
the 2012 Harris poll.
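
As a cross-check on the SPSS output, the goodness-of-fit statistic can be recomputed by hand. The sketch below is a minimal illustration using only the observed counts and the 2012 percentages stated in the question; the variable names are my own. It relies on the fact that, for 2 degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

# Observed counts from the frequency table: agree, disagree, not sure/refused.
observed = [667, 686, 647]
# Hypothesised proportions from the 2012 Harris poll.
proportions = [0.28, 0.68, 0.04]

n = sum(observed)
expected = [p * n for p in proportions]     # 560, 1360, 80

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# Closed-form upper-tail probability, valid only because df == 2.
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 3), df, p_value)
```

The statistic agrees with the 4373.084 reported by SPSS, and the p-value is far below 0.05, so the decision to reject the null hypothesis is unchanged.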

Question 3 8 marks
Let us consider the case of households in California and Wisconsin who belong
to various income groups. Suppose these households are further divided into three
income strata: high-income group (with an income of more than $200,000),
medium-income group (with an income of $70,000 to $200,000), and low-income
group (with an income of less than $70,000). Perform the required hypothesis test
at the 5% significance level to check whether income groups are dependent on the
household’s residential state.

Step 1: Formulate income groups. To create the groups, first find the minimum and
maximum income.

Statistics
Income

N         Valid        401
          Missing        0
Minimum              40814
Maximum             494218
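
The grouping rule from the question can be written out explicitly. The sketch below is only an illustration of that rule: the function name `income_group` and the labels are hypothetical, and the treatment of incomes exactly equal to $70,000 or $200,000 follows the wording of the question (the SPSS table labels "$70,000 or less" and "$200,000 or more" suggest the boundaries may have been handled slightly differently in the actual recode).

```python
def income_group(income):
    """Assign an income (in dollars) to one of the three strata in the question."""
    if income < 70_000:
        return "low"       # less than $70,000
    if income <= 200_000:
        return "medium"    # $70,000 to $200,000
    return "high"          # more than $200,000


# The minimum and maximum from the Statistics table fall in the outer groups.
print(income_group(40814), income_group(494218))
```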

Step 2: Summarize your data using the Crosstabs command. This will help you
create the data file to be used in running the chi-square test for independence.

residential state * income groups Crosstabulation
Count

                                       income groups
                     $70,000 or less   $70,000 to $200,000   $200,000 or more   Total
residential state
  California                      16                    55                119     190
  Wisconsin                       20                    55                136     211
Total                             36                   110                255     401

Ho: Residential state and income groups are not related
Ha: Residential state and income groups are related

Residential state * income groups Crosstabulation

                                                    income groups
                                  $70,000 or less   $70,000 to $200,000   $200,000 or more   Total
Residential state
  California   Count                           16                    55                119     190
               Expected Count                17.1                  52.1              120.8   190.0
  Wisconsin    Count                           20                    55                136     211
               Expected Count                18.9                  57.9              134.2   211.0
Total          Count                           36                   110                255     401
               Expected Count                36.0                 110.0              255.0   401.0

Chi-Square Tests

                               Value   df   Asymp. Sig. (2-sided)
Pearson Chi-Square             .479a    2   .787
Likelihood Ratio               .479     2   .787
Linear-by-Linear Association   .014     1   .907
N of Valid Cases               401

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is
17.06.

P-value > level of significance
0.787 > 0.05, so do not reject the null hypothesis.
At the 5% significance level, the data do not provide evidence to conclude that
residential state and income groups are associated.
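
The expected counts and the Pearson chi-square value can be reproduced from the observed crosstab alone. The sketch below is a minimal illustration with variable names of my own choosing; each expected count is (row total × column total) / N, and the p-value again uses the closed form exp(-x/2), which applies only because this table has 2 degrees of freedom.

```python
import math

# Observed counts from the crosstab, columns ordered low, medium, high income.
observed = {
    "California": [16, 55, 119],
    "Wisconsin": [20, 55, 136],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(rows) - 1) * (len(col_totals) - 1)

# Closed-form upper-tail probability, valid only because df == 2.
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 3), df, round(p_value, 3))
```

The result matches the Pearson Chi-Square row of the SPSS table (.479 with 2 degrees of freedom, p = .787).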

Question 4 10 marks
A university employment office wants to compare the time taken by graduates
with three different majors to find their first full-time job after graduation. The
data lists the time (in days) taken to find their first full-time job after graduation
for a random sample of 29 business majors, 24 computer science majors, and 21
engineering majors who graduated in May 2014.
a) Perform the normality test to check if the time variable is normally distributed
for the three fields.
b) At a 5% significance level, can you reject the null hypothesis that the variances
of time taken to find their first full-time job for all May 2014 graduates in these
fields are equal?
c) At a 5% significance level, can you reject the null hypothesis that the mean
time taken to find their first full-time job is the same for all May 2014 graduates
in these fields?
d) Perform the Tukey multiple comparison test (if required) to check between which
fields a difference in time actually exists.

Step 1: Perform Normality Test

1. Open SPSS and load the data from the provided dataset.
2. Go to the "Analyze" menu, select "Descriptive Statistics," and then choose
"Explore."
3. Move the variable representing the time taken to find their first full-time job to
the "Dependent List" box.
4. Move the variable representing the major field (business, computer science,
engineering) to the "Factor List" box.
5. Click the "Plots" button and check "Normality plots with tests" (optionally also
"Histogram" under "Descriptive").
6. Click "Continue" and then click "OK" to run the analysis.

SPSS will provide the Tests of Normality table (Kolmogorov-Smirnov and Shapiro-Wilk)
and normal Q-Q plots for each major field.

Step 2: Perform Levene's Test for Equality of Variances

1. To test if the variances of time taken are different for the three major fields,
perform Levene's test for equality of variances:
• Go to the "Analyze" menu, select "Compare Means," and then choose
"One-Way ANOVA."
• Move the variable representing the time taken to the "Dependent List"
box.
• Move the variable representing the major field to the "Factor" box.
• Click the "Options" button and select "Homogeneity of variance test"
under "Statistics."
• Click "Continue" and then click "OK" to run the analysis.

SPSS will provide you with the results of Levene's test, including the test statistic,
degrees of freedom, and p-value.

Step 3: Perform One-Way ANOVA

1. Go to the "Analyze" menu, select "Compare Means," and then choose
"One-Way ANOVA."
2. Move the variable representing the time taken to the "Dependent List" box.
3. Move the variable representing the major field to the "Factor" box.
4. Click "OK" to run the analysis.

SPSS will provide you with the results of the one-way ANOVA, including the F-value
and p-value for testing the equality of means across major fields.

Step 4: Perform Tukey's Multiple Comparison Test

If the one-way ANOVA indicates a significant difference in means, you can perform
Tukey's multiple comparison test to determine which pairs of major fields have
significantly different mean times:

1. Reopen the One-Way ANOVA dialog ("Analyze," "Compare Means," "One-Way ANOVA")
and click the "Post Hoc" button.
2. Select "Tukey" as the method for performing multiple comparisons.
3. Click "Continue" and then click "OK" to run the test.

SPSS will provide you with the results of Tukey's multiple comparison test, showing
which major fields have significantly different mean times.

Step 5: Interpretation

• For normality test: Examine the p-values provided in the output. If p-values are
greater than 0.05 (5% significance level), you can assume that the data for each
major field is normally distributed.
• For Levene's test: Examine the p-value provided in the output. If the p-value is
greater than 0.05, you can assume that the variances of time taken are not
significantly different across major fields.
• For one-way ANOVA: Examine the p-value provided in the output. If the p-value
is less than 0.05, you can reject the null hypothesis and conclude that there is a
significant difference in mean times across major fields.
• For Tukey's multiple comparison: Examine the results to identify which pairs of
major fields have significantly different mean times.
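
To show what the F-value in Step 3 measures, the sketch below computes the one-way ANOVA F statistic from its definition: the between-group mean square divided by the within-group mean square. It is only an illustration. The function name `one_way_anova` is hypothetical, and the three short lists are made-up times in days, not the exam data (which has 29, 24 and 21 graduates per field).

```python
import statistics


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)

    # Between-group sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum(
        (v - statistics.mean(g)) ** 2 for g in groups for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical job-search times in days, NOT the exam data.
business = [30, 40, 50]
computer_science = [40, 50, 60]
engineering = [50, 60, 70]

f_stat, df_between, df_within = one_way_anova(
    [business, computer_science, engineering]
)
print(f_stat, df_between, df_within)
```

A large F means the group means differ by more than the spread within groups would suggest; SPSS converts F and the two degrees of freedom into the p-value used in Step 5.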

Question 5 6 marks
Health experts recommend that runners drink 4 ounces of water every 15 minutes
they run. Although handheld bottles work well for many types of runs, all-day
cross-country runs require hip-mounted or over-the-shoulder hydration systems.
In addition to carrying more water, hip-mounted or over-the-shoulder hydration
systems offer more storage space for food and extra clothing. As the capacity
increases, however, the weight and cost of these larger-capacity systems also
increase. The data show the weight (ounces) and the price for 26 hip-mounted or
over-the-shoulder hydration systems (Trail Runner Gear Guide, 2003).
a. Use these data to develop an estimated regression equation that could be used
to predict the price of a hydration system given its weight. Did the estimated
regression equation provide a good fit? Explain.

b. At the .05 level of significance, perform the required hypothesis test to check if
there is a linear association between weight and price.

c. Also perform the appropriate test to confirm the significance of the relationship
at the .05 level of significance.

b. At the .05 level of significance, perform the required hypothesis test to check if
there is a linear association between weight and price.

Correlations

                                Weight    Price
Weight   Pearson Correlation    1         .898**
         Sig. (2-tailed)                  .000
         N                      26        26
Price    Pearson Correlation    .898**    1
         Sig. (2-tailed)        .000
         N                      26        26

**. Correlation is significant at the 0.01 level (2-tailed).


Ho: ρ = 0
Ha: ρ ≠ 0

P-value < level of significance
0.000 < 0.05, so reject the null hypothesis.
At the 5% significance level, the data provide evidence that there is a linear
association between the weight and the price of a hydration system.

a. Use these data to develop an estimated regression equation that could be used
to predict the price of a hydration system given its weight. Did the estimated
regression equation provide a good fit? Explain.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .898a   .807       .799                8.457

a. Predictors: (Constant), Weight

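About 80.7% of the variation in the price of a hydration system is explained by its
weight (R Square = .807); the remaining 19.3% is due to other factors. Because
weight accounts for most of the variation in price, the estimated regression
equation provides a good fit.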

c. Also perform the appropriate test to confirm the significance of the relationship
at the .05 level of significance.

ANOVAa

Model            Sum of Squares   df   Mean Square   F         Sig.
1   Regression   7167.874          1   7167.874      100.216   .000b
    Residual     1716.587         24   71.524
    Total        8884.462         25

a. Dependent Variable: Price
b. Predictors: (Constant), Weight

Coefficientsa

                   Unstandardized Coefficients   Standardized Coefficients
Model              B        Std. Error           Beta                        t        Sig.
1   (Constant)     4.979    3.380                                            1.473    .154
    Weight         2.937    .293                 .898                        10.011   .000

a. Dependent Variable: Price

Ho: β = 0
Ha: β ≠ 0
P-value < level of significance
0.000 < 0.05, so reject the null hypothesis.
At the 5% significance level, the data provide evidence that the weight of a
hydration system is a significant determinant of its price.

Price = f(Weight)
P = 4.979 + 2.937 W
With a one-ounce increase in weight, the predicted price of a hydration system
increases by 2.937 units.
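
The reported tables are internally consistent, which can be checked with a few lines of arithmetic. The sketch below is an illustration only, using the values printed in the ANOVA and Coefficients tables; the function name `predict_price` and the 10-ounce example weight are my own assumptions. Small differences from the SPSS figures are expected because the inputs are rounded.

```python
import math

# Values copied from the ANOVA and Coefficients tables.
ss_regression = 7167.874
ss_residual = 1716.587
df_regression, df_residual = 1, 24
intercept, slope = 4.979, 2.937
slope_se = 0.293

# R Square is the share of total variation explained by the regression.
r_square = ss_regression / (ss_regression + ss_residual)
# F is the regression mean square divided by the residual mean square.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)
# t for the slope is the coefficient divided by its standard error;
# with one predictor, t squared is approximately F.
t_slope = slope / slope_se


def predict_price(weight_oz):
    """Predicted price from the estimated equation P = 4.979 + 2.937 W."""
    return intercept + slope * weight_oz


print(round(r_square, 3), round(math.sqrt(r_square), 3), round(f_stat, 3))
print(round(t_slope, 2), round(predict_price(10), 3))
```

With a single predictor, the F test in the ANOVA table and the t test on the Weight coefficient are the same test, which is why both report Sig. = .000.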
