ALY6010 - Project 5 Document - US Occupations

Project 5

Two-sample Confidence Intervals & Hypothesis Testing


Overview and Rationale
This assignment is designed to provide you with hands-on experiences in estimating and
hypothesizing with two population parameters of interest. The data set is provided in an
Excel workbook and contains a wide range of data types that you will need to work with.

Course Outcomes

This assignment is directly linked to the following key learning outcomes from the course
syllabus:

CO1: Explore the use of statistical software in data analysis through hands-on applications.

CO4: Perform estimations of population parameters using confidence intervals based on
one sample and perform estimations of the difference between two population parameters
of the same kind based on two samples.

CO6: Perform various hypothesis tests, including those for a population parameter (single
sample), and the difference between two population parameters of the same kind (two
samples), and perform analysis of variance (ANOVA).

CO7: Interpret meaningful relationships and patterns in the data in relation to a given
business question.

Assignment Summary

Using the data in the attached Excel workbook, apply the methods of confidence intervals
and hypothesis testing to estimate and test hypotheses regarding three population
parameters:

• a population mean,
• a population proportion, and
• a population variance (or standard deviation).

Follow the instructions in this project document to analyze the data presented in the Excel
workbook. Then complete a report summarizing the results in your Excel workbook (or R
script file). Submit both the report and the Excel workbook (or R script file).

Project Description

Using the Data worksheet found in the Module 5 Project_ Occupational Data.xlsx Excel
workbook, complete the following analyses regarding New York City and Los Angeles
occupational data. Place the results in the worksheet specified in each part of the
assignment.

In some parts of this project, you are asked to create random samples from a given
population. Random sampling methods have been covered in Module 3, and tutorials are
available in the Instructor Perspective folder in your Blackboard course page.

Part 1

1. Use the random sampling method explained in the Instructor Perspective of Module 3
to generate a random sample of size 38 from the NY LOC QUOTIENTs and a random
sample of size 32 from the LA LOC QUOTIENTs.
2. Copy the two samples into the designated columns of worksheet Q1 and label them as
sample 1 and sample 2, respectively.
3. Then complete Table A in the Q1 worksheet.
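
For students who want to cross-check the Excel work in code, the sampling step can be sketched in Python as follows. This is only an illustration under stated assumptions: the two population lists are hypothetical stand-ins for the NY and LA LOC QUOTIENT columns, the helper name `draw_sample` is invented here, and sampling is assumed to be without replacement (follow the Module 3 method if it differs).

```python
import random
import statistics

def draw_sample(population, size, seed=None):
    """Simple random sample without replacement (assumed sampling scheme)."""
    rng = random.Random(seed)
    return rng.sample(list(population), size)

# Hypothetical stand-in values; the real columns come from the Data worksheet.
ny_population = [round(0.20 + 0.05 * i, 2) for i in range(100)]
la_population = [round(0.30 + 0.04 * i, 2) for i in range(100)]

sample_1 = draw_sample(ny_population, 38, seed=1)  # NY, size 38
sample_2 = draw_sample(la_population, 32, seed=2)  # LA, size 32

# Population variance divides by N; sample variance divides by n - 1.
summary = {
    "population_variance_ny": statistics.pvariance(ny_population),
    "population_variance_la": statistics.pvariance(la_population),
    "sample_mean_1": statistics.mean(sample_1),
    "sample_mean_2": statistics.mean(sample_2),
    "sample_variance_1": statistics.variance(sample_1),
    "sample_variance_2": statistics.variance(sample_2),
}
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Fixing the seed makes the draw reproducible, which helps when the same samples must be reused in later parts.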

Part 2

Copy the samples obtained in Part 1 into the designated cells in the Q2 worksheet and
complete the following analysis of the data.

1. Complete Table A in the Q2 worksheet.
2. Use the two samples to create a 90%, a 95%, and a 98% confidence interval for the
population mean in the designated cells in Table B of the Q2 worksheet. Assume that
the two population variances are known (Use the values calculated for the two
population variances in Table A of worksheet Q1).

In your report, explain your solution procedures and your findings.
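
A z-based interval with known population variances can be sketched as follows. This sketch assumes the interval targets the difference of the two population means; if Table B instead asks for a separate interval for each mean, the same idea applies with a single variance term. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval_diff(sample_1, sample_2, pop_var_1, pop_var_2, confidence):
    """Confidence interval for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = mean(sample_1) - mean(sample_2)
    margin = z * sqrt(pop_var_1 / len(sample_1) + pop_var_2 / len(sample_2))
    return diff - margin, diff + margin

# Hypothetical samples and variances, for illustration only.
ny_sample = [0.8, 1.1, 1.4, 0.9, 2.3, 1.7, 0.6, 1.2]
la_sample = [1.0, 1.3, 0.7, 1.9, 1.5, 1.1]
ny_pop_var, la_pop_var = 0.35, 0.28

for level in (0.90, 0.95, 0.98):
    low, high = z_interval_diff(ny_sample, la_sample, ny_pop_var, la_pop_var, level)
    print(f"{level:.0%} interval: ({low:.4f}, {high:.4f})")
```

Higher confidence levels produce wider intervals because the critical z value grows while the standard error stays fixed.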

Part 3

1. Use the random sampling method explained in the Instructor Perspective of Module 3
to randomly draw a sample of size 22 from the NY LOC QUOTIENTs and a random
sample of size 20 from the LA LOC QUOTIENTs. Then complete Table A in the Q3
worksheet.
2. Use the two samples to create 90%, 95%, and 98% confidence intervals for the difference of
the two population means in the designated cells in Table B of the Q3 worksheet.
Assume that the two population variances are unequal.

In your report, explain your solution procedures and your findings. Explain whether the
confidence intervals created in step 2 indicate that there exists a difference between the
two averages of the LOC QUOTIENT values in the two metropolitan areas. Interpret the
significance of such a difference in the context of the given data.
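
When the variances are treated as unknown and unequal, one common approach is the Welch interval with Satterthwaite degrees of freedom. The sketch below assumes that approach is the intended one. Because the Python standard library has no t distribution, the critical value is approximated numerically (Simpson's rule plus bisection); the helper names and the data are hypothetical.

```python
from math import sqrt, lgamma, exp, log, pi
from statistics import mean, variance

def t_cdf(t, df, steps=2000):
    """Student t CDF, approximated by Simpson's rule on the density."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    pdf = lambda x: exp(log_c - (df + 1) / 2 * log(1 + x * x / df))
    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Invert t_cdf by bisection; intended for 0.5 < p < 1."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def welch_interval(sample_1, sample_2, confidence):
    """Interval for mu1 - mu2 without assuming equal variances."""
    n1, n2 = len(sample_1), len(sample_2)
    a, b = variance(sample_1) / n1, variance(sample_2) / n2
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    margin = t_quantile(1 - (1 - confidence) / 2, df) * sqrt(a + b)
    diff = mean(sample_1) - mean(sample_2)
    return diff - margin, diff + margin, df

# Hypothetical samples, for illustration only.
ny_sample = [0.8, 1.1, 1.4, 0.9, 2.3, 1.7, 0.6, 1.2, 1.0, 1.5]
la_sample = [1.0, 1.3, 0.7, 1.9, 1.5, 1.1, 2.4, 0.9]

for level in (0.90, 0.95, 0.98):
    low, high, df = welch_interval(ny_sample, la_sample, level)
    print(f"{level:.0%}: ({low:.4f}, {high:.4f}), df = {df:.2f}")
```

An interval that contains 0 does not indicate a difference between the two means at that confidence level; an interval entirely above or below 0 does.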

Part 4

1. Use the random sampling method explained in the Instructor Perspective of Module 3
to randomly draw a sample of size 150 from the NY LOC QUOTIENTs and a random
sample of size 130 from the LA LOC QUOTIENTs.
2. Calculate the proportions of the LOC QUOTIENTs in each of your samples that are
greater than 2, then complete Table A and Table B in the Q4 worksheet.
3. Use the two samples above to construct 90%, 95%, and 99% confidence intervals for
the difference of the two population proportions of LOC QUOTIENTs that are greater
than 2 in Table C in the Q4 worksheet.
4. Complete Table D in the Q4 worksheet.
5. Complete Table E in the Q4 worksheet.

In your Word report, explain your solution procedures and your findings, and state
whether the confidence intervals created in step 3 indicate that there exists a
difference between the two population proportions.
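
The proportion calculation and the large-sample interval for a difference of proportions can be sketched as follows. This assumes the usual normal-approximation (unpooled) interval; the function names and the sample proportions used in the loop are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_above(values, threshold=2.0):
    """Share of values strictly greater than the threshold."""
    return sum(1 for v in values if v > threshold) / len(values)

def diff_proportion_interval(p1, n1, p2, n2, confidence):
    """Normal-approximation interval for p1 - p2 (unpooled standard error)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Hypothetical sample proportions for samples of size 150 (NY) and 130 (LA).
p_ny, n_ny = 18 / 150, 150
p_la, n_la = 13 / 130, 130

for level in (0.90, 0.95, 0.99):
    low, high = diff_proportion_interval(p_ny, n_ny, p_la, n_la, level)
    print(f"{level:.0%} interval for the difference: ({low:.4f}, {high:.4f})")
```

With real data, `proportion_above(sample)` would supply each sample proportion directly from the LOC QUOTIENT values.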

Part 5

1. Use the two samples created in Q3 to complete Table A in the Q5 worksheet.
2. In the designated cells in Q5, use alpha = 0.05 to test the hypothesis that the standard
deviations of the two populations are different.
3. Use alpha = 0.05 to test the hypothesis that the standard deviations of the two
populations are different by using an appropriate test in the Data Analysis ToolPak of
Excel. Place the output of the Data Analysis ToolPak under Table B in the Q5
worksheet.

In your report, explain your solution procedures and your findings. In particular, interpret
the P-value of the test in the context of this hypothesis testing problem.
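
A comparison of two standard deviations is commonly carried out with an F test on the ratio of the sample variances. The sketch below assumes that test and a two-sided alternative. The F distribution is approximated numerically because the Python standard library does not provide it; the helper names and data are hypothetical, and the numerical CDF assumes more than 2 numerator degrees of freedom.

```python
from math import lgamma, exp, log
from statistics import variance

def f_cdf(x, d1, d2, steps=2000):
    """F distribution CDF by Simpson's rule on the density; assumes d1 > 2."""
    log_c = (lgamma((d1 + d2) / 2) - lgamma(d1 / 2) - lgamma(d2 / 2)
             + (d1 / 2) * log(d1 / d2))

    def pdf(u):
        if u <= 0:
            return 0.0  # density is 0 at the origin when d1 > 2
        return exp(log_c + (d1 / 2 - 1) * log(u)
                   - (d1 + d2) / 2 * log(1 + d1 * u / d2))

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return total * h / 3

def f_test_two_sided(sample_1, sample_2):
    """H0: sigma1 = sigma2 vs H1: sigma1 != sigma2."""
    f = variance(sample_1) / variance(sample_2)
    d1, d2 = len(sample_1) - 1, len(sample_2) - 1
    lower_tail = f_cdf(f, d1, d2)
    p_value = 2 * min(lower_tail, 1 - lower_tail)
    return f, d1, d2, p_value

# Hypothetical samples, for illustration only.
ny_sample = [0.8, 1.1, 1.4, 0.9, 2.3, 1.7, 0.6, 1.2, 1.0, 1.5]
la_sample = [1.0, 1.3, 0.7, 1.9, 1.5, 1.1, 2.4, 0.9]

f, d1, d2, p_value = f_test_two_sided(ny_sample, la_sample)
print(f"F = {f:.4f}, df = ({d1}, {d2}), two-sided P-value = {p_value:.4f}")
print("Reject H0" if p_value < 0.05 else "Fail to reject H0", "at alpha = 0.05")
```

When comparing against Excel, note that the ToolPak's F-Test Two-Sample for Variances output reports a one-tail P-value, so it may need to be doubled for a two-sided hypothesis.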

Part 6

1. Use your two samples of Q1 to test the hypothesis that the mean of the LOC QUOTIENTs
of NY is less than that of LA. Use alpha = 0.10 and assume that the two population
variances are known (Use the values calculated for the two population variances in
Table A of worksheet Q1). Place the results in the designated cells in Q6.
2. Repeat the hypothesis testing procedure of step 1 above by performing an appropriate
test of the Data Analysis ToolPak of Excel. In worksheet Q6, select the area under Table
C to display the output of the Data Analysis ToolPak.

In your report, explain your hypothesis testing methods, procedures, and findings.
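
With known population variances, a left-tailed two-sample z test can be sketched as follows. The hypotheses are assumed to be H0: mu_NY >= mu_LA against H1: mu_NY < mu_LA; the function name, samples, and variances are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_less(sample_1, sample_2, pop_var_1, pop_var_2, alpha=0.10):
    """Left-tailed z test of H1: mu1 < mu2 with known population variances."""
    se = sqrt(pop_var_1 / len(sample_1) + pop_var_2 / len(sample_2))
    z = (mean(sample_1) - mean(sample_2)) / se
    p_value = NormalDist().cdf(z)          # area to the left of z
    z_critical = NormalDist().inv_cdf(alpha)
    return {
        "z": z,
        "p_value": p_value,
        "z_critical": z_critical,
        "reject_h0": p_value < alpha,
    }

# Hypothetical samples and variances, for illustration only.
ny_sample = [0.8, 1.1, 1.4, 0.9, 2.3, 1.7, 0.6, 1.2]
la_sample = [1.0, 1.3, 0.7, 1.9, 1.5, 1.1]

result = z_test_less(ny_sample, la_sample, pop_var_1=0.35, pop_var_2=0.28)
for name, value in result.items():
    print(f"{name}: {value}")
```

The P-value rule (reject when P-value < alpha) and the critical-value rule (reject when z < z_critical) lead to the same decision.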

Part 7

1. Use your two samples of Q3 to test the hypothesis that the mean of the LOC QUOTIENTs
of NY is less than that of LA. Use alpha = 0.10 and assume that the two population
variances are unequal. Place the results in the designated cells in Q7.
2. Repeat the hypothesis testing procedure of step 1 above by performing an appropriate
test of the Data Analysis ToolPak of Excel. In worksheet Q7, select the area under Table
B to display the output of the Data Analysis ToolPak.
3. Use your two samples of Q3 to test the hypothesis that the mean of the LOC QUOTIENTs
of NY is less than that of LA. Use alpha = 0.10 and assume that the two population
variances are equal.
4. Repeat the hypothesis testing procedure of step 3 above by performing an appropriate
test of the Data Analysis ToolPak of Excel. In worksheet Q7, select the area under Table
D to display the output of the Data Analysis ToolPak.

In your report, explain your hypothesis testing methods, procedures, and findings.
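
The two t tests in this part differ only in how the standard error and degrees of freedom are computed. The sketch below shows both test statistics side by side, assuming the Welch form for unequal variances and the pooled form for equal variances. Function names and data are hypothetical; the P-value or left-tail critical value for each statistic would then come from a t table or from Excel.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_1, sample_2):
    """t statistic and Satterthwaite df, variances not assumed equal."""
    n1, n2 = len(sample_1), len(sample_2)
    a, b = variance(sample_1) / n1, variance(sample_2) / n2
    t = (mean(sample_1) - mean(sample_2)) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df

def pooled_t(sample_1, sample_2):
    """t statistic and df using a pooled variance (equal-variance assumption)."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1)
                  + (n2 - 1) * variance(sample_2)) / df
    t = (mean(sample_1) - mean(sample_2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Hypothetical samples, for illustration only.
ny_sample = [0.8, 1.1, 1.4, 0.9, 2.3, 1.7, 0.6, 1.2, 1.0, 1.5]
la_sample = [1.0, 1.3, 0.7, 1.9, 1.5, 1.1, 2.4, 0.9]

t_w, df_w = welch_t(ny_sample, la_sample)
t_p, df_p = pooled_t(ny_sample, la_sample)
print(f"Unequal variances: t = {t_w:.4f}, df = {df_w:.2f}")
print(f"Equal variances:   t = {t_p:.4f}, df = {df_p}")
```

For a left-tailed test, H0 is rejected when the statistic falls below the negative critical value for the chosen alpha and degrees of freedom.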

Part 8

1. Consider the information given in columns D, E, and F of worksheet Q8 to be that of a
population. Use the random sampling method explained in Module 3 to randomly select
20 dependent pairs of LOC QUOTIENT values from NY and LA corresponding to the
same OCC_TITLE.
2. Complete column M and Table A in the worksheet.
3. Use your two samples of step 1 to test the hypothesis that the mean of NY LOC
QUOTIENTs is different from that of LA. Use alpha = 0.05.
4. Repeat the hypothesis testing procedure of step 3 above by performing an appropriate
test of the Data Analysis ToolPak of Excel. In worksheet Q8, select the area under Table
B to display the output of the Data Analysis ToolPak.

In your report, explain your hypothesis testing methods, procedures, and findings.
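
Because each NY value is matched to the LA value for the same occupation, a paired (dependent-samples) t test on the differences is a natural choice. The sketch below assumes that test, assumes that column M holds the pairwise differences, and samples whole rows so that pairs stay matched. All names and values are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, stdev

def draw_pairs(ny_column, la_column, size, seed=None):
    """Sample row positions once so each NY value keeps its LA partner."""
    rng = random.Random(seed)
    rows = rng.sample(range(len(ny_column)), size)
    return [ny_column[i] for i in rows], [la_column[i] for i in rows]

def paired_t(ny_values, la_values):
    """t statistic and df for the mean of the pairwise differences."""
    diffs = [a - b for a, b in zip(ny_values, la_values)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1, diffs

# Hypothetical population columns, one row per occupation title.
ny_column = [0.50 + 0.03 * i for i in range(60)]
la_column = [0.40 + 0.035 * i for i in range(60)]

ny_pairs, la_pairs = draw_pairs(ny_column, la_column, 20, seed=3)
t_stat, df, differences = paired_t(ny_pairs, la_pairs)
print(f"t = {t_stat:.4f}, df = {df}")
# For a two-sided test at alpha = 0.05 with df = 19, the tabled critical
# value is about 2.093; reject H0 when |t| exceeds it.
```

Sampling the two columns independently would break the pairing and invalidate the dependent-samples test.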

Part 9

Consider:
$p_1$ = the proportion of NY LOC QUOTIENTs that are greater than 2, and
$p_2$ = the proportion of LA LOC QUOTIENTs that are greater than 2.

Perform a hypothesis testing procedure with alpha = 0.05 to test the claim that $p_1 > p_2$.
(Assume no prior knowledge of $p_1$ and $p_2$.)

Complete all designated cells in worksheet Q9.

In your report, describe your results and explain whether the results obtained from your
hypothesis testing are consistent with those obtained about confidence intervals completed
in Q4.
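
A right-tailed two-proportion z test can be sketched as follows. It takes the first proportion to be NY's and the second to be LA's, reads the claim as the NY proportion exceeding the LA proportion, and uses a pooled estimate under H0, which is one common reading of "no prior knowledge". The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05):
    """Right-tailed test of H1: p1 > p2 using the pooled proportion under H0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)      # area to the right of z
    return {
        "z": z,
        "p_value": p_value,
        "z_critical": NormalDist().inv_cdf(1 - alpha),
        "reject_h0": p_value < alpha,
    }

# Hypothetical counts of LOC QUOTIENTs above 2 in samples of 150 (NY) and 130 (LA).
result = two_proportion_z_test(x1=18, n1=150, x2=13, n2=130)
for name, value in result.items():
    print(f"{name}: {value}")
```

When comparing with the Q4 intervals, keep in mind that the test here is one-sided and pooled while the intervals are two-sided and unpooled, so the two can disagree slightly near the boundary.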

Format & Guidelines

The submission for this project consists of an Excel workbook (or an R script file) and a
Word report: a minimum of two files, submitted as attachments.

Complete your analytic work in an Excel workbook (or in an R script file).

The report should include all your findings along with important statistical issues. The
report should use the following format:

(i) Introduction
(ii) Analysis
(iii) Conclusion

The report should be 800-900 words and be presented in APA format.


Rubric

Each category is assessed at one of five levels: Above Standard, Meets Standards,
Approaching Standards, Below Standards, or Not Evident.

Excel (or R): Problem Modeling & Set-up (ALY6010_CO4, ALY6010_CO6)
• Above Standard: Thoroughly and concisely modeled the problem in Excel (or R) for each method.
• Meets Standards: Accurately modeled the problem in Excel (or R) for each method.
• Approaching Standards: Satisfactorily modeled the problem in Excel (or R) for each method.
• Below Standards: Partially modeled the problem in Excel (or R) for each method, but there
are some gaps in the problem modeling and setup.
• Not Evident: Did not submit or incompletely modeled the problem in Excel (or R).

Excel (or R): Problem Solution & Accuracy (ALY6010_CO4, ALY6010_CO6)
• Above Standard: Thoroughly and efficiently obtained correct and accurate solutions in
Excel (or R) by using the appropriate analytic tools of the software.
• Meets Standards: Thoroughly obtained accurate solutions in Excel (or R) by using the
appropriate analytic tools of the software.
• Approaching Standards: Satisfactorily obtained correct solutions in Excel (or R) by using
the appropriate analytic tools of the software.
• Below Standards: Partially obtained accurate solutions in Excel (or R) by using the
appropriate analytic tools of the software.
• Not Evident: Did not submit or did not obtain accurate solutions in Excel (or R) using the
appropriate analytic tools of the software.

Word/Report: Problem Description & Introduction (ALY6010_CO1)
• Above Standard: Thoroughly provided a summary of the problem descriptions and introduced
the problem using rich and significant ideas.
• Meets Standards: Thoroughly provided a summary of the problem descriptions and problem
introduction.
• Approaching Standards: Satisfactorily provided a summary of the problem descriptions and
problem introduction.
• Below Standards: Partially provided a summary of the problem descriptions and problem
introduction.
• Not Evident: Did not submit or did not provide a summary of the problem descriptions and
problem introduction.

Word/Report: Description of Problem Analysis (ALY6010_CO7)
• Above Standard: Thoroughly and accurately described the analytic concepts and theories
used in analyzing the problem.
• Meets Standards: Accurately described the analytic concepts and theories used in analyzing
the problem.
• Approaching Standards: Satisfactorily described the analytic concepts and theories used in
analyzing the problem.
• Below Standards: Partially described the analytic concepts and theories used in analyzing
the problem.
• Not Evident: Did not submit or did not provide a summary of the problem descriptions and
problem introduction.

Word/Report: Description of Conclusions (ALY6010_CO7)
• Above Standard: Thoroughly described the conclusions and results obtained in the project
using a high level of critical thinking and reasoning.
• Meets Standards: Thoroughly described the conclusions and results obtained in the project.
• Approaching Standards: Satisfactorily described the conclusions and results obtained in
the project.
• Below Standards: Partially described the conclusions and results obtained in the project.
• Not Evident: Did not submit or did not describe the conclusions and results obtained in
the project.

Word/Report: Writing Mechanics, Title Page, & References
• Above Standard: Completely free of errors in grammar, spelling, and punctuation; and
completely correct usage of title page, citations, and references. The report contains a
minimum of 1000 words.
• Meets Standards: There are no noticeable errors in grammar, spelling, and punctuation; and
completely correct usage of title page, citations, and references. The report contains a
minimum of 1000 words.
• Approaching Standards: There are very few errors in grammar, spelling, and punctuation; and
completely correct usage of title page, citations, and references. The report contains a
minimum of 1000 words.
• Below Standards: There are more than five errors in grammar, spelling, and punctuation; or
the usage of title page, citations, and references is incomplete; or the report contains
fewer than 1000 words.
• Not Evident: Did not submit; or there are many errors in grammar, spelling, and
punctuation; or the usage of title page, citations, and references is totally incomplete;
or the report contains very few words.
