Assignment1 6


Conestoga College - HIM Program

Introduction to Data Analysis HIM72010


Assignment #1: Exploring Excel, Data Characterization, Table & Chart
Descriptions
Due: October 5th, 2015
Total Marks: 62

Introduction:
Welcome to your new job as a Data Analyst at our Clinic for Health (CFH).
Your first assignment is to use Excel to generate descriptive statistics, tables,
and charts, and to delve deeper into sampling and sampling distributions. In some cases
you may have to explore some aspects of Excel on your own. There are many
tutorials on the internet which can help. You will work individually on this
assignment. However, if you get stuck, ask a friend for guidance, not the solution.
Please submit the Excel document with your work along with this Word
document (filled in). Where you can, please copy the tables, graphs, and charts
into the Word document.
Before you begin, be sure to add the Analysis ToolPak to Excel. Choose
File > Options > Add-ins > Analysis ToolPak > OK. I suggest using Excel 2013 for this
course; it is available free on the Hub.

Part 1: Frequency Distributions


Before you do any analytical work, it is always a good idea to visualize the data you
are working with. Frequency distribution tables, histograms, and bar-charts are good
for getting a better understanding of the variables in your data. Much of this
assignment involves doing exactly that.
Visits 2012 and 2013
In the provided spreadsheet Assignment 1Template.xlsx, there is a sheet called
Visits 2012 and 2013. In this sheet we have all the patients that are registered with
the CFH clinic and the number of visits they made to the clinic each year for the
years 2012 and 2013. Your first task is to describe this data set.
Q1. (2 marks) Describe the nature of the variables Visits in 2012 and Visits in
2013. Are these variables discrete or continuous and what is their scale of
measurement (NOIR)?
Q2. (5 marks) Create two frequency distribution tables (one for 2012 and one for
2013) with the frequency, relative frequency, cumulative frequency, and cumulative
relative frequency. Use the following bins (class intervals) 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 10+.

Q3. (2 marks) What proportion of our patients had 5 or fewer visits in 2012?
What about in 2013? What proportion had between 2 and 3 visits in year 2012?
What about 2013?
Q4. (2 marks) What two columns from the frequency distribution table (you created
above) would be best to compare 2012 to 2013? Why?
Q5. (5 marks) Create an appropriate histogram or bar chart for the 2013 data. I have
included an example.

[Example chart: "Patient visits in 2013". Horizontal axis: # of visits. Vertical axis: # of patients at CFH.]

Q6. (2 marks) Does the number of visits follow a normal distribution? If so, why? If
not, why not?
Why do we care? Well, let's suppose you are interested in finding factors that
determine the number of visits to the clinic. Our outcome variable would be the number
of visits. The type of analysis we choose will depend on the frequency distribution
of our outcome.
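
As an illustration of the frequency distribution table described in Q2, here is a minimal Python sketch of the four columns (frequency, relative frequency, cumulative frequency, cumulative relative frequency) using the bins 0 to 10 and 10+. The assignment itself is to be done in Excel; the function name frequency_table and the visit counts below are hypothetical and are not taken from the clinic spreadsheet.

```python
from collections import Counter

def frequency_table(visits, max_bin=10):
    """Return rows of (bin, frequency, relative, cumulative, cumulative relative)."""
    overflow = f"{max_bin}+"                       # bin for values above max_bin
    labels = [str(k) for k in range(max_bin + 1)] + [overflow]
    counts = Counter(str(v) if v <= max_bin else overflow for v in visits)
    n = len(visits)
    rows, running = [], 0
    for label in labels:
        freq = counts.get(label, 0)
        running += freq                            # cumulative frequency so far
        rows.append((label, freq, freq / n, running, running / n))
    return rows

# Hypothetical visit counts for ten patients (NOT the clinic data).
sample_visits = [0, 1, 1, 2, 2, 2, 3, 5, 7, 12]
for row in frequency_table(sample_visits):
    print("{:>4} {:>4} {:>6.2f} {:>4} {:>6.2f}".format(*row))
```

The cumulative relative frequency in the last row should always equal 1, which is a quick check that no observation was dropped from the bins.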
BMI
One of the doctors in the clinic is running a weight-loss program for couples. So far
he has recruited thirty couples. He is interested in evaluating the impact the
program has on both women and men. Their body mass index has been collected at
the onset of the study and is included in the BMI sheet.

Q1. (2 marks) Describe the nature of the variables female BMI and male BMI. Are
these variables discrete or continuous and what is their scale of measurement
(NOIR)?
Q2. (5 marks) Create two frequency distribution tables (one for males and one for
females) with the frequency, relative frequency, cumulative frequency, and
cumulative relative frequency. Use appropriate bins (class intervals).
Q5. (5 marks) Create an appropriate histogram or bar chart for the females and one for the
males.
Q6. (1 mark) Do the female BMI and male BMI scores follow a normal distribution?
Q7. (2 marks) Couple #2 have a rivalry going. The BMI of the male is slightly higher
than that of his companion. However, she argues that relative to other females she is in
better shape than he is relative to other males. Can you sort this out? Using a
z-score, which of the partners has a higher BMI score when compared to their
respective populations (male and female)? Show your work.
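
The z-score comparison in Q7 standardizes each partner's BMI against their own group: z = (value - group mean) / group standard deviation. A minimal Python sketch follows, assuming each group is treated as a population (so the population standard deviation is used; swap in statistics.stdev if you treat the thirty couples as a sample). The function name z_score and all BMI values are hypothetical, not the values from the BMI sheet.

```python
from statistics import mean, pstdev

def z_score(value, group):
    """Standardize value against the mean and population SD of group."""
    return (value - mean(group)) / pstdev(group)

# Hypothetical BMI values (NOT the data in the BMI sheet).
female_bmi = [20.0, 22.0, 24.0, 26.0, 28.0]
male_bmi = [22.0, 25.0, 28.0, 31.0, 34.0]

z_female = z_score(26.0, female_bmi)   # hypothetical female partner
z_male = z_score(28.0, male_bmi)       # hypothetical male partner
print(f"female z = {z_female:.2f}, male z = {z_male:.2f}")
```

In this made-up example the male has the higher raw BMI (28 versus 26), but the female has the higher z-score, so a raw comparison and a standardized comparison can disagree.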

Part 2. Sampling
In the patient demographics worksheet, all 27,752 patients from the clinic are listed
along with their age.
Go to Data > Data Analysis > Sampling. Choose the Age column for the input range and
Random sampling, with 20 observations output to Column F. Do the same thing for a
sample of 300 observations in Column G.
(10 marks) Fill in the table to describe the data.
Parameter*/Statistic    | All patients of the clinic* | Random sample of 20 patients | Random sample of 300 patients
Mean                    |                             |                              |
Mode                    |                             |                              |
Min                     |                             |                              |
1st quartile            |                             |                              |
Median (2nd quartile)   |                             |                              |
3rd quartile            |                             |                              |
Max                     |                             |                              |
Range                   |                             |                              |
IQR                     |                             |                              |
Standard deviation      |                             |                              |

(2 marks) Which random sample do you think is a better representation of the clinic
population and why?
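
For readers who want to cross-check the Excel results, the sampling step and the descriptive statistics in the table can be sketched in Python as follows. This is a sketch under stated assumptions: the population of ages is randomly generated and hypothetical (it is not the patient demographics worksheet), random.sample draws without replacement (which may differ from how Excel's Sampling tool draws), and the "inclusive" quartile method is used on the assumption that it matches the interpolation of Excel's QUARTILE.INC. The function name describe is also hypothetical.

```python
import random
import statistics

def describe(ages, population=False):
    """Return the statistics listed in the table for a list of ages."""
    q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
    # Population SD for the full clinic, sample SD for a random sample.
    sd = statistics.pstdev(ages) if population else statistics.stdev(ages)
    return {
        "mean": statistics.mean(ages),
        "mode": statistics.mode(ages),
        "min": min(ages),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(ages),
        "range": max(ages) - min(ages),
        "iqr": q3 - q1,
        "sd": sd,
    }

# Hypothetical population of ages (NOT the clinic's patient list).
rng = random.Random(42)
all_ages = [rng.randint(0, 95) for _ in range(1000)]

sample_20 = rng.sample(all_ages, 20)     # drawn without replacement
sample_300 = rng.sample(all_ages, 300)

for name, data, is_pop in [("all", all_ages, True),
                           ("n=20", sample_20, False),
                           ("n=300", sample_300, False)]:
    stats = describe(data, population=is_pop)
    print(name, round(stats["mean"], 1), stats["median"], round(stats["sd"], 1))
```

Rerunning with different seeds gives a feel for how much the statistics of the 20-patient sample move around compared with those of the 300-patient sample.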

Part 3: Graphing
1. (3 marks)
Manually create a back-to-back stem-and-leaf display from the first 20
couples in the BMI sheet.
2. (3 marks)
There is a sheet called mental health concerns. In this worksheet, use Excel
to categorize the presenting mental health concerns into 7 or 8 categories.
(Don't do this manually!) You can group the concerns into whatever categories you think
make sense. Create a pie chart to show the relative frequency of patient
visits by mental health category.

Part 4: Contingency Table using Pivot Table in Excel


Consider the data (included in the Angel worksheet in the Excel document) showing a
number of characteristics associated with 30 infants:
The Apgar Scale is a measure of the well-being of newborn infants. Its scores
can vary between 0 and 10, with low scores being bad. Mother's Parity refers to
the number of previous live births.
(3 marks) Use Excel to produce a grouped frequency table for the
Birthweight (g) using class intervals: 2700-2999, 3000-3299, ..., 4200-4499.
3. (5 marks) Using Excel, produce a cross-tabulation (pivot table) of the
variables "Mother smoked during pregnancy? (Y/N)" and "Apgar score
<7? (Y/N)" for the 30 newborn infants.
4. (3 marks) Produce the same cross-tabulation as the previous question but
with values expressed as percentages of the column.
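
The cross-tabulation in the last two questions, first as counts and then as percentages of the column, can be sketched in Python to check the logic behind the pivot table. This is a minimal sketch, assuming smoking status is placed in the columns and the Apgar indicator in the rows (the assignment does not say which goes where). The function names and the eight Y/N records are hypothetical, not the 30 infants in the worksheet.

```python
from collections import Counter

def crosstab(row_values, col_values):
    """Count each (row value, column value) pair, like a pivot table of counts."""
    return Counter(zip(row_values, col_values))

def column_percentages(table):
    """Express each cell as a percentage of its column total."""
    col_totals = Counter()
    for (_, col), count in table.items():
        col_totals[col] += count
    return {(row, col): 100 * count / col_totals[col]
            for (row, col), count in table.items()}

# Hypothetical records (NOT the worksheet data).
smoked = ["Y", "Y", "N", "N", "N", "Y", "N", "N"]
low_apgar = ["Y", "N", "N", "N", "Y", "Y", "N", "N"]

table = crosstab(low_apgar, smoked)      # rows: Apgar <7?, columns: smoked?
pct = column_percentages(table)
for (row, col), count in sorted(table.items()):
    print(f"Apgar<7={row} smoked={col}: {count} ({pct[(row, col)]:.1f}% of column)")
```

With column percentages, the cells within each smoking group sum to 100%, which makes the two groups directly comparable even when they differ in size.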
