Stat 311 Homework 2

This assignment has some problems related to Lesson 2 and emphasizes exploratory data analysis (EDA):
visualization and numeric summaries for qualitative and quantitative data. We recommend that you create a
new folder for this assignment. Download the data files and HW2Template.Rmd to this folder before you
begin (be sure to rename your template file). Check out the two .Rmd files that appear on the Lesson 2
Presentations page; they contain code examples for several types of summaries that were presented in the
lectures. While we provide much of the code you need in the template, you can refer to the files
CategoricalData.Rmd or QuantitativeData.Rmd as needed. You may simply copy/paste/edit code you might
want to use. Upload your final PDF file to Gradescope [do not forget to identify the page numbers for each
part of each problem according to the outline].
Problems 1 and 2 do not require any code. Simply type your answers into the .Rmd file. Problems 3 and 4
require data files and the use of R code.
To reinforce the concepts in the Lesson 2 lectures and for extra practice with R commands, I recommend that
you try some of the OpenIntro tutorials that I linked on the Readings page for Lesson 2, but this is totally
optional.

Problems
1. The following table, from Categorical Data Analysis by Alan Agresti (1990) and reprinted from
Gallagher et al. (1987), shows the counts from a sample of psychiatrists that have been classified by their
school of psychiatric thought and their opinions on the origin of schizophrenia. Answer a) – d) with “by-
hand” calculations, showing your work.

School of Psychiatric            Origin of Schizophrenia
Thought                  Biogenic   Environmental   Combination
Eclectic                    90           12              78
Medical                     13            1               6
Psychoanalytic              19           13              15

a) What percentage of these psychiatrists identify with the medical school of psychiatric thought?
b) What percentage of these psychiatrists believe that the origin of schizophrenia is environmental?
c) What percentage of these psychiatrists identify with the eclectic school of psychiatric thought and
believe that the origin of schizophrenia is biogenic?
d) Of the psychiatrists who identify with the psychoanalytic school of psychiatric thought, what
percentage believe that the origin of schizophrenia is a combination of biogenic and environmental
factors?
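
To check by-hand work afterward, the counts in the table can be typed into R and used as a calculator. The following is a minimal sketch, not the course template's code; the object names (counts, school_pct, and so on) are illustrative choices.

```r
# Counts from the table in Problem 1 (rows: school, columns: origin).
counts <- matrix(
  c(90, 12, 78,
    13,  1,  6,
    19, 13, 15),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    School = c("Eclectic", "Medical", "Psychoanalytic"),
    Origin = c("Biogenic", "Environmental", "Combination")
  )
)

n <- sum(counts)  # total number of psychiatrists in the sample

# Marginal percentages: one variable at a time, out of the grand total.
school_pct <- 100 * rowSums(counts) / n
origin_pct <- 100 * colSums(counts) / n

# Joint percentage: a single cell out of the grand total.
joint_pct <- 100 * counts["Eclectic", "Biogenic"] / n

# Conditional percentage: a single cell out of its own row total.
cond_pct <- 100 * counts["Psychoanalytic", "Combination"] /
  sum(counts["Psychoanalytic", ])

round(c(school_pct["Medical"], origin_pct["Environmental"]), 1)
round(c(joint = joint_pct, conditional = cond_pct), 1)
```

The only difference between the three kinds of percentage is the denominator: the grand total for marginal and joint percentages, and a row total for a row conditional percentage.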
2. Data from a new social media site that lets users add friends indicate that 50% of users have 150 or more
friends and that the average friend count of users is 75. What do these data suggest about the shape of the
distribution of the number of friends of users of this new social media site? Explain.
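
Comparing a mean with a median is a quick check on the shape of a distribution. As a hedged illustration with made-up values (these are not friend counts from the site), the sketch below shows how a long tail pulls the mean toward it while the median barely moves.

```r
# Hypothetical values with one large observation (long right tail).
right_tail <- c(1, 2, 2, 3, 3, 3, 4, 20)

# Mirror image of the same values (long left tail).
left_tail <- 21 - right_tail

# The mean sits on the tail side of the median in each case.
c(mean = mean(right_tail), median = median(right_tail))
c(mean = mean(left_tail), median = median(left_tail))
```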

3. Complete a) – d) using the psychiatric data from Problem 1. The data are in Psychiatrists.csv and variable
definitions are in PsychiatristsDataDictionary.pdf. In the HW2 template we have provided code to read in
and set up the data.
a) How many observations are in the data set?
b) In the HW2 template, we provide code to produce a two-way contingency table with school of
psychiatric thought in the rows and the origin of schizophrenia in the columns. Use the information in
the contingency table, along with R as a calculator, to calculate the marginal percentages for the
school of psychiatric thought. Explain to a layperson what is meant by these marginal percentages.
[Hint: you will be reporting three percentages].
c) In the HW2 template, we have provided code that produces a new table that shows row conditional
percentages instead of counts. What are the conditional percentages for the origin of schizophrenia
for psychiatrists who identify with the eclectic school of psychiatric thought? What is meant, in
layperson terms, when we refer to these conditional percentages? [Okay to use R as a calculator]
d) In the HW2 template, we have provided code that produces one bar graph to explore the association
between school of psychiatric thought and the origin of schizophrenia. Using this bar graph, do the
variables school of psychiatric thought and the origin of schizophrenia appear to be associated
(dependent), or do they appear to be independent? Explain. [Note: this is a qualitative answer
based only on data visualization]
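
The template's code is not reproduced in this handout, so the following is only a sketch of how a contingency table, its percentages, and a bar graph could be produced in base R. The data frame psych and its column names School and Origin are hypothetical stand-ins rebuilt from the counts in Problem 1; the actual names in Psychiatrists.csv may differ, so check the data dictionary.

```r
# Hypothetical stand-in for the data set: one row per psychiatrist.
# Column names are assumptions, not the names used in Psychiatrists.csv.
schools <- c("Eclectic", "Medical", "Psychoanalytic")
origins <- c("Biogenic", "Environmental", "Combination")
cells <- expand.grid(School = schools, Origin = origins,
                     stringsAsFactors = FALSE)
cells$n <- c(90, 13, 19, 12, 1, 13, 78, 6, 15)  # counts from Problem 1
psych <- cells[rep(seq_len(nrow(cells)), cells$n), c("School", "Origin")]
psych$Origin <- factor(psych$Origin, levels = origins)

nrow(psych)  # number of observations

# Two-way table with school in the rows and origin in the columns.
tab <- table(psych$School, psych$Origin)
addmargins(tab)

# Marginal percentages for school: row totals over the grand total.
round(100 * rowSums(tab) / sum(tab), 1)

# Row conditional percentages: each row sums to 100.
row_pct <- 100 * prop.table(tab, margin = 1)
round(row_pct, 1)

# Side-by-side bars of the conditional percentages, one cluster per school.
barplot(t(row_pct), beside = TRUE, legend.text = TRUE,
        xlab = "School of psychiatric thought",
        ylab = "Percent within school")
```

Plotting row percentages rather than raw counts puts the three schools on the same scale, which makes it easier to compare the bar patterns across schools even though the group sizes differ.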
4. Complete parts a) – g) using a data set about test scores for students from various backgrounds. Data are
in StudentPerformance.csv and variable definitions are in StudentPerformanceDataDictionary.pdf. In the
HW2 template we have provided code to read in and set up the data.
a) We do not have any details about how these data were collected, but given the variables, do you think
these data came from an observational study or an experiment? Briefly explain.
b) Are there missing values for any of the variables? If so, specify which variables and how many
values are missing. [Hint: use summary(object), where object is the name of the object that
contains the full data set; R identifies missing values using NA]
c) Explore the writing score variable through summary statistics. Report the 7-number summary
statistics. Based on these statistics, which metrics do you think are best to use to summarize the
location (center) and variability (spread) of writing scores? Briefly explain.
d) In the HW2 template, we provide code that creates a histogram and boxplot of the writing scores.
Complete the code by adding proper axis labels. Synthesize your overall findings to describe the
sample distribution of writing scores by combining the information you got from the summary
statistics in part (c) and the graphs. Be sure to comment on mode(s), skew, outliers, etc.
e) Using R as a calculator, calculate the z-scores for the minimum and maximum writing scores.
Interpret these two z-scores in context.
f) Calculate the IQR for the writing scores. Interpret this value in context.
g) In the HW2 template, we have provided code to plot comparative boxplots of writing scores by
free/reduced lunch status (free/reduced or standard). Do writing scores appear to vary by whether a
student qualifies for free/reduced lunches? Explain. [Code includes a line to remove the observations
with missing writing scores]
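
The commands that parts b) through g) call for can be sketched on a small made-up data frame. Everything specific here is an assumption for illustration: the object name students, the column names writing and lunch, and the values themselves are not taken from StudentPerformance.csv or the template.

```r
# Made-up stand-in for the data; column names and values are assumptions.
students <- data.frame(
  writing = c(74, 88, 93, 44, 75, NA, 92, 39, 67, 50, 81, 58),
  lunch = c("standard", "standard", "standard", "free/reduced",
            "standard", "free/reduced", "standard", "free/reduced",
            "free/reduced", "free/reduced", "standard", "standard")
)

# Missing values: summary() shows NA counts for numeric columns, and
# colSums(is.na()) tallies them for every column.
summary(students)
colSums(is.na(students))

w <- students$writing

# Numeric summaries, skipping missing values.
summary(w)
sd(w, na.rm = TRUE)

# z-score: how many standard deviations a value lies from the mean.
w_mean <- mean(w, na.rm = TRUE)
w_sd <- sd(w, na.rm = TRUE)
z_min <- (min(w, na.rm = TRUE) - w_mean) / w_sd
z_max <- (max(w, na.rm = TRUE) - w_mean) / w_sd
c(z_min = z_min, z_max = z_max)

# IQR: third quartile minus first quartile.
IQR(w, na.rm = TRUE)

# Comparative boxplots after dropping rows with a missing writing score.
complete <- students[!is.na(students$writing), ]
boxplot(writing ~ lunch, data = complete,
        xlab = "Lunch status", ylab = "Writing score")
```

Functions such as mean(), sd(), and IQR() return NA when the input contains missing values unless na.rm = TRUE is supplied, which is why the argument appears throughout.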
