Group Mid-Term Exam
Hà Nội, 9/2023
Question 1: Consider the variable income in the gss.sav file (the variable is total family
income in the year before the survey).
1. Make a frequency table for the variable. Does the frequency table make sense? Does it
make sense to make a histogram of the variable? A bar chart?
2. What is the scale of measurement for the variable?
3. What descriptive statistics are appropriate for describing this variable and why? Does it
make sense to compute a mean?
4. Discuss the advantages and disadvantages of recording income in this manner.
Describe other ways of recording income and the problems associated with each of
them.
Question 2: In the gss.sav file, the variable tvhours tells you how many hours per day
GSS respondents say they watch TV.
1. Make a frequency table of the hours of television watched. Do any of the values strike
you as strange? Explain.
Upon reviewing the frequency table, a particular value that stands out as unusual is 12
hours of television viewing. The frequency table shows a notable pattern: a relatively
high number of respondents watch TV for up to 10 hours daily, but this number sharply
decreases beyond 10 hours. However, the frequency of individuals watching precisely
12 hours per day is unexpectedly high within the range of 11 to 24 hours. This deviates
from the established pattern and warrants further investigation.
2. Based on the frequency table, answer the following questions: Of the people who
answered the question, what percentage don't watch any television? What percentage
watch two hours or less? Five hours or more? Of the people who watch TV, what
percentage watch one hour? What percentage watch four hours or less?
Based on the frequency table, of the people who answered this question:
• The percentage of people who do not watch any television is 6%
• The percentage of people who watch 2 hours or less is 53.1%
• The percentage of people who watch 5 hours or more is:
100% - (6% + 20.9% + 26.3% + 17.5% + 12.7%) = 16.6%
Based on the frequency table, of the people who watch TV (respondents reporting 0
hours are excluded, which leaves 100% - 6% = 94% as the base):
• The percentage of people who watch one hour is 20.9% / 94% × 100% = 22.2%
• The percentage of people who watch 4 hours or less is
(20.9% + 26.3% + 17.5% + 12.7%) / 94% × 100% = 82.3%
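The arithmetic above can be checked with a short sketch. It uses only the valid percentages for 0 to 4 hours quoted in this answer. The variable names are illustrative and are not part of the SPSS output.

```python
# Valid percentages for 0-4 hours, copied from the answer above.
valid_pct = {0: 6.0, 1: 20.9, 2: 26.3, 3: 17.5, 4: 12.7}

# Share of all respondents who watch 5 hours or more (the remainder).
five_or_more = 100.0 - sum(valid_pct.values())

# Share of respondents who watch any TV at all (0-hour group excluded).
watchers = 100.0 - valid_pct[0]

# Conditional percentages: re-base each share on TV watchers only.
one_hour_given_watch = valid_pct[1] / watchers * 100.0
four_or_less_given_watch = (
    sum(valid_pct[h] for h in (1, 2, 3, 4)) / watchers * 100.0
)

print(round(five_or_more, 1))              # 16.6
print(round(one_hour_given_watch, 1))      # 22.2
print(round(four_or_less_given_watch, 1))  # 82.3
```

Dividing by the share of watchers (94%) rather than by 100% is what turns an overall percentage into a percentage "of the people who watch TV".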
3. From the frequency table, estimate the 25th, 50th, 75th, 95th percentiles. What is the
value for the Median, Mode?
We calculate the 25th, 50th, 75th, and 95th percentiles of the variable tvhours
with the Frequencies option, which also reports the median and mode (as shown
in the SPSS data view above).
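As a rough cross-check, percentiles can also be estimated directly from the cumulative percentages. The sketch below assumes the valid percentages for 0 to 4 hours quoted in part 2 and uses a simple rule: the smallest value whose cumulative percentage reaches p. SPSS may interpolate differently, so treat the result as an approximation, not as the SPSS output. The function name is hypothetical.

```python
# Valid percentages for 0-4 hours from part 2; values of 5 hours or more
# are only known as a combined remainder (16.6%), so they are left out.
valid_pct = [(0, 6.0), (1, 20.9), (2, 26.3), (3, 17.5), (4, 12.7)]


def percentile_from_table(table, p):
    """Return the smallest value whose cumulative percentage reaches p,
    or None if p lies beyond the tabulated values."""
    cumulative = 0.0
    for value, pct in table:
        cumulative += pct
        if cumulative >= p:
            return value
    return None


estimates = {p: percentile_from_table(valid_pct, p) for p in (25, 50, 75, 95)}

# The mode is the value with the largest percentage. No single value above
# 4 hours can exceed 26.3%, because the whole 5+ group is only 16.6%.
mode = max(valid_pct, key=lambda row: row[1])[0]

print(estimates)  # {25: 1, 50: 2, 75: 4, 95: None}
print(mode)       # 2
```

The 95th percentile comes back as None because it falls among the values of 5 hours or more, whose individual percentages are not listed in this answer. The median is the 50th percentile.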
4. Make a bar chart of the hours of TV watched. What problem do you see with this
display?
Here is a bar chart of the hours of TV watched:
There are a few problems with this bar chart:
• A few outlying values have very low frequencies, yet each one takes up its own bar,
so the chart is stretched across a large number of discrete values.
• Some values are missing (9, 13, 16, 17, 18, 19, 21, 22, 23) because no respondent
reported them.
• Because the frequencies of the higher values are so low, the true shape of the
distribution is difficult to see.
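The effect of the missing values can be sketched as follows, assuming the bar chart draws one bar per observed value and nothing for unobserved ones. The list of absent values is taken from the answer above. The variable names are illustrative.

```python
# Values that no respondent reported (from the list above).
absent = {9, 13, 16, 17, 18, 19, 21, 22, 23}
possible = range(25)  # 0 to 24 hours per day

# A bar chart treats each observed value as a category: one bar each.
bar_positions = [h for h in possible if h not in absent]

# Pairs of neighbouring bars that are NOT consecutive hours. On the chart
# they sit side by side, which hides the real distance between them.
hidden_gaps = [
    (a, b) for a, b in zip(bar_positions, bar_positions[1:]) if b - a > 1
]

print(len(bar_positions))  # 16
print(hidden_gaps)         # [(8, 10), (12, 14), (15, 20), (20, 24)]
```

Under this assumption the bars for 20 and 24 hours would appear next to each other, even though they are four hours apart on a numeric scale.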
5. Make a histogram of the hours of TV watched. What causes all of the values to be
clumped together? Compare this histogram to the bar chart you generated in question
2d. Which is a better display for these data?
Here is a histogram of the hours of TV watched:
The values in the histogram are clumped together because the distribution is
skewed right: most values lie at the low end of the range, and the right tail is
long.
In this case, the histogram might show the values being clumped together due to the way
the data is categorized. The categories are discrete and have relatively wide intervals (e.g.,
1-2, 3-4, ...), which can make the histogram bars appear clumped instead of showing a
smooth distribution.
Comparing this histogram to a bar chart generated from the same data, the histogram may
be a better display for this data because it's designed for displaying the distribution of
continuous or discrete data, especially when you're interested in how the data is
distributed within specified ranges (like hours of TV watched). The histogram will
provide a clearer representation of the distribution within these ranges compared to a
simple bar chart.
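How interval width changes the picture can be sketched with a small binning helper. The counts below are hypothetical and are not taken from gss.sav. The function name is also an assumption made for illustration.

```python
from collections import Counter


def bin_counts(freq, width):
    """Aggregate a {value: count} table into intervals of the given width.

    Keys of the result are (low, high) tuples with inclusive bounds.
    Intervals that contain no observations are not emitted.
    """
    binned = Counter()
    for value, count in freq.items():
        low = (value // width) * width
        binned[(low, low + width - 1)] += count
    return dict(sorted(binned.items()))


# Hypothetical counts, right-skewed like the tvhours variable.
hypothetical_freq = {0: 5, 1: 20, 2: 25, 3: 15, 4: 10, 5: 4, 8: 2, 12: 3, 24: 1}

print(bin_counts(hypothetical_freq, 4))
# {(0, 3): 65, (4, 7): 14, (8, 11): 2, (12, 15): 3, (24, 27): 1}
```

With a width of 4, most of the hypothetical observations fall into the first interval, which is the kind of clumping described above. A width of 1 would keep every hour separate.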
Question 3: Find a data set which is related to a specific organisational problem (either at
the macro or micro level) and apply all the descriptive statistical techniques that you
think are suitable to the problem. Write a short report, which includes the objectives of your
analysis, the research questions, the analytical techniques you apply to address the
research questions, and your findings. The maximum length of the report is 5 pages,
including tables and figures.