
NATIONAL ECONOMICS UNIVERSITY

FACULTY OF BUSINESS ADMINISTRATION


--------------

GROUP MID-TERM EXAM


Subject: Business Statistics

Group 6: Đặng Diệu Hương


Hoàng Khánh Hương
Nguyễn Phương Anh
Tăng Hà Anh
Tạ Ngọc Mai
Trịnh Trung Hiếu
Đoàn Thị Quỳnh Anh
Class: EMQI 63

Hà Nội, 9/2023
Question 1: Consider the variable income in the gss.sav file (the variable is total family
income in the year before the survey).
1. Make a frequency table for the variable. Does the frequency table make sense? Does it
make sense to make a histogram of the variable? A bar chart?
2. What is the scale of measurement for the variable?
3. What descriptive statistics are appropriate for describing this variable and why? Does it
make sense to compute a mean?
4. Discuss the advantages and disadvantages of recording income in this manner.
Describe other ways of recording income and the problem associated with each of
them.

Question 2: In the gss.sav file, the variable tvhours tells you how many hours per day
GSS respondents say they watch TV.
1. Make a frequency table of the hours of television watched. Do any of the values strike
you as strange? Explain.
Upon reviewing the frequency table, a particular value that stands out as unusual is 12
hours of television viewing. The frequency table shows a notable pattern: a relatively
high number of respondents watch TV for up to 10 hours daily, but this number sharply
decreases beyond 10 hours. However, the frequency of individuals watching precisely
12 hours per day is unexpectedly high within the range of 11 to 24 hours. This deviates
from the established pattern and warrants further investigation.
2. Based on the frequency table, answer the following questions: Of the people who
answered the question, what percentage don't watch any television? What percentage
watch two hours or less? Five hours or more? Of the people who watch TV, what
percentage watch one hour? What percentage watch four hours or less?
Based on the frequency table, of the people who answered this question:
• The percentage of people who do not watch any television is 6%
• The percentage of people who watch 2 hours or less is 53.1%
• The percentage of people who watch 5 hours or more is:
100% - (6% + 20.9% + 26.3% + 17.5% + 12.7%) = 16.6%
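Based on the frequency table, of the people who watch TV (respondents with a value
of 0 are excluded, leaving 100% - 6% = 94% of valid responses):
• The percentage of people who watch one hour is
20.9% / (100% - 6%) × 100% = 22.2%
• The percentage of people who watch 4 hours or less is
(20.9% + 26.3% + 17.5% + 12.7%) / (100% - 6%) × 100% = 82.3%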
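
The arithmetic behind these percentages can be checked with a short Python sketch. It uses only the valid percentages for 0 to 4 hours quoted in this answer; the names `valid_percent` and `share_among_viewers` are illustrative and do not come from SPSS.

```python
# Valid percentages quoted in the report for 0-4 hours of TV per day.
valid_percent = {0: 6.0, 1: 20.9, 2: 26.3, 3: 17.5, 4: 12.7}


def share_among_viewers(hours, table):
    """Percentage of TV watchers (hours > 0) whose value is in `hours`."""
    viewers = 100.0 - table[0]  # everyone except the 0-hour group
    selected = sum(table[h] for h in hours)
    return 100.0 * selected / viewers


# Five hours or more: whatever is left after the 0-4 hour rows.
five_or_more = 100.0 - sum(valid_percent.values())
one_hour = share_among_viewers([1], valid_percent)
four_or_less = share_among_viewers([1, 2, 3, 4], valid_percent)

print(round(five_or_more, 1), round(one_hour, 1), round(four_or_less, 1))
# prints: 16.6 22.2 82.3
```

Dividing by 94% rather than 100% is what changes the base from all respondents to respondents who watch at least some TV.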

3. From the frequency table, estimate the 25th, 50th, 75th, 95th percentiles. What is the
value for the Median, Mode?
We will calculate the 25th, 50th, 75th, and 95th percentiles for the variable tvhours
with the Frequencies option, which generates percentiles, median, and mode (as shown
in the SPSS data view above)

From the given table, we find the following percentile values:


• 25th percentile:
From the Cumulative Percent column, the 25th percentile falls in the row with a
cumulative percent of 26.8, which corresponds to 1 hour of TV per day.
25th percentile = 1
• 50th percentile:
From the Cumulative Percent column, the 50th percentile falls in the row with a
cumulative percent of 53.1, which corresponds to 2 hours of TV per day.
50th percentile = 2
• 75th percentile:
From the Cumulative Percent column, the 75th percentile falls in the row with a
cumulative percent of 83.3, which corresponds to 4 hours of TV per day.
75th percentile = 4
• 95th percentile:
From the Cumulative Percent column, the 95th percentile falls in the row with a
cumulative percent of 96.1, which corresponds to 8 hours of TV per day.
95th percentile = 8

• The median is always equal to the 50th percentile.
Median = 2
• The mode is the value that occurs most often.
2 hours has a frequency of 238, which is the highest of all the frequencies.
Mode = 2
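
The table lookup used for the percentiles can be sketched in Python as follows. The rule applied is "the smallest value whose cumulative percent reaches p". Only the cumulative percentages quoted in this answer are included; the rows for 3 and 5 to 7 hours are not quoted in the text, so the table is partial, and the function name `percentile_from_table` is illustrative.

```python
# Cumulative percentages quoted in the report (hours -> cumulative %).
# Rows for 3 and 5-7 hours are not quoted in the text and are omitted here.
cumulative_percent = {0: 6.0, 1: 26.8, 2: 53.1, 4: 83.3, 8: 96.1}


def percentile_from_table(p, table):
    """Return the smallest value whose cumulative percent is at least p."""
    for value in sorted(table):
        if table[value] >= p:
            return value
    raise ValueError("p exceeds the largest cumulative percent in the table")


estimates = {p: percentile_from_table(p, cumulative_percent)
             for p in (25, 50, 75, 95)}
print(estimates)  # prints: {25: 1, 50: 2, 75: 4, 95: 8}
```

With the full SPSS table the same lookup applies unchanged; the partial table gives the same answers here only because the report states which row each percentile falls in.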
The histogram shows hours of TV watched per day against frequency. It indicates that
the distribution is not normal but right-skewed.

4. Make a bar chart of the hours of TV watched. What problem do you see with this
display?
Here is a bar chart of the hours of TV watched:
There are a few problems with this bar chart:
• A few outliers with very low frequencies stretch the chart across a large number of
discrete values.
• Some values are missing (9, 13, 16, 17, 18, 19, 21, 22, 23) because they do not
appear in the survey responses.
• Because the frequencies for the higher values are so low, the true shape of the
distribution is difficult to see.
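
The missing-value problem can be made concrete with a small Python sketch. It assumes that answers are whole hours from 0 to 24 and that every value not listed as missing was observed; that assumption is derived from the list above, not read from the data file. The helper name `gaps` is illustrative.

```python
# Values the report lists as absent from the survey responses.
reported_missing = [9, 13, 16, 17, 18, 19, 21, 22, 23]

# Assumption: answers are whole hours from 0 to 24, so every other value occurs.
observed = [h for h in range(25) if h not in reported_missing]


def gaps(values):
    """Integers between min(values) and max(values) that never appear."""
    present = set(values)
    return [v for v in range(min(values), max(values) + 1) if v not in present]


print(gaps(observed))  # prints: [9, 13, 16, 17, 18, 19, 21, 22, 23]
```

A bar chart draws one bar per observed category, so these gaps are not shown and bars such as 8 and 10 sit side by side as if they were adjacent values.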

5. Make a histogram of the hours of TV watched. What causes all of the values to be
clumped together? Compare this histogram to the bar chart you generated in question
2d. Which is a better display for these data?
Here is a histogram of the hours of TV watched:
The values in the histogram are clumped together because the distribution is skewed
right: most values lie at the left end of the range, while the right tail is long.
The clumping may also reflect the way the data are grouped. The values are discrete and
the bins are relatively wide (e.g., 1-2, 3-4, ...), so the bars appear clumped instead of
showing a smooth distribution.
Compared with the bar chart of the same data, the histogram may be the better display
because it is designed to show the distribution of continuous or discrete numeric data,
especially how the data fall within specified ranges (such as hours of TV watched). It
represents the distribution within these ranges more clearly than a simple bar chart.

Question 3: Find a data set which is related to a specific organisational problem (either at
the macro or micro level) and apply all possible descriptive statistical techniques that you
think suitable to the problem. Write a short report, which includes the objectives of your
analysis, the research questions, the analytical techniques you apply to address the
research questions and your findings. The maximum length of the report is 5 pages
including Tables and Figures.
