Applied Statistics Final Exam

Nyasha Masese 200182







Q1 a. For the following data, identify whether each variable is 1. Categorical [nominal or ordinal], or 2.
Numerical [interval/ratio] [discrete or continuous]. Give examples of possible values for each random
variable. [Example: number of children living in a given home – “interval data [discrete], (0, 1, 2, 3, …)”]

i. Marital status
Marital status is a categorical, nominal variable: its values are named categories with no natural order.
Numbers may be used as codes for the categories, but they only label them and cannot be ranked or
added. For example:
1 = married
2 = single
3 = divorced
ii. Number of students who drop this statistics course: a numerical, discrete variable. It is a count, so it
can only take whole-number values 0, 1, 2, 3, … up to the number of students enrolled. For example, 5
students dropped out of the course.

iii. Time a student spends studying for their first statistics test: a numerical, continuous variable. It can
take any value within a range rather than only whole numbers; for example, 0 hours, 1 hour 30 minutes
(1.5 hours) or 24 hours.
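iv. The weight loss over the first week of a “fad” diet: a numerical, continuous variable, since weight can
be measured to any precision; for example, 0.5 kg, 1.25 kg or 2.8 kg.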
v. The part on a new computer that breaks during the first year of ownership: a categorical, nominal
variable. Its values are the names of parts, such as none, screen, keyboard, battery or hard drive, and these
have no natural order. (Counting how many parts break, 0, 1, 2, …, would instead give a numerical,
discrete variable.)

b. Given that a data set consisting of 75 data values has 109 as the highest value and 29 as the lowest
value, construct the class intervals, showing the class limits of all the classes. [10 marks]
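One way to construct the intervals is the "2 to the k" rule, which is an assumption here: the question does not fix the number of classes, and other rules such as Sturges' can give a different count. Since 2^6 = 64 ≤ 75 < 128 = 2^7, the rule gives k = 7 classes. The width must satisfy 7 × width > 109 - 29 = 80, and 7 × 11 = 77 is too small, so a width of 12 is used. Starting at the lowest value gives the classes 29-40, 41-52, 53-64, 65-76, 77-88, 89-100 and 101-112, and the last class contains the highest value, 109. A minimal Python sketch of the same steps (the helper names are hypothetical):

```python
def two_to_the_k(n):
    """Smallest number of classes k with 2**k > n (the '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_limits(n, lowest, highest):
    """Hypothetical helper: equal-width classes for whole-number data."""
    k = two_to_the_k(n)
    # Smallest whole-number width with k * width > highest - lowest,
    # so the last class is guaranteed to contain the highest value.
    width = (highest - lowest) // k + 1
    limits = []
    lower = lowest
    for _ in range(k):
        # Whole-number data: each upper limit sits one unit below the next lower limit.
        limits.append((lower, lower + width - 1))
        lower += width
    return k, width, limits


k, width, limits = class_limits(75, 29, 109)
print(f"k = {k}, width = {width}")
for lower, upper in limits:
    print(f"{lower}-{upper}")
```

This prints k = 7, width = 12 and the seven classes listed above. A rounder starting point below 29 would also be acceptable, provided every value from 29 to 109 still falls in exactly one class.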

c. With suitable examples, highlight the rules relating to the drawing of cross tables, multiple bar
charts and composite bar charts. Discuss the circumstances which would require the use of each form
of data presentation.

A cross table, also known as a pivot table or contingency table, is a two-way table whose rows and
columns record the frequency of respondents with each combination of two characteristics. When
drawing a cross table, the categories of one variable form the rows and those of the other form the
columns, each cell holds the count for that combination, and row and column totals are usually added.
Cross tabulation tables provide a wide range of information about the relationship between variables, so
a cross table is useful when the connection between two variables is not obvious from the raw data. For
example, a survey might cross-tabulate the hypothetical variables country of residence and favorite
singer. The same data can be analyzed several times in a side-by-side sequential format, with the column
variables called banners and the row variables called stubs. Cross tabulation is most often used on
categorical data, that is, data that can be divided into separate, mutually exclusive groups. Because it
reveals relationships that are not obvious, it is widely used in market research and in analyzing survey
responses.
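Building a cross table amounts to counting respondents for every combination of a row category and a column category. A minimal sketch using only the Python standard library; the survey responses, country labels and singer labels below are hypothetical, and the helper name cross_table is an assumption for illustration.

```python
from collections import Counter

# Hypothetical survey responses: (country of residence, favorite singer).
responses = [
    ("Country 1", "Singer X"), ("Country 1", "Singer Y"), ("Country 1", "Singer X"),
    ("Country 2", "Singer Y"), ("Country 2", "Singer Y"), ("Country 3", "Singer X"),
]


def cross_table(pairs):
    """Count respondents for every (row category, column category) combination."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})  # row variable categories (stubs)
    cols = sorted({c for _, c in pairs})  # column variable categories (banners)
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table


rows, cols, table = cross_table(responses)

# Print the table with row totals and a final row of column totals.
print(f"{'':<12}" + "".join(f"{c:>10}" for c in cols) + f"{'Total':>8}")
for r in rows:
    cells = "".join(f"{table[r][c]:>10}" for c in cols)
    print(f"{r:<12}{cells}{sum(table[r].values()):>8}")
col_totals = "".join(f"{sum(table[r][c] for r in rows):>10}" for c in cols)
print(f"{'Total':<12}{col_totals}{len(responses):>8}")
```

Every respondent falls in exactly one cell, so the grand total in the bottom-right corner equals the number of responses.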

A multiple bar chart compares two or more sets of inter-related data side by side. Each category gets a
group of adjacent bars, one bar per data set. For example, to represent the imports and exports of a
country over several years, we would place the years on the x axis and the import and export values on
the y axis, drawing an import bar and an export bar next to each other for every year. The two sets are
drawn in different colors for easier identification, and a key (legend) shows which color is which. We use
a multiple bar chart when we need to compare the same variables across several groups. It can also be
used to compare small histograms with each other, with each group of bars representing one interval of
a variable.
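In a multiple bar chart each year becomes a group, and the bars inside a group are placed next to each other around the group's position on the x axis. A minimal sketch of that placement arithmetic in Python, assuming hypothetical import and export figures and a bar width of 0.35 x-axis units; the computed centres are the kind of x positions one would pass to a plotting library, but no plotting library is used here.

```python
# Hypothetical figures (not from the source): imports and exports for three years.
years = ["Year 1", "Year 2", "Year 3"]
series = {"Imports": [40, 35, 50], "Exports": [30, 45, 40]}


def grouped_bar_positions(n_groups, n_series, bar_width=0.35):
    """Centre x-position of every bar: group g sits at x = g, and its bars are
    spread symmetrically around that point, one bar per series."""
    positions = []
    for s in range(n_series):
        offset = (s - (n_series - 1) / 2) * bar_width
        positions.append([g + offset for g in range(n_groups)])
    return positions


pos = grouped_bar_positions(len(years), len(series))
for (name, values), xs in zip(series.items(), pos):
    for year, x, v in zip(years, xs, values):
        print(f"{name:<8} {year}: bar centred at x = {x:+.3f}, height {v}")
```

With two series, each import bar sits 0.175 units left of its year's position and each export bar 0.175 units right of it, so the pair touches and stays centred on the year label.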

A composite bar chart, also known as a stacked bar chart, extends the standard bar chart to show numeric
values across two categorical variables. Each bar is divided into several sub-bars stacked end to end, each
corresponding to a level of the second categorical variable, so the full height of the bar is the total for that
category. For example, a clothing retailer may want to depict revenue for a particular time period across
two categorical variables, location and department. With location as the primary category on the x axis
and revenue on the y axis, we can stack the revenue of each department within a given location's bar. A
composite bar chart is used when both the total for each category and its breakdown into parts need to
be shown.
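Stacking means that each sub-bar starts where the previous one ends, so each segment's bottom is the running total of the departments below it. A minimal sketch in Python; the department names and revenue figures are hypothetical, not from the source.

```python
# Hypothetical revenue figures (not from the source): department revenue by location.
revenue = {
    "Location A": {"Men": 120, "Women": 200, "Children": 80},
    "Location B": {"Men": 90, "Women": 150, "Children": 60},
}


def stacked_segments(revenue):
    """For each location (one bar), return (department, bottom, top) for each
    sub-bar. Each segment starts where the previous one ended, so the bar's
    total height equals the location's total revenue."""
    bars = {}
    for location, parts in revenue.items():
        bottom = 0
        segments = []
        for dept, value in parts.items():
            segments.append((dept, bottom, bottom + value))
            bottom += value
        bars[location] = segments
    return bars


bars = stacked_segments(revenue)
for location, segments in bars.items():
    print(location)
    for dept, low, high in segments:
        print(f"  {dept:<9} {low:>4} to {high:>4}")
```

The top of the last segment in each bar is the location's total revenue (400 for Location A and 300 for Location B in this made-up data).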

Q2a. A quality control manager takes a random sample of 100 packets of biscuits from a production
line in order to check the mean weight of the whole production. The net weights he found are
tabulated below:

Weight in grams            Frequency
Less than 247                  0
247 and less than 248          4
248 and less than 249         21
249 and less than 250         40
250 and less than 251         27
251 and less than 252          7
252 and less than 253          1
Over 253                       0

Estimate the mean, mode and median weight of the production. [10 marks]
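A worked sketch, assuming the usual grouped-data conventions: class midpoints for the mean, linear interpolation within the median class, and the interpolation formula Mode = L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) × w for the modal class, where L is the lower boundary of the modal class, f1 its frequency, f0 and f2 the frequencies of the classes before and after it, and w the class width. The two open-ended classes have zero frequency and are left out. By hand: the mean is 24 965 / 100 = 249.65 g; the cumulative frequency first reaches 50 in the class 249 to less than 250 g, so the median is 249 + (50 - 25) / 40 × 1 = 249.625 g; the modal class is the same class, so the mode is 249 + 19 / (19 + 13) × 1 ≈ 249.59 g. The Python below reproduces these figures; the function names are hypothetical.

```python
# Class boundaries (grams) and frequencies from the table. The open-ended
# classes "Less than 247" and "Over 253" have zero frequency, so they are dropped.
classes = [(247, 248, 4), (248, 249, 21), (249, 250, 40),
           (250, 251, 27), (251, 252, 7), (252, 253, 1)]


def grouped_mean(classes):
    """Mean estimated from class midpoints weighted by frequency."""
    n = sum(f for _, _, f in classes)
    return sum((lo + hi) / 2 * f for lo, hi, f in classes) / n


def grouped_median(classes):
    """Median by linear interpolation inside the class containing the n/2-th value."""
    n = sum(f for _, _, f in classes)
    cum = 0  # cumulative frequency below the current class
    for lo, hi, f in classes:
        if cum + f >= n / 2:
            return lo + (n / 2 - cum) / f * (hi - lo)
        cum += f


def grouped_mode(classes):
    """Mode by the interpolation formula applied to the modal class."""
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))  # index of the modal class
    lo, hi, f1 = classes[i]
    f0 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lo + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * (hi - lo)


print(round(grouped_mean(classes), 3))    # 249.65
print(round(grouped_median(classes), 3))  # 249.625
print(round(grouped_mode(classes), 3))    # 249.594
```

The three estimates are close together (mean slightly above median, median slightly above mode), which is consistent with a nearly symmetric distribution with a slight tail towards the heavier weights.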

b. Random samples of 1000 persons have been obtained for three countries and their incomes have been
measured. The summary statistics for the per capita income distribution in the three countries are given
below.


Statistic        Country A   Country B   Country C
MEAN               10000       10000       10000
MEDIAN             14000        8000       10000
LOWEST VALUE        9000        7000        8500
HIGHEST VALUE      15000       12000       12000

Discuss the variations in earnings in the three countries and suggest which country you would recommend
your uncle go to in order to find a job. Use the statistics in the table in your presentation. [15 marks]


The appropriate measure of variability is the range, since only the lowest and highest values are given and
there are no serious outliers.

Range = Highest value – Lowest value

Country A = 15 000 – 9 000 = 6 000
Country B = 12 000 – 7 000 = 5 000
Country C = 12 000 – 8 500 = 3 500
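The range calculation can be repeated for all three countries in a few lines of Python, using the figures from the table. As an addition to the original working, the sketch also computes mean minus median for each country as a rough skewness indicator: a negative value means the median lies above the mean, a positive value means it lies below, and zero is consistent with a symmetric distribution.

```python
# Summary statistics from the table (per capita income).
stats = {
    "A": {"mean": 10000, "median": 14000, "lowest": 9000, "highest": 15000},
    "B": {"mean": 10000, "median": 8000, "lowest": 7000, "highest": 12000},
    "C": {"mean": 10000, "median": 10000, "lowest": 8500, "highest": 12000},
}

for country, s in stats.items():
    s["range"] = s["highest"] - s["lowest"]
    # Rough skewness indicator: sign of mean - median.
    s["mean_minus_median"] = s["mean"] - s["median"]

least_spread = min(stats, key=lambda c: stats[c]["range"])
for country, s in stats.items():
    print(country, s["range"], s["mean_minus_median"])
print("Smallest range:", least_spread)
```

It reports ranges of 6000, 5000 and 3500 for A, B and C, so country C has the smallest spread of incomes.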

With the given data, I would suggest that my uncle seeks employment in country C. Country C has the
smallest range (3 500, against 5 000 for B and 6 000 for A), which shows that incomes there are clustered
more closely together around the mean, whereas in country A incomes are spread over the widest gap. In
country C the median also equals the mean (10 000), suggesting that incomes are spread evenly around
the average. Its mean income of 10 000 is the same as in the other two countries, so choosing C does not
mean accepting a lower average, which makes it a good prospect. The small gap between the highest and
lowest incomes suggests that earnings there are relatively evenly distributed.

