mcd1110 Sample Test 2b 2012 02


MCD1110-DATA ANALYSIS

Practice Test 2 B
Question & Answer Booklet
READING TIME: 10 minutes WRITING TIME: 2 hours

Instructions to Candidates
Please enter your name and student ID number below.
Name: ___________________________________________________
Student ID Number: ________________________________________
Percentage of Final Assessment: 20%

Section                                   Possible Marks   Actual Marks
A: Relationships between two variables    26
B: Regression and data transformation     28
C: Time series                            21
Total Marks                               75

Candidates are reminded that they should have no materials on their desks unless their use has been
specifically permitted by the following instructions.
Approved scientific and/or graphic calculators are permitted.
Mathematical instruments and templates are permitted.
A Formula Sheet is provided in the test booklet.
Circle your responses to Multiple-Choice questions in the test booklet.
Show all working/calculations.
DO NOT USE PENCIL.
You must hand in this entire question paper at the end of the test.
Do not open this booklet until instructed to do so.

COPYRIGHT WARNING:
All materials produced for teaching this course of study, including all lectures delivered, all audio and visual aids to presentation of lectures (including overheads, PowerPoint slides and any on-line materials) and any supplementary materials, are protected by copyright.
You are permitted to use these materials only for your personal study and research. Use of the materials for any other purposes, including sale of your personal lecture notes, without express permission of the copyright owner, may infringe copyright. The copyright owner may take action against you for infringement.
MCD1110 - DATA ANALYSIS FORMULA SHEET

Percentage Frequency
$\text{percent} = \dfrac{\text{count}}{\text{total count}} \times 100\%$

Range
$R = \text{largest value} - \text{smallest value}$

Median (location)
$\dfrac{n+1}{2}$

Interquartile Range
$IQR = Q_3 - Q_1$
Upper fence $= Q_3 + 1.5\,IQR$, Lower fence $= Q_1 - 1.5\,IQR$
Sample Mean
$\bar{x} = \dfrac{\sum x}{n}$ or $\bar{x} = \dfrac{\sum xf}{\sum f}$ or $\bar{x} = \dfrac{\sum mf}{\sum f}$, where $n = \sum f$

Sample Standard Deviation
$s = \sqrt{\dfrac{\sum (x-\bar{x})^2}{n-1}}$ or $s = \sqrt{\dfrac{\sum (x-\bar{x})^2 f}{n-1}}$ or $s = \sqrt{\dfrac{\sum (m-\bar{x})^2 f}{n-1}}$, where $n = \sum f$
or $s \approx \dfrac{R}{4}$

Standard (z) score
$z = \dfrac{x - \bar{x}}{s}$

Probability

Complement rule: $P(A) + P(A') = 1$

Addition rule: $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cup B) = P(A) + P(B)$ (mutually exclusive events)

Conditional probability: $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)}$

© Monash College Page 2 of 19


Multiplication rule: $P(A \cap B) = P(A)\,P(B \mid A)$
$P(A \cap B) = P(A)\,P(B)$ (independent events)

Law of total probability: $P(A) = P(A \mid B)\,P(B) + P(A \mid B')\,P(B')$

Pearson’s Correlation Coefficient
$r = \dfrac{\sum (x-\bar{x})(y-\bar{y})}{(n-1)\,s_x s_y}$, where $s_x = \sqrt{\dfrac{\sum (x-\bar{x})^2}{n-1}}$ and $s_y = \sqrt{\dfrac{\sum (y-\bar{y})^2}{n-1}}$

Coefficient of Determination: $r^2$

The Least Squares Regression Line
$y = a + bx$, where $b = \dfrac{r\,s_y}{s_x}$ and $a = \bar{y} - b\bar{x}$, where $\bar{x} = \dfrac{\sum x}{n}$ and $\bar{y} = \dfrac{\sum y}{n}$

Time Series
3-moving mean (smoothed $y_2$) $= \dfrac{y_1 + y_2 + y_3}{3}$
3-moving median (smoothed $y_2$) $= \text{median}(y_1, y_2, y_3)$

Seasonal index $= \dfrac{\text{value for season}}{\text{seasonal average}}$

Deseasonalised figure $= \dfrac{\text{actual figure}}{\text{seasonal index}}$



SECTION A: Relationships between two variables
Part A-Multiple Choice Questions

Circle the best answer (6 × 1 = 6 marks)


Question 1

To explore the relationship between being an MP3/iPod user (yes or no) and gender (male or female), it would be best to display the data collected in:

A. a scatter plot
B. an appropriately percentaged table
C. back-to-back stem plots
D. parallel box plots

Question 2

For a large sample of students at a local primary school, the correlation coefficient between score on a test of current global affairs and height was found to be r = 0.76.
From this information it is reasonable to conclude that

A. awareness of current affairs makes children grow taller
B. all tall children will be more aware of current affairs than short children
C. awareness of current affairs depends only on height
D. taller children tend to have a better understanding of current affairs than shorter children

Question 3

A value of Pearson’s Correlation Coefficient that indicates a weak negative association is

A. −0.48

B. 0.51

C. 0.45

D. −0.24
Question 4

The relationship between average monthly temperature and the number of air conditioners sold is
found to have a correlation coefficient of r = 0.72. We can therefore conclude that:

A. increasing temperature will cause an increase in air conditioner sales
B. 72% of the variation in numbers of air conditioners sold is due to the variation in mean monthly temperature
C. as the mean monthly temperature increases, air conditioner sales decrease
D. 51.8% of the variation in numbers of air conditioners sold is due to the variation in mean monthly temperature

The information in the following parallel box plots relates to questions 5 and 6.

[Figure: parallel box plots for Brand X and Brand Y, salt content (mg/100g) on a horizontal scale from 40 to 200]

The parallel box plots show the variation in salt content (mg/100g) in two brands (Brand X and
Brand Y) of wheat crackers.

Question 5
The variables Salt Content and Brand are:

A. both numerical variables
B. a categorical and a numerical variable respectively
C. a numerical and a categorical variable respectively
D. both categorical variables

Question 6

The presence of a relationship between Salt Content and Brand is best shown by considering the:

A. median
B. IQR and range
C. shape
D. all of the above



Part B-Short Answers

Answer ALL questions in the space provided. Show ALL working and calculations.

Question 1

Scientists have investigated the relationship between the chirp frequency of ground crickets and the ground temperature. The data collected are shown in the table below.

T (°F)       89 72 93 84 81 75 70 82 69 83 69 83 81 84 76
C (rate/sec) 20 16 20 18 17 16 15 17 15 16 15 17 16 17 14

C represents the chirp rate (chirps per second) and T represents the ground temperature in degrees Fahrenheit.

a After considering which are the independent and dependent variables, draw a fully labelled
scatter plot for this data on the axes below.
3 marks



The following data relates to parts b and c.

$\sum (t-\bar{t})(c-\bar{c}) = 151.4$    $\sum (t-\bar{t})^2 = 747.6$    $\sum (c-\bar{c})^2 = 41.6$

b Calculate Pearson’s correlation coefficient to three decimal places.


4 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

c (i) Explain two assumptions made in the calculation of a correlation coefficient.


2 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
(ii) Interpret the answer in part b.
2 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

d Calculate the coefficient of determination to 3 decimal places and interpret it.


2 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
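The working for parts b and d can be checked with a minimal Python sketch. It recomputes the three sums directly from the table rather than taking them as given, then applies the formula-sheet definition of r, in which the $(n-1)$ factors cancel to leave $r = \sum (t-\bar{t})(c-\bar{c}) / \sqrt{\sum (t-\bar{t})^2 \sum (c-\bar{c})^2}$. The variable names are illustrative only.

```python
# Cricket data from the table: ground temperature T (deg F) and chirp rate C (chirps/sec)
T = [89, 72, 93, 84, 81, 75, 70, 82, 69, 83, 69, 83, 81, 84, 76]
C = [20, 16, 20, 18, 17, 16, 15, 17, 15, 16, 15, 17, 16, 17, 14]

n = len(T)
t_bar = sum(T) / n  # mean temperature
c_bar = sum(C) / n  # mean chirp rate

# Sum of cross-products and sums of squared deviations
s_tc = sum((t - t_bar) * (c - c_bar) for t, c in zip(T, C))
s_tt = sum((t - t_bar) ** 2 for t in T)
s_cc = sum((c - c_bar) ** 2 for c in C)

# Pearson's r: the (n - 1) factors in the formula-sheet version cancel
r = s_tc / (s_tt * s_cc) ** 0.5
r_squared = r ** 2  # coefficient of determination

print(f"sums: {s_tc:.1f}, {s_tt:.1f}, {s_cc:.1f}")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

Run as written, it reports r ≈ 0.859 and r² ≈ 0.737, that is, about 73.7% of the variation in chirp rate can be explained by the variation in ground temperature.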



Question 2

Respondents to an on-line survey were asked ‘Will you consider converting your car to LPG (an alternative fuel)?’ The responses of 405 people, together with their age groups, are given in the table below. It is expected that whether a person would consider an LPG conversion is related to their age group.

           Age group
Response   ≤ 40 years old   > 40 years old
Yes        72               125
No         108              100
Total      180              225

a Explain why the row headings in the table are response rather than age group.
1 mark
______________________________________________________________________________
______________________________________________________________________________

b How many of the survey respondents were:


(i) greater than 40 years old.
0.5 marks
______________________________________________________________________________
(ii) less than or equal to 40 years old and would not consider an LPG conversion.
0.5 marks
______________________________________________________________________________

c Percentage the table below by calculating column percentages.


3 marks
           Age group
Response   ≤ 40 years old   > 40 years old
Yes
No
Total
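Column percentages divide each count by its column total, so the two age groups are compared on the same 100% basis. A minimal Python sketch of that calculation, using the counts from the first table (the dictionary layout is just one convenient choice):

```python
# Counts from the table: rows are responses, columns are age groups
counts = {
    "Yes": {"<=40": 72, ">40": 125},
    "No": {"<=40": 108, ">40": 100},
}
age_groups = ["<=40", ">40"]

# Column totals (one per age group)
totals = {g: sum(row[g] for row in counts.values()) for g in age_groups}

# Column percentages: each cell divided by its column total
col_pct = {
    resp: {g: 100 * row[g] / totals[g] for g in age_groups}
    for resp, row in counts.items()
}

for resp, row in col_pct.items():
    print(resp, ", ".join(f"{g}: {p:.1f}%" for g, p in row.items()))
```

It gives 40.0% and 55.6% "Yes" for the ≤ 40 and > 40 groups, and 60.0% and 44.4% "No"; each column sums to 100%.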

d Do the data support the contention that there is a relationship between age group and attitude to an LPG conversion?
2 marks

______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________



SECTION B: Regression and data transformation
Part A-Multiple Choice Questions

Circle the best answer (6 × 1 = 6 marks)

Question 1

The least squares line is:

A. not sensitive to the presence of outliers
B. based on the division of data into 3 regions
C. the line where the sum of the squares of the residuals is as small as possible
D. the line where the square of the sum of the vertical deviations of the points from the line is as small as possible

Questions 2, 3 and 4 refer to the following information.


Studies on hearing showed that hearing test results (y) could be predicted from time spent exposed to loud music (x) using the least squares line y = 95 − 0.5x. The coefficient of determination is 0.42.

Question 2

The false statement based on this information is

A. the gradient of the regression line is −0.53
B. the equation predicts a test score of 22.5 after 50 hours loud music exposure
C. the independent variable is time spent exposed to loud music
D. the y intercept is 95

Question 3

The value of Pearson’s correlation coefficient, correct to two decimal places, is

A. 0.18
B. 0.65
C. -0.18
D. -0.65



Question 4

The scatter plot for hearing test result versus loud music exposure showed a point at (75, 56). The
residual value for this point is

A. −1.5
B. 19
C. −19
D. 1.5
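The arithmetic behind Questions 2 to 4 can be sketched in a few lines of Python, using only the least squares line and the coefficient of determination given above. The helper predict is a hypothetical name chosen for illustration.

```python
import math

# Least squares line from the question: y = 95 - 0.5x
a, b = 95, -0.5
r_squared = 0.42

def predict(x):
    """Predicted hearing test result after x hours of loud music exposure."""
    return a + b * x

# r takes the sign of the slope, so here it is the negative square root of r^2
r = math.copysign(math.sqrt(r_squared), b)

# Residual = actual y - predicted y, for the point (75, 56)
residual = 56 - predict(75)

print(f"prediction at x = 50: {predict(50)}")
print(f"r = {r:.2f}")
print(f"residual at (75, 56): {residual}")
```

The predicted score after 50 hours is 95 − 0.5 × 50 = 70, r ≈ −0.65 because the slope is negative, and the residual at (75, 56) is 56 − 57.5 = −1.5.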
Question 5

A log x transformation has the effect of

A. stretching the upper end of the x axis
B. compressing the upper end of the y axis
C. compressing the upper end of the x axis
D. stretching the upper end of the y axis
Question 6

After analysing a scatter plot and applying a number of transformations the following analysis was
obtained.
Transformation   Residuals   r²
log y Vs x       curved      58%
1/y Vs x         curved      61%
y Vs log x       random      83%
y Vs 1/x         random      86%

The best transformation to use is

A. log y Vs x
B. 1/y Vs x
C. y Vs log x
D. y Vs 1/x



Part B-Short Answers

Answer ALL questions in the space provided. Show ALL working and calculations.

Question 1

A class of secondary students was introduced to their study of bivariate data through an exercise
that required the students to measure, record and analyse their height and arm span. The data they
collected are shown below.

Arm Span (cm)  Height (cm)    Arm Span (cm)  Height (cm)
156            162            177            173
157            160            177            176
159            162            178            178
160            155            184            180
161            160            188            188
161            162            188            187
162            170            188            182
165            166            188            181
170            170            188            192
170            167            194            193
173            185            196            184
173            176            200            186

Some calculations that relate to the data are:

$r = 0.91$, $s_x = 13.6$, $s_y = 11.2$, $\bar{x} = 175.5$, $\bar{y} = 174.8$ (x = arm span, y = height)

A scatter plot drawn from the data is shown.


[Scatter plot: Height Vs Arm Span, with Height (cm) from 150 to 195 on the vertical axis and Arm Span (cm) from 150 to 210 on the horizontal axis]



a Interpret the scatter plot in terms of direction, form and strength.
3 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

b Calculate, to 2 decimal places, the value of b in y = a + bx, the equation of the least squares regression line for the scatter plot, and interpret its meaning.
3 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

c Calculate, to 2 decimal places, the value of a in y = a + bx, the equation of the least squares regression line for the scatter plot, and interpret its meaning.
3 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

d Give the equation of the least squares regression line and draw it on the scatter plot given.
Show the calculations that you used to position this line correctly.
4 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
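Parts b to d can be sketched in Python directly from the summary statistics above, using $b = r s_y / s_x$ and $a = \bar{y} - b\bar{x}$ from the formula sheet. Evaluating the line at two well-separated arm spans (150 cm and 200 cm here, an arbitrary choice) gives two points through which the line can be ruled for part d.

```python
# Summary statistics given in the question (x = arm span, y = height, both in cm)
r, s_x, s_y = 0.91, 13.6, 11.2
x_bar, y_bar = 175.5, 174.8

# Slope and intercept from the formula sheet
b = r * s_y / s_x       # predicted change in height (cm) per 1 cm of arm span
a = y_bar - b * x_bar   # value of the line at arm span 0 (well outside the data)

def predict(x):
    return a + b * x

print(f"b = {b:.2f}, a = {a:.2f}")

# Two points far apart on the x axis are enough to rule the line
for x in (150, 200):
    print(f"x = {x}: y = {predict(x):.2f}")
```

Carrying the unrounded slope gives b ≈ 0.75 and a ≈ 43.28, with points (150, 155.69) and (200, 193.16); substituting the rounded b = 0.75 instead gives a ≈ 43.18, so it matters which value is carried forward.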

Question 2

The atmospheric concentration of carbon dioxide, CO2, in parts per million (ppm), has been measured about every 20 years over the last two centuries. Year 1 is about 1760 and year 12 is about 1990.

Year          1    2    3    4    5    6    7    8    9   10   11   12
CO2 (ppm)   277  280  284  283  288  290  297  302  308  317  339  356



A scatter plot of this data is shown below.

[Scatter plot: Concentrations of CO2 Vs Year, with CO2 (ppm) from 0 to 400 on the vertical axis and Year from 0 to 13 on the horizontal axis]

a A linear regression analysis of the data found the least squares equation to be:
concentration of CO2 = 261 + 6.3 × year, where 1760 is year 1. To check the assumption of linearity, a residual analysis is performed. Use the regression equation to help complete the last 4 entries (to the nearest whole number) in the table of residuals.
2 marks

Year                    1    2    3    4    5    6    7    8    9   10   11   12
CO2 (ppm)             277  280  284  283  288  290  297  302  308  317  339  356
Predicted CO2 (ppm)   267  274  280  286  293  299  305  311  318  324  ___  ___
Residual               10    6    4   −3   −5   −9   −8   −9  −10   −7  ___  ___
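The predicted values and residuals in the table can be reproduced with a short Python sketch. One detail matters when rounding to the nearest whole number: Python's built-in round() rounds halves to the nearest even integer, so the sketch rounds halves up explicitly to match the table (year 5 gives 292.5, which the table records as 293). The helper round_half_up is an illustrative name.

```python
import math

years = list(range(1, 13))
co2 = [277, 280, 284, 283, 288, 290, 297, 302, 308, 317, 339, 356]

def round_half_up(v):
    # round(292.5) would give 292 (round-half-to-even), so round halves up instead
    return math.floor(v + 0.5)

# Regression equation from part a: CO2 = 261 + 6.3 * year
predicted = [round_half_up(261 + 6.3 * yr) for yr in years]

# Residual = actual - predicted, using the rounded predictions as the table does
residuals = [actual - pred for actual, pred in zip(co2, predicted)]

print("predicted:", predicted)
print("residuals:", residuals)
```

The last two predicted values are 330 and 337, giving residuals of 9 and 19.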

b Interpret the following residual plot made from the table in part a.

[Residual plot: Residual (−15 to 15) on the vertical axis against Year (0 to 12) on the horizontal axis]
2 marks



______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
c Apply an x² transformation to the x values and write the x² values in the table below.
1 mark
Year          1    2    3    4    5    6    7    8    9   10   11   12
x²          ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___  ___
CO2 (ppm)   277  280  284  283  288  290  297  302  308  317  339  356

d Plot the transformed data below.


2 marks

[Blank axes: CO2 concentration Vs Year², with CO2 concentration from 260 to 360 on the vertical axis and Year² from 0 to 160 on the horizontal axis]

e Explain what steps you would take to decide whether the transformation gives a better model of the data than the linear model fitted to the original data.
2 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
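One way to carry out part e is to fit a least squares line to both the original and the transformed data, compare the coefficients of determination, and check whether the residuals of the transformed model still show a pattern. A minimal Python sketch of those steps, with the least squares calculation written out by hand rather than taken from a library (least_squares is a hypothetical helper name):

```python
years = list(range(1, 13))
co2 = [277, 280, 284, 283, 288, 290, 297, 302, 308, 317, 339, 356]

def least_squares(x, y):
    """Return (a, b, r_squared) for the least squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b, s_xy ** 2 / (s_xx * s_yy)

# Part c: the x^2 transformation
years_sq = [yr ** 2 for yr in years]

# Fit both models and compare r^2
a1, b1, r2_lin = least_squares(years, co2)
a2, b2, r2_sq = least_squares(years_sq, co2)

# Residuals of the transformed model; a residual plot of these should look random
res_sq = [y - (a2 + b2 * u) for u, y in zip(years_sq, co2)]

print("x^2 values:", years_sq)
print(f"r^2 (CO2 vs year)   = {r2_lin:.3f}")
print(f"r^2 (CO2 vs year^2) = {r2_sq:.3f}")
print("residuals (CO2 vs year^2):", [round(e, 1) for e in res_sq])
```

For these data r² rises from about 0.850 for CO2 vs year to about 0.958 for CO2 vs year², which is consistent with the upward curve in the original scatter plot; the residual plot of the transformed model should still be inspected before preferring it.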



SECTION C: Time Series
Part A-Multiple Choice Questions

Circle the best answer (6 × 1 = 6 marks)


Question 1

The correct statement below is

A. A time series plot will always show either an increasing or decreasing trend
B. A time series plot must have time as the dependent variable
C. A time series plot will be based on bivariate data
D. The tendency for regular fluctuations of time series values on a quarterly basis is
called cyclic variation

Question 2

The characteristics of the time series plot shown are best described as

[Time series plot: values from 0 to 14 on the vertical axis against Quarter (1 to 12) on the horizontal axis]

A. seasonal with trend
B. cyclical only
C. random with trend
D. cyclical with trend

Question 3

Lift ticket sales at a ski resort are recorded at the end of each month over the 4-month winter period.
A seasonal index of 0.65 for the first month tells us that

A. ticket sales in the first month are 65% below the monthly average
B. ticket sales in the other 3 months must all be above 1
C. snow fell for 0.65 of the month
D. ticket sales in the first month are 35% below the monthly average



Question 4

Month (x) 1 2 3 4 5 6 7 8 9 10
Sales (y) 5 4 6 8 5 7 13 10 12 13

For the time series given in the table, the 2 median smoothed y value centered at x = 6 is

A. 5
B. 6
C. 7
D. 8

Questions 5 and 6 refer to the following information.


The table below shows the quarterly sales of a home gardener magazine and 3 of the 4 quarterly
seasonal indices.

Quarter              1      2      3      4
Sales              240    160    190    410
Seasonal Index    0.96    ___   0.76   1.64

Question 5

The seasonal index for the second quarter is

A. 0.16
B. 0.64
C. 0.32
D. 0.336

Question 6

The deseasonalised sales figure for the first quarter is

A. 230
B. 250
C. 220
D. 260
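Both Questions 5 and 6 can be checked in Python. Because quarterly seasonal indices sum to 4, the missing index follows by subtraction; recomputing each index as the quarter's sales divided by the quarterly average is an independent check. The variable names are illustrative.

```python
sales = {1: 240, 2: 160, 3: 190, 4: 410}
known_indices = {1: 0.96, 3: 0.76, 4: 1.64}

# Quarterly seasonal indices sum to 4 (the number of seasons),
# so the missing index is 4 minus the sum of the other three
si_q2 = 4 - sum(known_indices.values())

# Cross-check: seasonal index = value for season / seasonal average
average = sum(sales.values()) / len(sales)
si_direct = {q: s / average for q, s in sales.items()}

# Deseasonalised figure = actual figure / seasonal index
deseasonalised_q1 = sales[1] / known_indices[1]

print(f"SI for quarter 2: {si_q2:.2f}")
print("direct indices:", {q: round(v, 2) for q, v in si_direct.items()})
print(f"deseasonalised Q1 sales: {deseasonalised_q1:.0f}")
```

Both routes give 0.64 for quarter 2, and the deseasonalised first-quarter figure is 240 / 0.96 = 250.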



Part B-Short Answers

Answer ALL questions in the space provided. Show ALL working and calculations.

Question 1

The annual flows for the Mitta Mitta River for a certain 12 year period are shown in the following
table and are graphed on the time series chart.
Year        1     2     3     4     5     6     7     8     9    10    11    12
Flow      509   710  1634  1107   401   685  1548  1578  1012  1151  1190  1690

[Time series chart: Flow (0 to 1800) on the vertical axis against Year (1 to 12) on the horizontal axis]

a Use the graphical approach to smooth the time series on the above plot using 3-median
smoothing.
4 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
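The graphical method picks out the middle of each group of three consecutive points on the plot. As a numerical check, a centred 3-point moving median can be sketched in Python with statistics.median; the helper name moving_median is illustrative. Years 1 and 12 have no smoothed value because each lacks a neighbour on one side.

```python
from statistics import median

flow = [509, 710, 1634, 1107, 401, 685, 1548, 1578, 1012, 1151, 1190, 1690]

def moving_median(values, window=3):
    """Centred moving median; the first and last (window // 2) points are not smoothed."""
    half = window // 2
    return [median(values[i - half:i + half + 1]) for i in range(half, len(values) - half)]

smoothed = moving_median(flow)
for year, value in enumerate(smoothed, start=2):
    print(year, value)
```

The smoothed values for years 2 to 11 are 710, 1107, 1107, 685, 685, 1548, 1548, 1151, 1151 and 1190.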

b Comment on the effect of the smoothing and the apparent trend of the time series.
2 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________



The annual flows for the Mitta Mitta River for a second, later 12 year period are shown in the
following table and are graphed on the time series chart.
Year        1     2     3     4     5     6     7     8     9    10    11    12
Flow      757  1776   936  1473   717   928   850  1888   553  1139   369  1230
3-mean

c (i) Use the 3-mean smoothing method to complete (to the nearest whole number) the
empty row in the table above.
4 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
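The 3-mean smoothed values can likewise be sketched in Python as a centred moving mean, rounded to the nearest whole number as the question asks. The helper name moving_mean is illustrative.

```python
flow = [757, 1776, 936, 1473, 717, 928, 850, 1888, 553, 1139, 369, 1230]

def moving_mean(values, window=3):
    """Centred moving mean; the end points have no smoothed value."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]

# A sum of whole numbers divided by 3 never ends in exactly .5,
# so Python's round() gives the usual nearest whole number here
smoothed = [round(v) for v in moving_mean(flow)]
for year, value in enumerate(smoothed, start=2):
    print(year, value)
```

The smoothed values for years 2 to 11 are 1156, 1395, 1042, 1039, 832, 1222, 1097, 1193, 687 and 913.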

(ii) Plot the 3-mean smoothed data on the chart below.


1 mark

[Blank time series chart: Flow (0 to 2000) on the vertical axis against Year (1 to 12) on the horizontal axis]

d Comment on the effect of the smoothing and the apparent trend of the smoothed data on the
chart in part c.
2 marks
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________



e The equation of the trend line fitted to the 3-median smoothed data in part a is:
annual flow = 50.3 × year + 811
What annual flow does this equation forecast for the thirteenth year (answer to the nearest whole number)?
2 marks
______________________________________________________________________________
______________________________________________________________________________
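The forecast in part e is a single substitution of year = 13 into the trend equation, sketched here in Python (the function name trend is illustrative):

```python
def trend(year):
    """Trend line fitted to the 3-median smoothed flows: 50.3 * year + 811."""
    return 50.3 * year + 811

forecast = trend(13)
print(round(forecast))
```

This gives 50.3 × 13 + 811 = 1464.9, or 1465 to the nearest whole number. Since year 13 lies outside the years used to fit the line, the forecast assumes the trend continues.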

End of Test

Additional Writing Space


______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
