mcd2080 Tutorial Questions 2018 03
MCD2080 - Tutorial Questions and Computing Exercises
Information on all Tutorials:
Tutorials are divided into three parts: Part A, Part B and an Excel session.
Part A questions must be completed before attending your allocated tutorial class. This will be checked by your tutor at the tutorial.
Part B questions will be completed with the guidance of your tutor during the tutorial class.
Your tutor will award participation marks based on your performance in your tutorial session (across all three parts). Participation marks will be awarded out of 100%. The awarding of marks will be based on the following criteria.
1. Tutorial Part A questions are completed before attending the tutorial class. Answers should be written clearly with all required steps.
2. Tutorial Part B questions should be completed by the end of the tutorial class. Answers must be written clearly with all required steps.
3. Excel session lessons.
• The Excel session is essential for applying the relevant statistical methods learned in classes (both lectures and tutorials) to analyse data using Excel.
• To complete the Excel computing session effectively, you should watch the video clips in the weekly “Video Lessons” under the Weekly Tutorial Folder on Moodle. The computing session will consist of exercise activities to be completed using Excel (watch the video clips).
• NOTE: there are Excel Exercises in Weeks 2 and 4, with questions to be answered online on Moodle.
4. Students should participate actively (in discussion) in all the sessions for full marks.
5. Students must attend the tutorial class on time. Every 15 minutes of lateness will result in a 5% deduction of marks.
Tutorials 1 & 2:
The first two tutorials are about learning Excel and using it to carry out statistical calculations. Work done in these tutorials will be useful when completing the Excel Exercises (Weeks 2 & 4). The work for these tutorials is designed for self-paced learning. There is a series of videos that you are required to watch to enable you to successfully complete the weekly Tutorial Excel Lessons.
Tutorials 3 - 12:
The Tutorial participation and engagement activities will start from Week 3 onwards.
(a) a + b / 2          (b) (a + b) / 2
(c) (a + b) / 2 + c / 2          (d) ((a + b) / 2 + c) / 2
Note: It is useful to be able to read the kind of notation used in this question, because that is how
formulae appear in the formula bar in Excel. In Excel, the variables would be cell addresses. So for
example instead of a, b, and c, we might have A1, A2, and A3.
The answer to part (a) would be the number in cell B1. In order to obtain the answer to (b), you will need to supply brackets:
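The precedence rules can be checked directly; the snippet below evaluates all four expressions with illustrative values for a, b and c (these numbers are not from the question):

```python
# Operator precedence: division binds tighter than addition, exactly as in
# Excel's formula bar. Illustrative values for a, b and c.
a, b, c = 10, 6, 4

expr_a = a + b / 2              # (a)  b/2 is evaluated first
expr_b = (a + b) / 2            # (b)  brackets force the sum first
expr_c = (a + b) / 2 + c / 2    # (c)
expr_d = ((a + b) / 2 + c) / 2  # (d)

print(expr_a, expr_b, expr_c, expr_d)
```

In Excel the same expressions would use cell addresses, e.g. `=(A1+A2)/2` for part (b).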
2. Draw a number line, place the following numbers on the number line, and then answer the
questions below. –1.645, 1.645, –1.6, 1.6, –1.7, 1.7. [Remember that if x is to the left of y on the
number line, then we say x < y .]
3. In the following, n is the number of times an event occurs, so it might be positive or zero but not
negative. List the non-negative numbers n that satisfy each inequality:
4. Transform the following equations to make x the subject and hence find the value of x.
(a) 3x/11 − 2 = 4
(b) (x − 3)/2 = 1.645
(c) (x − 2.3)/1.5 = z, where z = 1.96
In (c), write x in terms of z and then make the substitution z = 1.96.
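As a check on the rearrangement, the snippet below evaluates x for each part, reading (a) as 3x/11 − 2 = 4 (the layout of the original equations is garbled, so treat that reading as an assumption):

```python
# Rearranging each equation to make x the subject, then evaluating.
# (a) 3x/11 - 2 = 4    ->  x = 11*(4 + 2)/3
x_a = 11 * (4 + 2) / 3

# (b) (x - 3)/2 = 1.645  ->  x = 2*1.645 + 3
x_b = 2 * 1.645 + 3

# (c) (x - 2.3)/1.5 = z  ->  x = 1.5*z + 2.3, then substitute z = 1.96
z = 1.96
x_c = 1.5 * z + 2.3

print(x_a, x_b, x_c)
```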
(a) ∑_{i=1}^{3} x_i      (b) ∑_{i=1}^{3} y_i      (c) ∑_{i=1}^{3} x_i·y_i
(d) ∑_{i=1}^{3} (x_i − 9)      (e) ∑_{i=1}^{3} (x_i − 9)²      (f) ( ∑_{i=1}^{3} x_i )²
(g) ∑_{i=1}^{3} x_i²      (h) ∑_{i=1}^{n} x_i² − 2·∑_{i=1}^{n} x_i
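Sigma notation translates directly into code. The sketch below evaluates several of the sums with made-up values for x₁…x₃ and y₁…y₃, since the question's data values are not reproduced here; note in particular that the square of the sum differs from the sum of the squares:

```python
# Sigma notation evaluated directly. The actual x and y values belong to the
# question sheet; these are illustrative stand-ins.
x = [10, 8, 9]   # x1, x2, x3 (assumed for illustration)
y = [2, 5, 3]    # y1, y2, y3 (assumed for illustration)

sum_x   = sum(x)                                 # sum of x_i
sum_xy  = sum(xi * yi for xi, yi in zip(x, y))   # sum of x_i * y_i
sum_dev = sum((xi - 9) ** 2 for xi in x)         # sum of (x_i - 9)^2
sum_sq  = sum(xi ** 2 for xi in x)               # sum of x_i^2
sq_sum  = sum(x) ** 2                            # (sum of x_i)^2 -- not the same!

print(sum_x, sum_xy, sum_dev, sum_sq, sq_sum)
```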
6. Suppose the average income in a certain town has varied over the years, as shown in the table:

Year    Average income
2000    $57,309
2005    $55,430
2010    $69,408
(i) What was the percentage change in average income from 2000 to 2005?
(ii) What was the percentage change in average income from 2000 to 2010?
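Percentage change is (new − old)/old × 100. A quick check of both parts using the table values:

```python
# Percentage change between two values, applied to the income table above.
incomes = {2000: 57309, 2005: 55430, 2010: 69408}

def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

change_00_05 = pct_change(incomes[2000], incomes[2005])  # negative: a fall
change_00_10 = pct_change(incomes[2000], incomes[2010])  # positive: a rise
print(round(change_00_05, 2), round(change_00_10, 2))
```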
7. Scientific notation. (This will help in reading Excel output, especially in regression.)
In many computer packages, including Excel, scientific notation has a format such as:
1.2098E-03
Write the following numbers in decimal notation accurate to three decimal places:
(d) 1.603 × 10-3
(e) 4.9862E-02
(f) 5.0907235E03
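Python's float() accepts the same E-notation that Excel displays, which makes it easy to check conversions like these:

```python
# Excel's 1.2098E-03 means 1.2098 x 10^-3. Python's float() parses the
# identical E-notation directly.
values = ["1.603E-03", "4.9862E-02", "5.0907235E03"]
for s in values:
    v = float(s)
    print(f"{s:>14} = {v:.3f}")   # decimal notation, three decimal places
```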
Tutorials 2 – 12:
General Introduction
This is a set of exercises to develop your skills with Excel. The exercises are divided into groups. If you
already have some skill with the package, you will not need to do all the exercises in the
introductory group.
Please watch video clips on Moodle in the “tutorial folder” under WEEK 01
This series of videos provides the basic Excel skills required for the completion of the Excel Computing Activity Tasks.
Simply click on the "Next" or "Back" buttons to navigate through all videos.
Practice with “MyMathLab STUDY PLAN” for Mid-Tri Test
In addition, the instructions in the exercises are usually sufficient for you to ‘get by’. There is an external link for Extra Excel Tutorials under the “Textbook and Excel Notes” briefcase on Moodle. Note that there is an emphasis here on good practice in Excel: not just doing things, but doing them in the best way. Students who have a MacBook, and are willing to use it outside the MCD2080 Computing Lab sessions, can find instructions relating to Excel on the Moodle site.
It is assumed that you have some basic computer knowledge; that you:
• can boot up a computer and log in where necessary, as it is on campus
• understand ‘files’, ‘directories’, etc.
• can operate within the appropriate operating system, usually a version of Windows (though Excel is also available for Apple computers).
In these exercises, specific statistical techniques are developed which are closely related to the lectures,
tutorials, assignments, tests and exam questions.
Exercise 1: Background
Real Estate in Regional Australia
The nation’s regional property market has generally been weaker than urban areas over the past year,
which has opened up some great buying opportunities.
Your first task is to locate the data to work with. You will find the data and relevant tables in the file
PROPERTY.xls in the Tutorial Material folder within the Week 1 section on Moodle.
When reading the instructions in these tutorial problems, you may wish to refer to the following diagram
to identify the locations mentioned.
How to split:
1. While holding down the Windows key, press the Left or Right arrow key to snap the active window to the left or right of the screen respectively.
2. For example, open PROPERTY.xls. While holding down the Windows key, press the Right arrow key; this will snap the Excel window containing PROPERTY.xls to the right side of the screen. Do the same with any other open document, again holding the Windows key but pressing the Left arrow key, to place it on the left side of the screen.
Task 2A: Calculating the percentages of properties in each location across the varying number of bedrooms using the Excel FORMULAS approach
In this section of the tutorial, you will learn how to use formulas. The concepts of ‘relative’ and ‘absolute’ cell addresses are essential when using formulas. A great deal of the power of spreadsheet packages lies in the way in which they use formulas to carry out repetitive calculations.
In this task, we need to complete Table 1 in the worksheet Property Size in the PROPERTY.xls Excel
spreadsheet. We will start with location Rural first.
In cell I4 type: =B4/B9. Alternatively, you can type = and then click on cell B4, type / and then click on cell B9. (Note the equals sign: it tells Excel that what follows is a formula to be carried out.)
Right click on the cell and choose “Format Cells”. A dialogue box with a number of tabs
appears. The Number tab (shown here) offers more detailed options to change the way cell
data is displayed. Change the number of decimal places to 2.
Alternatively, click twice on the Decrease Decimal button on the Home tab on the Ribbon
(in the group labelled Number).
In practice we would enter the formula once only. Entering a formula more than once is a waste of
effort, and likely to lead to error. To get the formula into the other cells we can copy it. Drag and drop
cannot be used here because the destination range is not the same shape as the source range, but the
keyboard shortcuts can be used:
However, before we copy the formula, we need to make one of its cell references absolute.
Excel interprets a cell address in a formula as a relative cell address. For cell I4, it divides B4 by B9. Therefore, if B9 is not changed to an absolute reference, when the formula is copied to I5 it will divide B5 by B10.
To prevent this, B9 must be entered as an absolute cell address. Excel uses dollar signs to identify an absolute address. These can be typed, but the easiest way to enter them is to press the F4 function key: any cell addresses that are highlighted in the formula bar then acquire dollar signs.
Now select I4 and move the cursor over the bottom right-hand corner of the cell until it changes to a thin, solid cross. Press the mouse button and hold it down while you drag the cursor down the column. Release the button and the entries will appear. Get the sum of these percentages for the Rural location into I9 using the formula =SUM(I4:I8). (You will find more about the SUM function in Task 3.) If all is correct, this sum must be 1.00. Do you know why?
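The reason the proportions must total 1.00 can be seen by writing the calculation out: each cell is a count divided by the grand total, so the column sums to (sum of counts)/total = 1. A sketch with made-up bedroom counts (not the PROPERTY.xls figures):

```python
# Why the column of proportions must sum to 1.00: each cell is count/total,
# so their sum is (sum of counts)/total = total/total = 1.
# Illustrative counts for the Rural column (assumed, not the real data).
counts = [3, 11, 20, 9, 2]           # properties with 1..5 bedrooms
total = sum(counts)                  # plays the role of cell B9

proportions = [c / total for c in counts]   # the formula =B4/$B$9 copied down
print(proportions, sum(proportions))        # the sum matches the check in I9
```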
We need to change all the values into percentages. This can be done in two ways:
1. Highlight the numbers in column I (I4:I9). Right-click and select “Format Cells”. In the Number tab, choose “Percentage” and change the decimal places to 1.
2. Another way to format as a percentage is to click on the Percent button, also in the Number group on the Home tab. This will display the values with no decimal places.
‘Percent’ button
In this task, we will learn how to tabulate the number of bedrooms in the two locations, Rural and Urban. In other words, we need to create a cross tabulation, using what Excel calls a Pivot Table. To do this, open the worksheet Data in PROPERTY.xls. (Please watch the video for an illustration.)
• To create a Pivot Table, place the cursor anywhere in the data set.
• Click on Insert and choose Pivot Tables as shown below.
• The following window will appear. It highlights the entire data range by default.
• Click on Existing Worksheet and choose a blank cell anywhere on your worksheet. This will
allow you to place the pivot table on the same worksheet. Otherwise the pivot table will be placed
on a new worksheet.
• Click OK and a blank pivot table will appear with the Field List shown on the right of the screen.
• On the Field List, drag Bedrooms into the Row Labels window below, Location into the
Column Labels window and Location again into the Σ Values window. This is shown in the
screenshot below.
• Finally, always remember to produce your output in tabular form. If you are using Excel 2013 or
beyond, your output will automatically be produced in tabular form. If you are using an earlier
version, click on the Design tab, followed by Report Layout and choose Show in Tabular
Form. Refer to the screenshot below.
A cross tabulation, or contingency table, is a summary table for two categorical variables, as in the table obtained above. Cross tabulation allows us to examine observations that belong to specific categories on more than one variable. By examining these frequencies, we can identify relationships between the cross-tabulated variables.
Based on the table above, calculate the percentages of properties in each location across the varying
number of bedrooms.
Compare your Pivot Table with the final image of the Pivot Table with Count and Percentage Count in
the Solution of lab exercises: Week 1 section (at the end of week 1 Tutorial)
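The counting that the pivot table performs can be sketched in a few lines: tally each (bedrooms, location) pair, then divide by the column total to get the percentages. The records below are illustrative, not the PROPERTY.xls data:

```python
# A cross tabulation counts observations in each combination of two
# categorical variables -- what the pivot table does with Bedrooms x Location.
from collections import Counter

# Illustrative (bedrooms, location) records; the real data is in PROPERTY.xls.
records = [(2, "Rural"), (3, "Rural"), (3, "Urban"), (2, "Urban"),
           (3, "Urban"), (4, "Rural"), (2, "Rural"), (3, "Rural")]

crosstab = Counter(records)                      # (bedrooms, location) -> count
col_totals = Counter(loc for _, loc in records)  # total properties per location

# Percentage of properties in each location, by number of bedrooms
for (beds, loc), n in sorted(crosstab.items()):
    print(f"{beds} bedrooms, {loc}: {n} ({n / col_totals[loc]:.0%})")
```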
Task 3: Graphical Comparison of properties in each location across the varying number of
bedrooms. Continue working on task 3 in week 2
Having created the percentage distribution, the next task is to graph it as a bar chart. Excel calls
this a Column Chart.
This tab only appears on the Ribbon when the chart is selected. Point to the range H3:J8. (Note
that this includes the heading.) The dialogue box should now appear as below.
Since the No. of Bedrooms is already correctly represented on the Horizontal Axis, we need to
remove it from Legend Entries (Series). Do this by first highlighting ‘No. of Bedrooms’ and then
click on ‘Remove’, then OK. You should now have produced a column chart. (Please watch the
video for illustration)
Now switch to the Layout tab. In the Labels group, click on Chart Title, then Centred Overlay
Title. A text box will appear above the chart. Type in an appropriate title.
To remove the grid lines, highlight the lines and press delete.
Task 4: Table of Summary Statistics (Using Insert Function) Continue working on task 4 in
week 3.
For this task, we need to complete table 2 in the worksheet Selling Price in the file PROPERTY.xls.
The following statistics can be calculated for the variable RURAL using the formulas listed below. You can practise obtaining the same results using Insert Function, Statistical Functions, etc. Find the same list of descriptive statistics for the variable TOWN.
(QUARTILE.EXC computes quartiles by the exclusive method, which leaves the minimum and maximum out of the interpolation. The alternative, QUARTILE.INC, includes them and sometimes leads to odd values.)
Interquartile range, Standard deviation, Range and Coefficient of variation are all measures of
variability.
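As a cross-check on the Excel formulas, the four variability measures can be computed for a small made-up sample; Python's statistics.quantiles with its default "exclusive" method corresponds to QUARTILE.EXC:

```python
# Interquartile range, standard deviation, range and coefficient of variation
# for an illustrative sample (assumed values, not the RURAL selling prices).
import statistics

prices = [310, 285, 342, 298, 355, 321, 277, 333]

q = statistics.quantiles(prices, n=4, method="exclusive")  # like QUARTILE.EXC
iqr = q[2] - q[0]                             # Q3 - Q1
sd = statistics.stdev(prices)                 # sample standard deviation
rng = max(prices) - min(prices)               # range
cv = sd / statistics.mean(prices) * 100       # coefficient of variation, %

print(f"IQR={iqr}, SD={sd:.1f}, Range={rng}, CV={cv:.1f}%")
```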
Exercise 2
A great deal of the power of spreadsheet packages lies in the way in which they use formulas to carry
out repetitive calculations. Open the file Caring&Sharing.xlsx.
Task 1: Find expenditure for each day (check solution in the Excel table in Task 3)
Task 2: Find 2.5% brokerage on expenditure for each day (check solution in the Excel table in
Task 3)
Task 3: Find the total and average of quantity, price, expenditure and brokerage.
Excel has a large number of built-in functions. Some of these operate on individual cells or pairs of cells (for example, multiplication) and some operate on ranges. All of them can be used as parts of formulas. This exercise covers the SUM and AVERAGE functions.
In cell A8 type Sum, and in A9 type Mean. In C8, enter the total quantity of shares bought over the four days by clicking on the AutoSum button in the Editing group on the Home tab.
Calculating Percentages
Part A:
1. The owner of a large fleet of taxis is trying to estimate his costs for next year’s operations.
One of the major costs is fuel purchases. To estimate fuel purchases, the owner needs to
know the total distance his taxis will travel next year, the cost of a litre of petrol and the fuel
consumption (in km/litre) of his taxis. The owner has been provided with the first two
figures (distance estimate and cost). However, because of the high cost of petrol, the owner
has recently converted his taxis to operate on LPG. He measures the fuel consumption for
50 taxis and the results are stored.
Part B:
3. Classify each of the following variables as numerical or categorical, discrete or continuous,
ordinal or nominal.
a. your student ID number
b. eye colour (brown, blue, . . . )
c. whether a person drinks alcohol (yes, no)
d. length of cucumbers (in centimeters)
e. number of cars in a car park
f. salary (high, medium, low)
g. salary (in dollars and cents)
h. daily temperature in ◦C
5. One hundred and twenty-one university students ( n = 121) were asked to identify their
preferred leisure activity. The results are displayed in a bar chart, as shown below.
[Bar chart: Preferred Leisure Activity (Sport, TV, Music, Movies, Reading, Other), frequency axis 0 to 20]
6. Recall Lecture 1 (Exercise 2.17, p. 36 Berenson) – the data represent the electricity cost in dollars during the month of July for a random sample of 50 two-bedroom apartments in a New Zealand city.
We created a table with class intervals using the Pivot “Group” option.
Based on the information in the table, around what amount does the monthly electricity cost seem to be concentrated?
(Hint: focus on the modal class.)
3rd edition: Introduction and data collection (Chapter 1): p. 10: 1.4
Presenting data in tables and charts (Chapter 2): p. 24: 2.3, 2.4; p. 39: 2.20
4th edition: Introduction and data collection (Chapter 1): p. 10: 1.4
Presenting data in tables and charts (Chapter 2): p. 25: 2.3, 2.4; p. 40: 2.20
Please watch video clips on Moodle in the “tutorial folder” under WEEK 02
This series of videos provides the basic Excel skills required for the completion of the Excel Computing Activity Tasks.
Simply click on the "Next" or "Back" buttons to navigate through all videos.
COMPLETE EXCEL HOMEWORK AND SUBMIT ONLINE (MOODLE) BY END OF WEEK 3.
Histogram
We recommend that you individually work through section E3 of the notes on the use of Excel (ExcelNotesv3.pdf) on Moodle.
Exercise 1
Complete Task 3 from Exercise 1 in Lab Week 1
Exercise 2
Task 1: Modifying a graph
Wets and Dries is a company which provides
economic advice to governments. They bill by the
hour. Over the last three months, the number of hours
of advice billed each day is recorded in the table
shown here.
Open a new workbook, enter the data and format it as
shown.
To wrap the text in the two headings, select them, and then click on the Wrap Text button in
the Alignment group on the Home tab. Use the AutoSum button to obtain the total.
We want to calculate the corresponding
percentage distribution, as shown here. To
calculate the percentages, in cell D2 enter the
formula =C2/$C$6*100. Note the dollar signs!
Drag this down to D5. Reduce the number of
decimal places to two by clicking on the
Decrease Decimal button three times.
Drag from C6 to D6 to get the sum of the percentages.
Having created the percentage distribution, the next task is to graph it as a bar chart. Excel calls this a Column Chart. (This revises what you learned in the previous exercise.) The resulting bar graph should look like this:
[Column chart of the percentage distribution for the categories 3, 4, 5 and 6]
With the chart selected, click on the Change Chart Type button on the Design tab. In the dialogue box that opens, click on Pie, then OK. This chart is now not very informative, because of the options you selected for the column chart. The Chart Layouts group on the Design tab gives you more sensible options for the layout of the pie chart. Layout 2 is shown above. Alternatively, you can use the Layout tab to make individual changes to the appearance of the chart, as described above for the column chart.
Note that the multicoloured pie chart does not print well in black and white. For this kind of printing it is better to select the greyscale version (on the Design tab, at the left of the Chart Styles group).
Finally, click on the Chart title and drag it to one of the corners. Save the file.
The trustee of a mining company’s accident compensation plan solicited the employees’
feelings toward a proposed revision in the plan. The responses are shown in the following table.
There are three employee types (Mine-Workers, Clerical staff and Managers denoted by W, C
and M) and two kinds of decisions, either for (F) or against (A). Data are available in file
Compensation.xls.
When sorting a number of columns of data by one variable, you must highlight the whole block
of data (all required rows and columns) first. If you just highlight the column for the variable
you are sorting by, then the values of this variable will be separated from the cases to which
they belong. Click on Sort & Filter in the Editing group on the Home tab, and select Custom
Sort. The dialogue shown overleaf appears. Make sure ‘My data has headers’ is checked. In
the Sort by box, select ‘Job classification’. In the Sort On box, select ‘Values’. In the Order
box, select ‘A to Z’. Click OK to sort the data.
Use the Excel function Countif to obtain the number of each kind of Decision
corresponding to each employee type. In other words fill in the frequencies in Table 2 in
the file, reproduced here:
      W    C    M
F
A
For example, the number in the shaded cell of Table 2 (cell G3 in the worksheet) should be the number of clerical staff in favour of the scheme.
When you have sorted the data, you should find that the list of Decisions by clerical staff lies in cells C3:C14, and so on.
Using the data in the “COUNTIFS” worksheet, without sorting the data, obtain the counts in the cells AND RECORD THEM IN TABLE 2:
F3 by typing: =COUNTIFS(B3:B32,"W",C3:C32,"F");
F4 by typing: =COUNTIFS(B3:B32,"W",C3:C32,"A");
Do the same for the other cells (G3, G4, then H3 and H4).
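The logic behind =COUNTIFS(B3:B32,"W",C3:C32,"F") is a row-wise AND of the two criteria. A sketch with made-up job/decision columns rather than the Compensation.xls data:

```python
# What COUNTIFS does: count the rows where BOTH criteria hold at once.
# Illustrative job/decision pairs, not the real file.
jobs      = ["W", "C", "M", "W", "C", "W", "M", "C", "W", "C"]
decisions = ["F", "F", "A", "A", "F", "F", "A", "A", "F", "F"]

def countifs(col1, crit1, col2, crit2):
    """Row-wise AND of two criteria, like Excel's COUNTIFS."""
    return sum(1 for a, b in zip(col1, col2) if a == crit1 and b == crit2)

w_for = countifs(jobs, "W", decisions, "F")        # mine workers in favour
w_against = countifs(jobs, "W", decisions, "A")    # mine workers against
print(w_for, w_against)
```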
Now create a grouped bar chart (called “clustered column chart” in Excel) showing the
number of employees in each category who are for and against the decision as follows.
(ii) On the Insert tab, select Column/2-D Column (the first chart sub-type). You can
choose between series in rows or in columns by clicking the Switch Row/Column
button in the Data group on the Design tab (A and F series, or W, C, and M series).
Decide which one is appropriate. Also try the “Stacked column” and “100% stacked
column” (the next two sub-types) by clicking the Change Chart Type button and
consider what aspects of the data are thereby made clear.
You will work with data from the file CarbEm.xlsx, provided in the subject weekly computing
exercises in electronic resources. This file is extracted from World Bank data for the year 2007
and provides, for a sample of 122 countries,
Task 1: Construct a frequency histogram (not using the Pivot Chart approach) for the variable CO2E, choosing an interval width that you consider appropriate.
SOLUTION STEPS
a. Find the maximum and minimum values of the variable CO2E using the functions
MAX and MIN (see Excel Notes E4, particularly E4.2.)
b. With 122 data points, it is reasonable to have about 8 to 10 class intervals, so choose a convenient starting point and class-interval width to cover all the data from the minimum to the maximum value calculated in (a).
(For example, you may decide that the first class interval starts at 0 and use 9 intervals of width 4, or you may decide on shorter intervals to show more detail.)
c. Construct a frequency table, filling in the lower and upper limits. The upper limit of
one interval is equal to the lower limit of the next. According to the Excel convention,
a class interval includes its upper limit but not its lower limit. Thus the first class
interval shown below includes values greater than 0 and less than or equal to 4.
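This Excel bin convention (a value v falls in the first bin whose upper limit is at least v) can be reproduced with a binary search over the upper limits; the CO2E values below are made up for illustration:

```python
# Excel's Histogram tool puts a value v in the first bin whose upper limit
# satisfies v <= upper: intervals are open at the lower limit, closed at the
# upper. bisect_left on the sorted upper limits reproduces that rule.
from bisect import bisect_left

upper_limits = [4, 8, 12, 16, 20, 24, 28, 32, 36]   # as in the frequency table
data = [0.3, 4.0, 4.1, 15.9, 16.0, 35.2]            # illustrative CO2E values

freq = [0] * len(upper_limits)
for v in data:
    freq[bisect_left(upper_limits, v)] += 1   # first index with upper >= v

print(freq)   # note 4.0 lands in (0, 4] while 4.1 lands in (4, 8]
```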
d.
   Lower limit   Upper limit   Frequency
   0             4
   etc.
e. In order to obtain the frequencies to complete this table, use the histogram tool:
Choose the Data tab. In the right hand group, click on the Data analysis button
• select as the input range the CO2E column, including the heading; select as the Bin
range the Upper Limit column of the table you are creating, including the heading.
• Check the labels box (because you have included headings in your range selections).
• Click on the white box next to Output Range and point to the cell where you want the
output to start.
• Check Chart Output.
• You should see something like this.
f. You now need to make this so-called histogram look like a proper histogram:
• The legend is not needed – remove it (click to highlight it, and press Delete).
• Provide a more informative heading (click to highlight the current heading and type a replacement). Do the same with the axis titles.
• Right-click on one of the histogram bars, select Format Data Series, and slide Gap Width to No Gap.
[Initial histogram-tool output: column chart with frequency axis 0 to 50]
(Note that the horizontal axis labels should be at the upper limit of each class but actually appear in the middle of each class.)
A quick and easy way to fix this is to replace the upper limits with the interval midpoints:
1. Create an extra column with the interval midpoints (2, 6, 10, ….)
2. Left-click on the horizontal axis with values 4, 8, 12, … (this selects the axis).
3. Right-click and choose the Select Data option (or, on the Design tab, click Select Data).
4. Under Horizontal (Category) Axis Labels, click Edit.
5. In the Axis label range, select the cells with the interval midpoints (2, 6, 10, ….), then click OK and OK again. After this step the original values 4, 8, 12, … will change to 2, 6, 10, ….
6. Name your horizontal axis accordingly (Interval midpoint of …) and provide the units of measurement. The resulting image of your histogram:
[Final histogram: title “Histogram of ........”, frequency axis 0 to 60, horizontal axis “Interval Midpoint of CO2E, metric tons per capita” with midpoints 2, 6, 10, 14, 18, 22, 26, 30, 34, 38]
5. Drag and drop CO2E into the Σ Values window and the frequencies will appear.
6. Report the table in tabular form by clicking Design and then selecting Show in Tabular Form.
THEN
7. Select any cell in the Pivot Table. In PivotTable Tools, select Pivot Chart (on the Options tab), select Column and click OK.
8. Close the gaps between the columns as follows: in Design (in the drop-down menu) select Layout 8. The resulting working image of your histogram:
[Pivot-chart histogram: frequency axis 0 to 30, class intervals 0-4, 4-8, 8-12, 12-16, 16-20, 20-24, 24-28, 28-32, 32-36, horizontal axis “CO2E emissions”]
Note: VLOOKUP assumes that the first column of the table is sorted in ascending order.
There are two tables provided:
Table 1 is the bonus band table.
Table 2 lists the name of each salesperson and the percentage by which they exceeded their target sales.
YOUR TASK:
Use the VLOOKUP function to assign the number of bonus points to each salesperson.
The VLOOKUP function is used in column C to look in the bonus band table and
automatically assign bonus points.
Once you have completed this task, move to column D and convert your answers in C to a
percentage of the total bonus points. Now in column E, calculate the bonus amount.
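VLOOKUP's approximate match (the default) returns the row with the largest first-column value that does not exceed the lookup value, which is why the bonus-band table must be sorted ascending. A sketch with assumed band thresholds and points (the real bands are in Table 1 of the file):

```python
# VLOOKUP with approximate match finds the largest first-column value <= the
# lookup value. bisect_right - 1 gives the same behaviour on a sorted list.
from bisect import bisect_right

thresholds = [0, 10, 20, 30]    # % above target: lower bound of each band (assumed)
bonus_points = [1, 2, 4, 8]     # points awarded for that band (assumed)

def lookup_bonus(pct_above_target):
    """Largest threshold <= value, as VLOOKUP's approximate match does."""
    return bonus_points[bisect_right(thresholds, pct_above_target) - 1]

for pct in (5, 10, 27, 41):
    print(pct, "->", lookup_bonus(pct))
```

An unsorted first column would make this search meaningless, which is exactly the note above about VLOOKUP requiring ascending order.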
The responses, as tallied in Table 2, were:

             Mine workers   Clerical staff   Managers
In favour         4               7              4
Against           8               5              2

[Clustered column chart: number in favour and against for each employee type]
Note that equal numbers are in favour and against overall, but the breakdown differs between the employee types.
[100% stacked column chart: percentage in favour and against for each employee type]
Thus we see that a high proportion of managers and a low proportion of mine workers are in favour of the scheme. However, from this chart, we cannot see the relative number in each worker category.
Part A:
1. Berenson p. 90: 3.30 modified – the data set below is from a sample of n = 7:
12, 7, 4, 9, 0, 7, 3
2. The side by side boxplots below shows the distribution of age at marriage of 45 married men
and 38 married women.
(a) Compare the two distributions in terms of:
i. measures of central location,
ii. measures of variability, and
iii. shape (note that it is not possible to comment on modality; do you know
why?)
(b) Comment on how the age at marriage of men compares to women for the data.
3. [Question 3 figures: two frequency histograms of Annual Rainfall (mm), with frequency axis 0 to 20 and class limits 200 to 1100 mm, together with side-by-side boxplots of the Recent and Historical periods on the same 200 to 1100 mm scale.]
4. The remuneration packages for the CEOs of 12 international companies are (in $US 000’s) as follows:
2512 3424 3800 4152
2636 3640 3870 4480
3424 3690 4078 9020
The following table of summary measures was obtained using Excel:

Remuneration of CEOs
Mean                       4060.5
Median                     3745
Mode                       3424
Standard deviation         1663.4
Coefficient of variation
Lower quartile             3424
Upper quartile             4133.5
Interquartile range
Range
Minimum                    2512
Maximum                    9020
Sum                        48726
Count                      12
(a) Complete the table by supplying the coefficient of variation, range and interquartile
range.
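Part (a) only needs arithmetic on values already in the table; as a check:

```python
# Coefficient of variation, range and IQR from the summary values Excel reports.
mean, sd = 4060.5, 1663.4
q1, q3 = 3424, 4133.5
lo, hi = 2512, 9020

cv = sd / mean * 100   # coefficient of variation, as a percentage
rng = hi - lo          # range = maximum - minimum
iqr = q3 - q1          # interquartile range = upper quartile - lower quartile

print(f"CV = {cv:.1f}%, Range = {rng}, IQR = {iqr}")
```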
(b) For each of the following, comment on its suitability as a measure of a “typical
value” from this dataset:
(i) mean
(ii) median
(iii) mode
(c) In breaking news, it has just been announced that the highest paid of these CEOs has negotiated a new remuneration package and will now receive $25 million.
For this revised data set, calculate the revised value of each of the following
summary measures, and briefly comment on whether and how the value has
changed from the corresponding value given in the table above.
(i) Mode
(ii) Median
(iii) Mean
(iv) Range
(v) Interquartile range
(vi) Given that the new standard deviation is 6201.34, calculate the new coefficient
of variation
(vii) Comment briefly on which measures of central location have changed
significantly.
(viii) Comment briefly on which measures of spread have changed significantly.
[Percentage histogram: class boundaries at 250, 500, …, 4000; vertical axis 0.0% to 10.0%]

                           No alcohol   Consume alcohol
Mean                         456.9          708.4
Median                       353            638.5
Modal class                  $0-$250        $500-$750
Standard deviation           403.0          461.3
Coefficient of variation     88.2%          65.1%
Minimum                      12             12
Maximum                      3846           3696
Lower quartile               173.75         356.75
Upper quartile               632.25         936
Interquartile range          458.5          579.25
Count                        234            766
Please watch video clips on Moodle in the “tutorial folder” under WEEK 03
This series of videos provides the basic Excel skills required for the completion of the Excel Computing Activity Tasks.
Simply click on the "Next" or "Back" buttons to navigate through all videos.
COMPLETE EXCEL HOMEWORK AND SUBMIT ONLINE (MOODLE) BY END OF WEEK 4.
[hint: you can use Split Screen i.e. the learning outcome from Computing Lab Week
1, Exercise 1, Task1]
Exercise 2
Task 1: Summary statistics using Data Analysis tool.
In this exercise, the data gives the dividend yield on shareholders’ funds for Australia’s top
150 companies for the year 2005.
Note that Dividend yield is defined as the amount of a company’s annual dividend expressed
as a percentage of the current price of the share of that company. The data is in the file
Dividend.xlsx. Column A stores the dividend yield for the top 1 – 50 companies (Group A)
ranked by market capitalisation, Column B stores the dividend yield for the companies 51 –
100 (Group B), and Column C stores the dividend yield for companies 101 – 150 (Group C).
Use the Data Analysis button on the Data tab, and select Descriptive Statistics. If
Data Analysis is not available, see Excel Notes section E5.
Copy and paste all relevant values from the Excel output. The remaining values, such as the range, coefficient of variation, quartiles and IQR, may be obtained using the individual functions accessed through the Insert Function button discussed in Exercise 1.
Note that since the data concern all of the top 150 companies, rather than a sample, the STDEV.P function should be used for the standard deviation (STDEV.P calculates the population standard deviation).
(iii) shape
Note: since we have all of the top 50 companies, the next 50, and so on, these are to be regarded as populations, not samples; therefore, use STDEV.P.
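The distinction matters because STDEV.P divides by n while STDEV.S divides by n − 1. Python's statistics module mirrors the pair, shown here on made-up dividend yields (not the Dividend.xlsx data):

```python
# Population vs sample standard deviation: STDEV.P divides by n (the data is
# the whole population), STDEV.S divides by n - 1 (the data is a sample).
import statistics

yields = [4.2, 3.8, 5.1, 2.9, 4.6]    # illustrative dividend yields

pop_sd = statistics.pstdev(yields)    # like STDEV.P -- use for the top-150 data
samp_sd = statistics.stdev(yields)    # like STDEV.S -- use for a sample

print(round(pop_sd, 4), round(samp_sd, 4))
```

The n − 1 denominator always makes the sample version slightly larger; the difference shrinks as n grows.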
Exercise 3:
Task1:
A B
Minimum 5544 6701
Q1 6708.25 7578.5
Median 7316.5 8140.5
Q3 8085 9027.25
Maximum 8731 9744
Task 2:
Note: You are expected to name your horizontal axis as CFL Bulb Life and provide the units
as (hours).
Please watch video clips on Moodle in the “tutorial folder” under WEEK 04
This series of videos provides the basic Excel skills required for the completion of the Excel Computing Activity Tasks.
Simply click on the "Next" or "Back" buttons to navigate through all videos.
Practice with “MyMathLab STUDY PLAN” for Mid-Tri Test
Part A:
Question 1.
Refer to the spreadsheet Elecmart.xlsx. Recall that in the Week 3 lecture you produced and interpreted a pivot table that showed the breakdown of Time across the Regions, by putting Time in the row and Region in the column, and by changing the position of variables in the pivot table.
You will be working with variables Gender and Region now. To demonstrate the differences
you get when you change the position of a variable in a pivot table, you need to use the
following pivot tables and interpret the values below.
Provide an interpretation for the following cells so that you understand the impact of
changing the position of variables in a pivot table.
Question 2.
Among households in which at least one child is attending a private school, it is found that the
total number of tablets (iPad, Kindle, etc.) owned by members of the household has the
following probability distribution:
Number of tablets X    0      1      2      3      4      5      6 or more
Probability P(X)       0.33   0.25   0.22   0.12   0.07   0.01   0.00
(a) Write down the formulae for the mean and variance of a discrete probability
distribution.
(b) Use the table below to calculate the mean µ and standard deviation σ of the number
of tablets.
X    P(X)    X × P(X)    P(X) × (X − µ)²
0
1
2
3
4
5
Total
(c) Also, find the median and mode of the number of tablets.
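A minimal Python sketch of the calculations in parts (b) and (c) (illustrative only; the hand working in the table above is still required):

```python
# Tablet-count distribution from the question (6+ has probability 0.00, so it is omitted).
x = [0, 1, 2, 3, 4, 5]
p = [0.33, 0.25, 0.22, 0.12, 0.07, 0.01]

mean = sum(xi * pi for xi, pi in zip(x, p))               # mu = sum of x * P(X)
var = sum(pi * (xi - mean) ** 2 for xi, pi in zip(x, p))  # sigma^2 = sum of P(X)(x - mu)^2
sd = var ** 0.5

mode = x[p.index(max(p))]  # the value with the largest probability

# Median: smallest x whose cumulative probability reaches 0.5.
cum = 0.0
for xi, pi in zip(x, p):
    cum += pi
    if cum >= 0.5:
        median = xi
        break

print(mean, var, sd, median, mode)
```

Running this gives µ = 1.38 and σ² = 1.6756, so σ ≈ 1.29; the mode is 0 and the median is 1.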
HyTex Company is a direct marketer of electronic equipment and wants to investigate the
efficacy (Is HyTex sending the catalogues to the right customers? If not, to whom should
HyTex send the catalogues?) of catalogue mailings to its 1,000 mail order customers.
Catalogue Marketing.xlsx contains customer demographic attributes including the Marital
Status of the customer and the Region they live in. The following pivot tables have been created:
a) Interpret the following values to understand the differences between the pivot tables.
(i) From Table 1 the values 12.40% and 25.30%
(ii) From Table 2 the value 47.13%
(iii) From Table 3 the value 24.50%
b) Is the percentage of married customers who live in the Midwest region smaller than
the percentage of customers who are not married that live in the South region?
x             0        1        2        3        4        5        6
Probability   0.1176   0.3025   0.3241   0.1852   0.0595   0.0102   0.0007
(b) (i) What is the most likely number of old model widgets in a box? Explain your
answer.
(ii) What is the expected number of old model widgets in a box? Explain your
answer.
(iii) Must the expected number be quoted as a whole number? Explain your answer.
5. In the following scenario, state whether X is a binomial random variable. Explain your
answer.
Thirty percent of households buy the leading brand of dishwasher detergent. A random sample
of 25 households is surveyed to determine the brand of dishwasher detergent they buy. Let X
be the number of households in the sample that buy the leading brand.
6. From experience, a teacher has determined that the number of times a student has failed
to attend class, X, has the following probability distribution:
X             0      1      2      3
Probability   0.75   0.16   0.06   0.03
(a) Suppose that the teacher has 20 students currently in her class. If we let Y be the
number of students who attend the class,
(i) What kind of probability distribution does Y have? State the values of the
parameters.
(ii) What is the probability that there are no more than 15 but at least 8 students who
will attend the class?
(b) What is the expected number of students who will attend the class?
(c) What is the standard deviation of number of students who will attend the class?
1. From experience, a retailer has determined that the number of broken light bulbs, X ,
in a box containing 10 dozen Super brand light bulbs has the following probability
distribution:
X             0      1      2      3
Probability   0.80   0.10   0.05   0.05
(a) What is the probability that, in a randomly selected box of Super light bulbs, X
satisfies the corresponding phrase (in the first column)?
To complete the following question you may wish to refer to a number line: 0 1 2 3
For each phrase in the first column, write down the corresponding inequality or inequalities
(Column A), and the list of numbers specified by this phrase (Column B). Then write the
probability that X satisfies this condition along with result (Column C). As an example, the
first two are completed.
Please watch video clips on Moodle in the “tutorial folder” under WEEK 04
This series of videos provides the basic Excel skills required to complete the Excel
Computing Activity Tasks.
Simply click on the "Next" or "Back" buttons to navigate through all videos.
COMPLETE EXCEL HOMEWORK AND SUBMIT ONLINE (MOODLE) BY END OF WEEK 4.
You need to understand Lecture 3. For a quick reference watch the following
video: Excel 2007 Tutorial PIVOT TABLE (Part 1: Basic Introduction)
https://www.youtube.com/watch?v=w8WnVPmzmTk
https://youtu.be/9NUjHBNWe9M Useful videos
https://youtu.be/g530cnFfk8Y for your Excel Exercises
Exercise 1
To achieve a better understanding of pivot tables and to link Descriptive Statistics I, II
and the current Part III, we will continue working with the Elecmart.xlsx data,
with variables Gender (male, female) and Buy Category (low, medium, high).
Produce a pivot table with Gender in the Column and Values fields and Buy Category
in the Row field, and report the count values as:
o % Grand Total (Table 1) - (What kind of percentages are these?)
o % Row (Table 2) - (What kind of percentages are these?), and
o % Column (Table 3) - (What kind of percentages are these?)
b) Is the percentage of female customers, given that they purchase at the high buy
category level, larger than the percentage of male customers, given that they purchase
at the medium buy category level?
Exercise 2
For McHammer Hardware (McH) data, question 4, calculate the standard deviation of
old model widgets in a box. Use Excel file McH.xlsx.
Exercise 3
Repeat tutorial question 6 (a) (ii) using the relevant Excel function BINOM.DIST instead of
tables. Instructions on how to use the Excel function BINOM.DIST are given in Section E4.3
of the Excel notes and in the Week 4 lecture.
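Outside Excel, BINOM.DIST can be mirrored with the Python standard library. A sketch, assuming (as in question 6) that Y ~ Binomial(n = 20, p = 0.75), where 0.75 = P(X = 0) is the probability that a student attends:

```python
from math import comb

def binom_pmf(k, n, p):
    # Excel BINOM.DIST(k, n, p, FALSE): P(Y = k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    # Excel BINOM.DIST(k, n, p, TRUE): P(Y <= k)
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 20, 0.75
# Question 6(a)(ii): P(8 <= Y <= 15) = BINOM.DIST(15,...) - BINOM.DIST(7,...)
prob = binom_cdf(15, n, p) - binom_cdf(7, n, p)

mean = n * p                   # part (b): expected number attending = 15
sd = (n * p * (1 - p)) ** 0.5  # part (c): standard deviation, about 1.94
print(prob, mean, sd)
```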
a) Yes, as the percentage of female customers purchasing at the low buy category
level is 38.89% whilst the percentage of male customers purchasing at the
medium buy category level is 32.53%.
Exercise 2
Exercise 3
There will be Formative Assessment Tasks covering topics learned in Week 5 – 11.
Remember to Practice with MyMathLab STUDY PLAN for your Final Exam
Tutorial Questions:
1. When using tables to obtain standard normal probabilities, values of Z can only be
specified to two decimal places. Use Table 1 in the statistical tables provided to find
the following probabilities. You will need to round the Z-values to two decimal places
in order to use the tables. If you obtain the relevant probabilities using Excel (refer to
the computing lab session) you will observe some differences from the answers obtained
using the tables.
2. When using tables to obtain standard normal percentiles, values of Z can only be
specified to two decimal places. Use Table 1 in the statistical tables provided to find
the following percentiles. You will need to round the Z-values to two decimal places
in order to use the tables. If you obtain the relevant percentiles using Excel (refer to
the computing lab session) you will observe some differences from the answers obtained
using the tables.
For your answers to the following questions, please remember to do the following:
• Define the variable
• State the distribution
• Draw curves
3. The lifetimes of the heating element in a Heatfast electric oven are normally
distributed, with a mean of 7.8 years and a standard deviation of 2.0 years.
(a) (i) If the element is guaranteed for 2 years, what percentage of the ovens sold will
need replacement in the guarantee period because of element failure?
(ii) In a year in which 10,000 ovens are sold, how many ovens would you expect to
have to replace in the guarantee period because of element failure?
(b) What proportion of elements are expected to last for between 2 and 10 years?
(d) Find the length of time such that it includes 95% of all ovens. Include a statement
describing your answer.
(ii) fail after the warranty expires, but before they have lasted for 70,000 km?
(iii) last more than 72,500 km?
(b) Bert claims to have owned a Tyrannosaurus tyre which lasted 145,000 km.
Respond to Bert’s claim, without performing any further calculation.
(c) (i) Obtain the 50th percentile of tyre life. Explain, to a non-statistician, what this
value means.
(ii) Obtain (to the nearest km) the 99th percentile of tyre life.
4th edition
The normal distribution (Chapter 6)
p. 187: 6.2, 6.4, 6.6, 6.8, 6.10
Please watch video clips on Moodle in the “tutorial folder” under WEEK 05
This series of videos provides the basic Excel skills required to complete the Excel
Computing Activity Tasks.
Simply click on the "Next" or "Back" buttons to navigate through all videos.
To be able to answer the Excel Exercises you need to read and practice the following:
Select a cell to contain the result, and select the NORM.S.DIST function from the list of
Statistical functions. Insert the required z value. Insert “True” in the cumulative box. Click OK.
If you supply any positive or negative value z in the dialogue box, NORM.S.DIST will yield
the result p, where p = P(Z < z) is the area to the left of z under the standard normal curve
(shaded). Note that if z < 0, NORM.S.DIST(z, true) will be a number less than 0.5.
It is always helpful to sketch a normal curve, and shade the area you are looking for, in order
to work out exactly what calculation is needed.
X ~ N(µ = 20, σ = 5)
1. Require P(X < 30)
Answer: 0.9772
Then
P(15 < X < 30) = P(X < 30) – P(X < 15) = 0.9772 – 0.1587 = 0.8185
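These probabilities can also be checked outside Excel with Python's standard library, where `statistics.NormalDist` plays the role of NORM.DIST and NORM.S.DIST (a sketch; note that full precision gives 0.8186, while the 4-decimal table values give 0.8185):

```python
from statistics import NormalDist

X = NormalDist(mu=20, sigma=5)   # X ~ N(mu = 20, sigma = 5)

p1 = X.cdf(30)                   # P(X < 30), like NORM.DIST(30, 20, 5, TRUE)
p2 = X.cdf(30) - X.cdf(15)       # P(15 < X < 30)
print(round(p1, 4), round(p2, 4))
```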
SUMMARY: [sketch: the area to the left of x under the N(µ, σ) curve equals the area to the
left of z = (x − µ)/σ under the standard normal curve]
Calculating percentiles
Select a cell to contain the result, and select the NORM.S.INV function from the list of
Statistical functions. The following dialogue box appears. Insert the required probability. Click
on OK.
In the above dialogue box, if you supply the value of the shaded area p shown below,
NORM.S.INV will return the value of z0. Note that if p < 0.5, z0 will be a negative number.
For example, what is the
1. 10th
2. 95th
percentile of the standard normal distribution?
1. Answer: z0 = −1.2816.
2. Answer: z0 = 1.6449. (Note: p > 0.5 in this example; this is the value of z0 that cuts
off an upper tail of area 0.05.)
Any normal distribution (any value of population mean µ and standard deviation σ )
Printed tables for finding normal probabilities rely on standardisation – the tables used are of
the standard normal distribution, with mean 0 and standard deviation 1. With Excel we can
find the percentile directly, without standardising. For example, for X ~ N(µ = 20, σ = 5),
the 99th percentile is NORM.INV(0.99, 20, 5).
Answer: x0 = 31.6312.
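A quick cross-check of these percentiles using Python's `statistics.NormalDist.inv_cdf`, which plays the role of NORM.S.INV and NORM.INV (illustrative only):

```python
from statistics import NormalDist

Z = NormalDist()            # standard normal distribution
z10 = Z.inv_cdf(0.10)       # like NORM.S.INV(0.10): about -1.2816
z95 = Z.inv_cdf(0.95)       # like NORM.S.INV(0.95): about  1.6449

X = NormalDist(mu=20, sigma=5)
x99 = X.inv_cdf(0.99)       # like NORM.INV(0.99, 20, 5): about 31.63
print(z10, z95, x99)
```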
SUMMARY: [sketch: supplying probability p gives z0 = NORM.S.INV(p) on the standard
normal curve, or x0 = NORM.INV(p, µ, σ) directly]
Exercise 2:
(Always select Cumulative = True for NORM.DIST)
Exercise 4:
Part A:
1. A statistical analyst who works for a large insurance company is in the process of
examining several pension plans. Company records show that the age at which its male
clients retire is approximately normally distributed with a mean of 63.7 years and a
standard deviation of 3.1 years.
(a) Calculate the probability that a randomly selected male client will retire before the age
of 65 years.
(b) If a random sample of 50 male clients is to be selected from the company database,
what is the probability that the sample mean will be less than 65 years?
(c) Close examination of the ages of recent retirees shows that the assumption of a normal
distribution may be false.
Which, if either, of your answers above would be changed by this information, and
why?
2. Consider the following sets of data drawn from a normally distributed population.
Set A: 1,1,1,1,8,8,8,8
Set B: 1,2,3,4,5,6,7,8
Each set of data is used to calculate a 95% confidence interval for the population mean.
Without doing any calculations, state, with explanation which confidence interval will
be wider.
3. The following observations were drawn from a normal population whose variance is 100:
12, 8, 22, 15, 30, 6, 39, 48
and a 90% confidence interval for the population mean has been calculated:
22.5 − 1.645 × 10/√8 < µ < 22.5 + 1.645 × 10/√8
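The interval can be verified with a short Python sketch (illustrative working, with σ = 10 known):

```python
from statistics import NormalDist

data = [12, 8, 22, 15, 30, 6, 39, 48]
n = len(data)
xbar = sum(data) / n                      # sample mean = 22.5
sigma = 10                                # population sd (variance 100)

z = NormalDist().inv_cdf(1 - 0.10 / 2)    # 1.645 for 90% confidence
half_width = z * sigma / n ** 0.5
lower, upper = xbar - half_width, xbar + half_width
print(round(lower, 2), round(upper, 2))   # 16.68 and 28.32
```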
Part B:
For all your answers, please remember to do the following:
• Define the variable
• State the distribution
• Draw curves
4. Soft drink bottles are filled so that they contain on average 330 ml of soft drink in each
bottle. The standard deviation is 4 ml. Assume that the content of soft drink bottles is
normally distributed.
(a) Calculate the probability that a randomly selected bottle will contain less than 325 ml.
(b) The bottles are sold in 6-packs. What is the probability that in a randomly selected 6-
pack the mean amount per bottle is less than 325 ml?
(c) What if the assumption of a normal distribution was incorrect? What will happen to
your answers for Parts (a) and (b)? Which, if either, of your answers above would be
changed by this information, and why?
5. A random sample of 20 petrol stations in the city of Casey on a Tuesday found that the
mean price per litre over the 20 stations was $1.52. Assume that the population standard
deviation is 3 cents.
(a) Find a 95% confidence interval for the mean price of unleaded petrol in Casey on
that day and interpret.
(b) Find a 90% confidence interval for the mean and interpret.
(c) If the same mean had been found for a sample of 80 stations, what would the 95%
confidence interval be?
(i) Discuss the confidence interval width by comparing results in part (a)
with results in part (c)
(ii) Discuss the precision of estimation by comparing results in part (a)
with results in part (c)
(a) Construct a 98% confidence interval estimate of the mean processing time and interpret it.
Calculate with precision to 2 decimal places.
HINT: Apply and learn the following systematic approach by circling what is
appropriate and finding the missing words.
• This is statistical inference related to estimation/hypothesis testing about the
population mean/proportion because...............................................................
• Standard deviation required for this calculation is population/sample
standard deviation.
• I will use the following formulae for calculating 98% confidence
interval ………………………………,because………………………………
• Do I know all the components I need to substitute into this formula? Y/N
• List all components which are known…………………………………………
• List all components which are unknown………………………………………
• How do I find missing components?
• Find the 98% confidence interval estimate (include units)
• Interpret (include units)
(b) What assumption must you make about the population distribution in (a)?
(c) Do you think that the assumption made in (b) is seriously violated? Explain.
(d) If a random sample of size 50 was selected, would your answers to part (b) and (c) be
different? Explain.
8. Bags of a certain brand of tortilla chips claim to have a net weight of 400 grams. The net
weights vary slightly from bag to bag and are non-normally distributed.
A representative of a consumer advocacy group wishes to see if there is any evidence that the
mean net weight is less than advertised. For this, the representative randomly selects 46 bags
of this brand and determines the net weight of each. He finds the mean of these selected bags
to be 395 grams and the standard deviation to be 6.8 grams. Use these data to calculate a 90%
confidence interval for the true mean weight. State the formula, show ALL working and
remember to always interpret your interval in the context of this question.
(i) find the probability that one randomly selected unit has a length greater than 120
cm;
(ii) find the probability that if three units are randomly selected, their mean length
exceeds 120 cm;
(iv) Referring to your answers above, draw the probability density function of the
length of a subcomponent in (i) and that of the mean of three subcomponents in
(ii) on a single axis.
(v) Is the distribution of the sample mean less or more variable than the distribution
of the parent population? Explain your answer.
(vi) Close examination of the lengths of an important subcomponent shows that these
are not normally distributed. Which, if either, of your answers (i) and (ii) above
will change, and why?
2. In a random sample of 400 observations from a population whose variance is 100, it was
found that x̄ = 75. Find the 95% confidence interval estimate of the population mean and
interpret it.
3rd edition
Sampling distribution (Chapter 7)
p. 213: 7.2, 7.6, 7.8
4th edition
Sampling distribution (Chapter 7)
p. 214: 7.2, 7.6, 7.8
Please watch video clips on Moodle in the “tutorial folder” under WEEK 06
This series of videos provides the basic Excel skills required to complete the Excel
Computing Activity Tasks.
Simply click on the "Next" or "Back" buttons to navigate through all videos.
The dean of a business faculty claims that the average MBA graduate is offered a starting
salary of $73,000. The standard deviation of offers is $6,000. Use Excel to answer the
following questions.
(a) Find the probability that in a random sample of 38 MBA graduates the mean starting
salary is
i. Less than $70,000
ii. Between $70,000 and $74,000
iii. More than $75,000
(b) What is the lowest mean salary in the top 5% of sample mean salaries for a random
sample of 38 MBA graduates?
(c) What is the lowest mean salary in the top 90% of sample mean salaries for a random
sample of 38 MBA graduates?
(d) Is any assumption about the distribution of MBA graduate salaries necessary?
Explain.
The t distribution
In introductory statistics courses such as this, the t distribution is used only in its role in the
sampling distribution of the sample mean when the population standard deviation has to be
estimated by the sample standard deviation. In many texts it is used only in its inverse form, to
obtain a confidence interval or a critical value. With Excel, the direct form is available; it is
used primarily to obtain p values in hypothesis tests.
The t distribution is standardised, so the mean and standard deviation are not required.
However, the distribution depends on the degrees of freedom.
However, there are also t distribution functions that deal with the area in the right tail or two
tails. This reflects the use of the distribution in obtaining p values in a hypothesis test. The
function T.DIST.RT returns the right-tail probability. The function T.DIST.2T returns the
two-tail probability.
In the following example using T.DIST.RT, a variable following the t distribution with 159
degrees of freedom has a probability of 0.01137 of having a value greater than 2.3.
T.DIST.RT(2.3,159)
= Pr(t > 2.3)
= 0.01137
2. For the two-tail probability, we find T.DIST.2T(2.3,159) = 0.02275, which is just double
the previous answer.
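If you want to reproduce these tail probabilities outside Excel, `scipy.stats.t` provides the equivalents (this assumes SciPy is available; it is not required for the unit):

```python
from scipy import stats

df = 159

p_right = stats.t.sf(2.3, df)           # like T.DIST.RT(2.3, 159): P(t > 2.3)
p_two = 2 * stats.t.sf(2.3, df)         # like T.DIST.2T(2.3, 159): both tails

t_crit = stats.t.ppf(1 - 0.05 / 2, df)  # like T.INV.2T(0.05, 159)
print(p_right, p_two, t_crit)
```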
The inverse function for the t distribution exists in two versions: T.INV can be used to obtain
the critical value of Student's t distribution that cuts off a lower tail of a specified area, while
T.INV.2T returns the value that cuts off two tails with the specified total area.
T.INV.2T(0.05, 159) = 1.974996
In this example, if a variable t has the t distribution with 159 degrees of freedom,
P(t < – 1.975 or t > 1.975) = 0.05
Exercise 2
If t is drawn from the Student t distribution with 19 degrees of freedom,
Exercise 4
The routes of postal deliveries are carefully planned so that each deliverer works between 7 and
7.5 hours per shift. The planned routes assume an average walking speed of 2kph, and no
shortcuts across lawns. In an experiment to examine the amount of time deliverers actually
spend completing their shifts, a random sample of 75 postal deliverers were secretly timed. The
data are available in Ex4.xls in worksheet (a). Assuming that the times are normally distributed,
estimate with 99% confidence the mean shift time for all postal deliverers.
P(X̄ < x*) = 0.95 ∴ x* = ?
(b)
x* = NORM.INV(0.95, 73000, 973.3) = $74,601
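The same percentile of the sampling distribution of the mean can be checked in Python (a sketch; 973.3 is the standard error 6000/√38):

```python
from statistics import NormalDist

mu, sigma, n = 73000, 6000, 38
se = sigma / n ** 0.5                      # standard error of the mean, about 973.3

# 95th percentile of the sample mean, like NORM.INV(0.95, 73000, 973.3)
x_star = NormalDist(mu, se).inv_cdf(0.95)
print(round(x_star))                       # about 74601
```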
Exercise 3
(a)
Note that zcrit should cut off an upper tail equal to alpha/2.
ie Pr(Z > zcrit) = alpha/2
so Pr(Z < zcrit) = 1 - alpha/2
ie zcrit = NORM.S.INV(1 - alpha/2)
Sample size 8
Population variance: 100
Population standard deviation 10
Sample mean 22.5
Alpha 0.1
Critical value 1.644854
Upper confidence limit 28.31544
Lower confidence limit 16.68456
We can state with 90% confidence that the true mean of the
population lies between 16.68 and 28.32.
Exercise 4
Sample size n: 75
Part A:
1. Use tables to find the p-values for the following tests. (We assume the population standard
deviation is known.) If α = 0.05, state the conclusion (no interpretation possible here).
Draw curves and clearly show where the p-value is. Circle what is appropriate and find the
missing words in the text below:
(i)
H 0 : µ = 500
H 1 : µ ≠ 500
z calc = −1.76
p-value is …………… I calculate p-value for this one/two sided test as ………………….
p-value is smaller /not smaller than ………….. therefore we can/cannot reject……………….
(ii)
H 0 : µ ≤ 200
H1 : µ > 200
z calc = 2.63
p-value is …………… I calculate p-value for this one/two sided test as ………………….
p-value is not smaller /smaller than ………….. therefore we can/cannot reject……………….
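For reference, both p-values can be computed with Python's standard library, which mirrors NORM.S.DIST (a sketch, not a substitute for the table working):

```python
from statistics import NormalDist

Phi = NormalDist().cdf   # standard normal CDF, like NORM.S.DIST(z, TRUE)

# (i) two-sided test, z_calc = -1.76: p-value = 2 * P(Z < -1.76)
p_i = 2 * Phi(-1.76)

# (ii) one-sided upper-tail test, z_calc = 2.63: p-value = P(Z > 2.63)
p_ii = 1 - Phi(2.63)
print(round(p_i, 4), round(p_ii, 5))
```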
If the machinery is working correctly, the mean diameter of the buttons will be 1.3 cm. The
variance for that machine is known to be 0.0081 cm². A sample of 16 buttons is measured with
a laser and found to have a mean diameter of 1.25 cm. Test at the 5% level of significance the
hypothesis that the mean diameter of the population differs from 1.3 cm, using the critical value
approach. Ensure that you clearly state your hypotheses, show ALL steps, ALL your working
AND interpret your conclusion in the context of this question.
3. The director of manufacturing at a fabric mill needs to determine whether a new machine is
producing a particular type of cloth according to the manufacturer’s specifications, which
Is there evidence that the machine is not meeting the manufacturer’s specifications for mean
breaking strength? Use a 5% level of significance and the critical value approach. Ensure
that you clearly state your hypotheses, show ALL steps, ALL your working AND interpret your
conclusion in context of this question.
Part B:
4. A company that produces bias-ply tires is considering a certain modification in the tread
design. An economic feasibility study indicates that the modification can be justified only if the
true average tire life under standard test conditions exceeds 20,000 km. A random sample of 16
prototype tires is manufactured and tested, resulting in a sample mean tire life of 20,758 km.
Suppose tire life is normally distributed with standard deviation 1,500 km (the value for the
current version of the tire). Do these data suggest that the modification meets the condition
required for changeover? Test the appropriate hypothesis using significance level 0.01. Use
the critical value approach.
(a) Suggest appropriate null and alternative hypotheses, and explain your choice of null
and alternative hypotheses.
(c) State the test statistic. Specify the distribution of this test statistic.
(d) Perform the test at the 1% level of significance by the critical value method. Use the
recommended 5-step procedure (refer to Lecture week 7).
Step 4: State decision rule (condition leading to rejecting H0) and make decision
about H0
Step 5: State the conclusion within the context of the problem (see hint below)
(Confirm finding the critical value during the computer lab session. For the critical value,
with α = 0.01 you will need to use NORM.S.INV(1 − 0.01) = NORM.S.INV(0.99).)
(e) Find the p-value associated with the value of the test statistic obtained from the
sample. (Confirm finding the p-value during computer lab session. To do this, use
NORM.S.DIST(….,….). Is the p-value larger or smaller than the significance level?
What does it mean in relation to the test?
Conclusion:
____________________________________________________________________
____________________________________________________________________
(a) Perform a hypothesis test to determine whether there is evidence at the 5% level of
significance to conclude that the average revenue is lower than the consultant predicted.
HINT: Apply and learn the following systematic approach by circling what is appropriate
and finding the missing words.
• This is statistical inference related to estimation/hypothesis testing about the population
mean/proportion because...............................................................
• Standard deviation required for this calculation is population/sample standard
deviation because………………….
• I will use the following formulae for test statistic ………………… with the following
distribution ……………, because………………………………
• Do I know all the components I need to substitute into this formula? Y/N
• List all components which are known…………………………………………
• List all components which are unknown………………………………………
• How do I find missing components?
• I am supposed to use the critical value/p-value approach because…………….
• Follow the recommended 5-step procedure
(b) If the test were done at the 10% level of significance, would the answer change?
(a) Can it be inferred at the 5% significance level that the management negotiator is correct?
Use the critical value approach.
(Note: In this question we are asking whether there is strong evidence that the
negotiator’s claim is correct, i.e. strong evidence that the national mean income for
building workers is less than $50,000.)
(b) Can it be inferred at the 5% level that the mean income of building workers across the
country is different from $50,000? Use the critical value approach.
8. At a large furniture and electrical store customers usually find that the furniture on display is
not held in stock. Rather than being immediately available, it must be sourced from
manufacturers. In the sofa department, the average delivery time is expected to be six
weeks after purchase, and it is believed that delivery times are normally distributed around
this value. In order to test whether the six-week target is accurate, the store recorded the
delivery time (in days) taken for 50 sofa purchases and calculated the sample mean
delivery time was 43.68 days with the sample standard deviation 27.078 days.
If there is strong evidence that the mean delivery time is greater than 42 days,
the store may
• Advise customers of a longer waiting period, potentially driving away customers
who are not prepared to wait that long
• Negotiate with suppliers to investigate the possibility of more rapid service.
Both of these may be a significant cost to the store, so will only be undertaken if
the evidence is strong.
(a) Suggest appropriate null and alternative hypotheses, and explain your choice
of null and alternative hypotheses.
(c) State the test statistic. Specify the distribution of this test statistic.
(d) Perform the test at the 5% level of significance by the critical value method. Use the
recommended 5-step procedure (refer to Lecture week 6).
(e) The p-value associated with the value of the test statistic obtained from the
sample is 0.33. (Confirm finding the p-value during computer lab session.
To do this, use T.DIST.RT which requires you to provide the sample test
statistic value from Step 5, and the degrees of freedom.) Is the p-value larger
or smaller than the significance level? What does it mean in relation to the test?
Conclusion:
____________________________________________________________________
____________________________________________________________________
The approval process for a life insurance policy requires a review of the application and the
applicant’s medical history, possible requests for additional medical information and medical
examinations, and a policy compilation stage where the policy pages are generated and then
delivered. The ability to deliver policies to customers in a timely manner is critical to the
profitability of this service. During one month, a random sample of 25 approved policies is
selected and the total processing time in days is recorded. The sample mean is found to be 34.64
and the sample standard deviation is 26.00.
(a) In the past, the mean processing time averaged 45 days. At the 5% level of
significance, is there evidence that the mean processing time has changed from 45
days?
(b) What assumption about the population distribution is needed in (a)?
pp.301 - 302: 9.46, 9.50, 9.54, 9.60 [Steel available in Chapter 9 data files]
Please watch video clips on Moodle in the “tutorial folder” under WEEK 07
This series of videos provides the basic Excel skills required to complete the Excel
Computing Activity Tasks.
Simply click on the "Next" or "Back" buttons to navigate through all videos.
MyMathLab STUDY PLAN – practice topic relevant questions followed by the quiz.
This is useful for your EXAM
(ii) H0 : µ ≤ 200
H1 : µ > 200
z = 2.63
x̄ = 145, s = 50, n = 100
H0 : µ ≥ 150
H1 : µ < 150
Exercise 4
The following data were drawn from a normal population. Can we conclude at the 5%
significance level that the population mean is not equal to 32?
25 18 29 33 17
Exercise 5
Ecologists have long advocated recycling newspapers as a way of saving trees and
reducing landfills. In recent years, a number of companies have gone into the business of
collecting used newspapers from households and recycling them. A financial analyst for
one such company has recently calculated that the firm would make a profit if the mean
weekly newspaper collection from each household exceeded 1 kg. In a study to determine
the feasibility of a recycling plant, a random sample of the weights of recycled newspapers
from 100 households was obtained. Find the relevant data in Exercise5.xls
Do these data provide sufficient evidence at the 1% significance level to allow the
analyst to conclude that a recycling plant would be profitable?
-1.76 0 1.76
(b)
This hypothesis test is one sided and the p-value is the area of the upper tail. This can be
obtained as
1 − NORM.S.DIST(2.63, TRUE) = 1 − 0.99573 = 0.00427
Alternatively, use the symmetry of the normal distribution which means that the lower tail
cut off by -2.63 is equal to the upper tail cut off by 2.63, so obtain directly:
NORM.S.DIST(−2.63, TRUE) = 0.00427
Exercise 2
Notice Step 5: Conclusion for both parts (a) and (b) – we CANNOT reject H 0 at 5% level
of significance. The sample DOES NOT provide enough evidence against H 0 . Therefore the
fill level for this machine is NOT significantly different from 1050ml.
xbar 145
s 50
n 100
H 0 : mean 150
t -1.000
p-value 0.16
Exercise 4
Data
25
18
29
33
17
H 0 : µ = 32
H 1 : µ ≠ 32
Significance level 5%
xbar 24.4
s 6.913754
t -2.45802
tcrit 2.776445 and -2.77645
p-value 0.06984
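The same test can be reproduced with `scipy.stats.ttest_1samp` (assuming SciPy is available; it returns the two-sided p-value, matching H1: µ ≠ 32):

```python
from scipy import stats

data = [25, 18, 29, 33, 17]
t_stat, p_value = stats.ttest_1samp(data, popmean=32)  # two-sided by default

alpha = 0.05
reject = p_value < alpha   # False here: cannot reject H0 at the 5% level
print(t_stat, p_value, reject)
```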
Exercise 5
xbar 1.0925
s 0.330073
n 100
Test statistic t 2.80241
critical value
T.INV(0.99,99)
or –T.INV(0.01,99) 2.364606
p-value
T.DIST.RT(2.802,99) 0.003051
You will have Formative Assessment Task (FAT I) during Computer Lab period.
Students revise Weeks 4 – 7 lecture material & respective tutorials
REMEMBER: Practice with “MyMathLab STUDY PLAN” for Final Exam
Part A:
1. With the recent interest in the proportion of Australians who use recreational drugs,
suppose a survey is conducted on 10,000 randomly chosen Australians aged 15 years or older.
It is found that 1,600 of the participants in the survey currently use recreational drugs.
Obtain a 95% confidence interval for the proportion of all Australians aged 15 years and older
who currently use recreational drugs.
State the formula, show all working and remember to always interpret your interval in
the context of the question.
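One way to sketch the calculation in Python (illustrative; the interval is p̂ ± z√(p̂(1 − p̂)/n)):

```python
from statistics import NormalDist

n, successes = 10000, 1600
p_hat = successes / n                      # sample proportion = 0.16

z = NormalDist().inv_cdf(1 - 0.05 / 2)     # 1.96 for 95% confidence
se = (p_hat * (1 - p_hat) / n) ** 0.5      # estimated standard error
lower, upper = p_hat - z * se, p_hat + z * se
print(round(lower, 4), round(upper, 4))    # about 0.1528 and 0.1672
```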
2. With the imminent new cigarette packaging legislation, there is a lot of interest at the
moment in the proportion of Australians who smoke. If we have an estimate of the proportion
who smoke now, then we will have a benchmark against which to judge any change that could
be attributed to the legislation. Suppose a survey is conducted on the smoking habits of 5,000
randomly chosen Australians aged 15 years or older. It is found that 784 of the participants in
the survey currently smoke.
Obtain a 95% confidence interval for the proportion of Australians aged over 15 who currently
smoke.
HINT: Apply and learn the following systematic approach by circling what is
appropriate and finding the missing words.
• This is statistical inference related to estimation/hypothesis testing about the
population mean/proportion because...............................................................
• I will use the following formulae for calculating 95% confidence
interval ………………………………,because………………………………
• Do I know all the components I need to substitute into this formula? Y/N
• List all components which are known…………………………………………
• List all components which are unknown………………………………………
• How do I find missing components?
• Find the 95% confidence interval estimate (include units)
• Interpret (include units)
Test whether there is evidence at the 5% level of significance that the percentage of Australians
aged 15 or over who smoke is greater than 15%. Use the critical value approach. Is there
evidence at the 10% level? What is the p-value for this test? Interpret the p-value.
HINT: Apply and learn the following systematic approach by circling what is
appropriate and finding the missing words.
• This is statistical inference related to estimation/hypothesis testing about the
population mean/proportion because...............................................................
• I will use the following formulae for test statistic ………………… with the following
distribution ……………, because………………………………
• Do I know all components what I need to substitute into this formula? Y/N
• List all components which are known…………………………………………
• List all components which are unknown………………………………………
• How do I find missing components?
• I suppose to use critical value/p-value approach because…………….
• Follow recommended 5 step procedure
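The checklist working above can be cross-checked numerically. Here is a minimal Python sketch (not part of the official solution) that mirrors Excel's NORM.S.INV / NORM.S.DIST using the standard library's `statistics.NormalDist`:

```python
# Sketch: checking the smoking-survey working (784 smokers out of n = 5000).
from math import sqrt
from statistics import NormalDist

n = 5000
p_hat = 784 / n  # sample proportion of smokers, 0.1568

# 95% confidence interval: p_hat +/- z_{0.025} * sqrt(p_hat(1 - p_hat)/n)
z = NormalDist().inv_cdf(0.975)           # ~1.95996, like NORM.S.INV(0.975)
me = z * sqrt(p_hat * (1 - p_hat) / n)    # margin of error
ci = (p_hat - me, p_hat + me)             # roughly (0.1467, 0.1669)

# Upper-tail test H0: pi = 0.15 vs H1: pi > 0.15. Note the standard error
# here uses the hypothesised value 0.15, not p_hat.
z_calc = (p_hat - 0.15) / sqrt(0.15 * 0.85 / n)  # roughly 1.35
p_value = 1 - NormalDist().cdf(z_calc)           # roughly 0.089
```

With these numbers, z_calc falls short of the 5% critical value but the p-value is below 0.10, which is exactly the 5%-versus-10% contrast the question asks you to discuss.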
Part B:
4. Many public polling agencies conduct surveys to determine current consumer
sentiment concerning the state of the economy. Suppose that one agency randomly samples 484
consumers and finds that 257 are optimistic about the state of the economy.
(a) Use a 90% confidence interval to estimate the proportion of all consumers who are
optimistic about the state of the economy.
Answer the following questions:
• The point estimator of the population proportion π is …….. and equals ……..
• The calculation is based on the formula ………………………………………
because …………………………………..
• The lower confidence limit is …………………… and I can calculate it
as ……………………………
• The upper confidence limit is ……………………. and I can calculate it
as ……………………………
• I am …….% confident that the population proportion (specify what it is
within the context of this question) is somewhere between ………..
and ………….
(b) Based on the confidence interval, can we infer that the majority of all consumers are
optimistic about the economy?
6. The reputation, and hence sales, of many businesses can be severely damaged by
shipments of manufactured items that contain a large percentage of defectives. A
manufacturer of alkaline batteries wants to be reasonably certain that less than 5% of its
batteries are defective. Suppose 300 batteries are randomly selected from a very large shipment;
each is tested and 10 defective batteries are found.
(a) Does this outcome provide sufficient evidence for the manufacturer to conclude that
the fraction defective in the entire shipment is less than 0.05? Use α = 0.01 .
(b) Find the p-value for the test in part (a). How strong is the evidence
favouring the alternative hypothesis in part (a)?
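One hedged way to sanity-check the battery question, assuming the natural hypotheses H0: π ≥ 0.05 versus H1: π < 0.05:

```python
# Sketch for the battery question: lower-tail test of a proportion.
from math import sqrt
from statistics import NormalDist

n, defectives = 300, 10
p_hat = defectives / n                 # 0.0333...
se0 = sqrt(0.05 * 0.95 / n)            # standard error under H0
z_calc = (p_hat - 0.05) / se0          # about -1.32
p_value = NormalDist().cdf(z_calc)     # lower-tail p-value, about 0.093
reject = p_value < 0.01                # False: cannot conclude pi < 0.05
```

Since the p-value is well above α = 0.01, this sample alone would not let the manufacturer conclude the defective fraction is below 5%.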
(a) Recall Elecmart.xlsx sample data, Pivot table of Spent vs Gender – focus on count
(c) Recall Elecmart.xlsx sample data, Pivot table of Spent vs Gender and Time – focus
on count
3rd edition
p.256-257: 8.24, 8.30
p.306: 9.62, 9.64, 9.66, 9.68
4th edition
p.257-258: 8.24, 8.30
p.306: 9.62, 9.64, 9.66, 9.68
(a)
H0: π = 0.15
H1: π ≠ 0.15
α = 5%, zcalc = −1.79
(b)
H0: π ≥ 0.79
H1: π < 0.79
α = 1%, zcalc = −3.001
Exercise 2 A random sample of 50 consumers taste-tested a new snack food. Their responses
were coded (0: do not like; 1: like; 2: indifferent) and recorded as follows:
1 0 0 1 2 0 1 1 0 0
0 1 0 2 0 2 2 0 0 1
1 0 0 0 0 1 0 2 0 0
0 1 0 0 1 0 0 1 0 1
0 2 0 0 1 1 0 0 0 1
(a) Use an 80% confidence interval to estimate the proportion of consumers who like the
new snack food.
(b) Based on your finding in part (a) can we infer that majority of customers will like the
new snack food?
Exercise 4. Refer to PlanFinan.xlsx data (tutorial week 4) again. Create a Pivot Table of Salary
vs Sex and EducLevel – focus on Count.
(a) Obtain a 95% confidence interval for the proportion of all male within
postgraduate educational level group.
(b) Can we conclude at 5% level of significance that proportion of all male within
postgraduate educational level group is smaller than 57%?
Exercise 1
(a)
p-value approach: p-value = 2 × P(Z < −1.79) = 2 × NORM.S.DIST(−1.79,1) = 0.073454
Critical value approach: zcrit = ±z0.025; NORM.S.INV(0.025) = −1.95996 and NORM.S.INV(0.975) = 1.95996
DR: Reject H0 if p-value < α; here 0.073454 > 0.05
DR: Reject H0 if zcalc < −zcrit or zcalc > zcrit; here −1.79 > −1.95996 and −1.79 < 1.95996
Using the p-value and the critical value approach we reach the same conclusion: we CANNOT
reject the null hypothesis. The sample DOES NOT provide enough evidence against H0
at the 5% level of significance. Therefore, the population proportion IS NOT
SIGNIFICANTLY different from 15%.
(b)
p-value approach: p-value = P(Z < −3.001) = NORM.S.DIST(−3.001,1) = 0.001345
Critical value approach: zcrit = −z0.01 = NORM.S.INV(0.01) = −2.32635
DR: Reject H0 if p-value < α; here 0.001345 < 0.01
DR: Reject H0 if zcalc < zcrit; here −3.001 < −2.32635
Using the p-value and the critical value approach we reach the same conclusion: we CAN reject the
null hypothesis. The sample DOES provide enough evidence against H0 at the 1% level of
significance. Therefore, the population proportion IS SIGNIFICANTLY smaller than
0.79.
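The Excel values quoted in Exercise 1 can be reproduced with Python's `statistics.NormalDist` (a stdlib stand-in for NORM.S.DIST / NORM.S.INV; shown only as a cross-check):

```python
# Reproducing the Exercise 1 answer values in Python.
from statistics import NormalDist

nd = NormalDist()
# (a) two-tailed p-value for z_calc = -1.79
p_a = 2 * nd.cdf(-1.79)        # matches 2*NORM.S.DIST(-1.79,1) = 0.073454
z_crit_a = nd.inv_cdf(0.975)   # matches NORM.S.INV(0.975) = 1.95996
# (b) lower-tail p-value for z_calc = -3.001
p_b = nd.cdf(-3.001)           # matches NORM.S.DIST(-3.001,1) = 0.001345
z_crit_b = nd.inv_cdf(0.01)    # matches NORM.S.INV(0.01) = -2.32635
```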
Exercise 2 (a) The sample proportion of customers who like the new snack is 0.3. Use
NORM.S.INV(0.1) or NORM.S.INV(0.9) to find the critical value ±1.2815. Hence the 80%
confidence interval limits are: upper 0.38 and lower 0.22.
(b) Since both confidence interval limits are below the majority
proportion (50% or 0.5), we cannot say at the 80% confidence level that the majority of
customers will like the new snack.
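The interval in Exercise 2 can be recomputed directly from the raw coded responses; a short Python sketch (for checking only):

```python
# Sketch: 80% confidence interval for the proportion who "like" (code 1),
# built straight from the 50 coded responses in the question.
from math import sqrt
from statistics import NormalDist

responses = [1,0,0,1,2,0,1,1,0,0,
             0,1,0,2,0,2,2,0,0,1,
             1,0,0,0,0,1,0,2,0,0,
             0,1,0,0,1,0,0,1,0,1,
             0,2,0,0,1,1,0,0,0,1]
n = len(responses)                      # 50
p_hat = responses.count(1) / n          # 15/50 = 0.3 "like" the snack
z = NormalDist().inv_cdf(0.90)          # ~1.2815 for an 80% interval
me = z * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
lower, upper = p_hat - me, p_hat + me   # ~0.22 and ~0.38, as in the answer
```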
Exercise 3
z = −2.8284 , fail to reject H 0.
Count of Salary (Row Labels: Sex; Column Labels: EducLevel)

Row Labels       1     2     3     4    Grand Total
0                9    46    53    46    154
1               11    39    36    34    120
Grand Total     20    85    89    80    274
(a)
n = 80, p̂ = 46/80 ≈ 0.58, zα/2 = z0.025 = 1.96
(b)
H0: π ≥ 0.47
H1: π < 0.47
Note that 47% of 80 is greater than 5 and 53% of 80 is also greater than 5,
and hence the normal approximation is valid.
Test statistic: z = (p̂ − 0.47) / √(0.47(1 − 0.47)/80) is distributed approximately as N(0,1).
zcalc = (0.58 − 0.47) / √(0.47(1 − 0.47)/80) ≈ 1.971
p-value = P(Z < 1.971) = 0.9756
Since the p-value 0.9756 > 0.05, we cannot reject H0.
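A quick check of the z and p-value quoted above. Note the working rounds the sample proportion 46/80 = 0.575 to 0.58 before substituting, so the sketch does the same:

```python
# Sketch: lower-tail test of H0: pi >= 0.47 vs H1: pi < 0.47 with n = 80.
from math import sqrt
from statistics import NormalDist

n = 80
p_hat = 0.58                            # 46/80 rounded, as in the working above
se0 = sqrt(0.47 * (1 - 0.47) / n)       # standard error under H0
z_calc = (p_hat - 0.47) / se0           # ~1.971
p_value = NormalDist().cdf(z_calc)      # P(Z < 1.971) ~ 0.9756
```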
Please watch the video clips on Moodle in the “tutorial folder” under WEEK 9.
This series of videos provides the basic Excel skills required for completing the
Excel Computing Activity Tasks.
Simply click on the "Next" or "Back" buttons to navigate through all videos.
Practice with “MyMathLab STUDY PLAN” for Final Exam
Part A:
1. A regression analysis output from Excel on the SALES and PRICE for franchises of an
(unnamed) burger chain in a selection of different cities across the US is provided below.
SALES is in thousands of dollars, while PRICE is an index over all products sold in a given
month, expressed as a notional number of dollars for a meal. Note that you will practise
creating this Excel output, along with a scatter plot, in the computing lab session.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.625541
R Square             0.391301
Adjusted R Square    0.382963
Standard Error       5.096858
Observations         75

ANOVA
             df    SS         MS         F         Significance F
Regression    1    1219.091   1219.091   46.9279   1.97E-09
Residual     73    1896.391   25.97796
Total        74    3115.482

            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 99.0%   Upper 99.0%
Intercept   121.9002       6.526291         18.67832   1.59E-29   108.8933    134.9071    104.639       139.1614
[Scatter plot "Manatee deaths": Number of manatee deaths (0–60) against
Number of registered powerboats ('000s) (400–750)]
c) Find the linear model for estimating the Number of manatee deaths from the number
of registered powerboats (‘000s). Use the Excel summary output below. (You will
practice how to create this output in lab session).
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.941477289
R Square             0.886379485
Adjusted R Square    0.876911109
Standard Error       4.276387771
Observations         14

ANOVA
             df    SS         MS         F          Significance F
Regression    1    1711.979   1711.979   93.61473   5.11E-07
Residual     12    219.4499   18.28749
Total        13    1931.429

                         Coefficients    Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept                -41.43043895    7.412217         -5.58948   0.000118   -57.5803    -25.2806
Powerboats (thousands)   0.124861692     0.012905         9.675471   5.11E-07   0.096744    0.152979
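For part (c), the fitted coefficients give the linear model Deaths = −41.430 + 0.1249 × Powerboats ('000s). A hedged sketch of using it for a prediction (the input value 600 is just an illustrative figure, not from the question):

```python
# Sketch: plugging a hypothetical 600 thousand registered powerboats into
# the fitted line from the regression output above.
intercept = -41.43043895
slope = 0.124861692

def predict_deaths(powerboats_thousands):
    """Estimated number of manatee deaths for a given number of
    registered powerboats (in thousands)."""
    return intercept + slope * powerboats_thousands

estimate = predict_deaths(600)  # about 33.5 deaths
```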
Part B:
4. Assuming that the relationship is statistically significant, use the information from the
scatterplot below to estimate the Final mark (maximum 100 marks) if the study time
(a) is 0 hours
(i) comment on the validity (reliability) of this estimate
(ii) explain why this estimate differs from the observed value of the Final mark for a
study time of 0 hours.
(b) is 50 hours
(i) comment on the validity (reliability) of this estimate
(ii) explain why this estimate is not encouraging for a student who wants to
get a perfect Final mark
(iii) by visual inspection of the scatter plot, state with explanation the range
of study times leading to a perfect Final mark. Explain why a study
time of 45 hours is outside this range.
[Scatter plot: Final mark (0–120) against Study time (0–50 hours)]
6. In the context of question 4, circle what is appropriate and find the missing words in the following
report:
(a) Find the least squares regression line (in terms of variables) and interpret
(b) Find the slope of the least squares regression line and interpret
The slope is ………………. The slope of the regression line predicts that, on average,
……………………increases/decreases by ……………..(along with units) for a one unit
increase/decrease in ……………………..
(c) Find the y-intercept of the least squares regression line and interpret
The y-intercept is………………. The y-intercept of the regression line predicts that, on average, the
……………………when no ……………………… is …………..(along with units). In the context of
this question it is a valid (reliable)/invalid (not reliable) estimate because …………………
OR
a) What do you expect the relationship to be between Price and Odometer Reading?
b) Use Excel to create a scatter diagram of Price against Odometer Reading (be sure you are
able to use Excel to produce this scatterplot).
Comment on how this visual relationship compares with your expectations.
[Scatter plot: Price ($) (0–20,000) against Odometer Reading (km) (0–250,000)]
c) Based on the scatter plot, comment on whether it is appropriate to fit a regression line
to the data.
A regression analysis was performed using Excel, with the following result:
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.441278
R Square             0.194727
Adjusted R Square    0.12152
Standard Error       7249.394
Observations         13

ANOVA
             df    SS         MS         F          Significance F
Regression    1    1.4E+08    1.4E+08    2.659956   0.131176221
Residual     11    5.78E+08   52553716
Total        12    7.18E+08
Data file Pages 420 – 421 Pages 427 – 428 Pages 443 – 443
CO2.xlsx 12.4 12.15(a), (c) 12.37
Class Size.xlsx 12.7 12.18(a), (c) 12.40
3rd edition
p. 454: 12.67 (a) – (d), (f); p. 454: 12.68 (a) – (d), (f), (i) (data file crude.xlsx)
4th edition
p. 454: 12.70 (a) – (d), (f); p. 454: 12.61 (a) – (d), (f), (i) (data file crude.xlsx)
See Section E5.3 in the Excel Notes on Moodle for instructions about how to
generate a simple linear regression output.
(This will provide a 99% confidence interval for population coefficients in addition to the 95%
confidence interval that is always provided.)
As a check that your output is correct, make sure that the fourth number from the top of
the output, Standard Error, is 5.09686.
(d) State the estimated linear regression equation for this data
Using the X, Y labels
Using the variable names instead of the X, Y labels
(e) Conduct a hypothesis test to determine whether there is evidence at the 5% level of
significance that a linear relationship exists between PRICE and SALES. (Remember,
only the p-value approach is used in regression analysis; apply the 5-step hypothesis
testing approach.)
(f) What is the slope of the estimated regression line? Provide an interpretation of this value.
(g) What is the value of the intercept of the regression line? Give an interpretation of this
value and discuss whether it is meaningful in this case.
Exercise 2
A government economist is attempting to produce a better measure of poverty than is currently
in use. To help acquire information, she recorded the annual household income in $000s and
the amount of money spent on food during one week for a random sample of households. Data
is available in Excel file Exercises2-4.xlsx in worksheet Exercise 2.
(a) Use Excel to produce a scatter plot of the data. Comment on whether linear regression
will supply a suitable model of the relationship.
(b) Obtain a regression output for this data, and state the equation of the regression line.
(c) Make an economic interpretation of the slope.
(d) What does the value of the intercept tell you?
(e) Estimate the weekly expenditure on food if the annual household income is:
(i) $60,000
(ii) $150,000
Comment on these estimates.
(a) Use Excel to produce a scatter plot of the data. Comment on whether linear regression
will supply a suitable model of the relationship.
(b) Obtain a regression output for this data, and state the equation of the regression line.
(c) From the output, write down the equation of the regression line.
(d) Interpret the slope.
(e) Interpret the coefficient of determination.
(f) Can we conclude that a significant linear relationship exists between years of
education and hours of internet use? At what significance level?
Exercise 4
In order to determine a realistic price for a new product that a company wishes to market, the
company’s research department selected 10 sites thought to have essentially identical sales
potential and offered the product in each at a different price. The resulting sales are recorded in
the following table and also in the Excel file Exercises2-4.xlsx in worksheet Exercise 4.
(a) Use Excel to find the graph of the scatter plot, the regression output, and the graph of
the regression line.
(b) From the output write down the equation of the regression line.
(c) Interpret the slope.
(d) Interpret the coefficient of determination.
(e) Is there sufficient evidence at the 0.5% significance level to allow us to conclude that
a significant linear relationship exists between price and sales?
Exercise 5
Reproduce scatterplot and Excel Summary Table for question 2. Data are available in Excel
file Manatee.xlsx
[Scatter plot: Sales ($000s) (50–80) against Price ($) (4.5–7)]
(d)
Ŷ = 121.900 − 7.829X
OR: Estimated Sales = 121.900 − 7.829 × Price
Step 1:
H0: β1 = 0
H1: β1 ≠ 0, where β1 is the slope of the linear relationship.
Step 2:
α = 0.05
Step 3:
p-value = 1.97E-09 ≈ 0.0000
Step 4:
Reject H0 if p-value < α.
Since 1.97E-09 < 0.05, we CAN reject the null hypothesis.
Step 5:
We CAN reject H0 at the 5% level of significance. The sample DOES provide enough
evidence against H0. That is, a significant linear relationship DOES exist between the
Sales and Price of burgers.
(f) Slope = −7.829. For every $1 increase in Price, Sales are estimated to decrease on
average by $7,829.
(g) Intercept = 121.9. If the price were zero then the sales level, on average, would be
$121,900. This could be thought of as a prediction of the sales level if the hamburgers were
being given away. However, this is not a valid prediction because the prices in the data set
range from $4.83 to $6.49, and therefore $0 is well outside the range of the data.
[Note that when interpreting the intercept and slope, it is important to take account of the
units in which the data is specified. In the current case, in particular, the sales level is in
thousands of dollars.]
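To make the units point concrete, a small sketch using the fitted line (the price $5.50 is a hypothetical value chosen to sit inside the observed $4.83–$6.49 range):

```python
# Sketch: Estimated Sales = 121.900 - 7.829 * Price, with Sales in $000s.
def estimated_sales(price):
    # Returns estimated sales in thousands of dollars.
    return 121.900 - 7.829 * price

sales_at_550 = estimated_sales(5.50)  # about 78.84, i.e. roughly $78,840
```

A prediction at $0 would run the same arithmetic, but as noted above it would be extrapolation well outside the data range.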
Exercise 2:
(a) A linear model might work; however, with very large variation amongst the data, the
strength of any linear relationship will be low.

[Scatter plot "Weekly food expenditure vs annual household income":
Weekly food expenditure ($) (150–400) against Annual income ($000's) (20–100)]
Regression Statistics
Multiple R           0.4958
R Square             0.2459
Adjusted R Square    0.24077
Standard Error       36.9393
Observations         150

ANOVA
             df     SS            MS           F         Significance F
Regression     1    65841.5803    65841.5803   48.2528   1.1047E-10
Residual     148    201948.016    1364.5136
Total        149    267789.5963
Equation of regression line: Ŷ = 153.90 + 1.96X
where X is the annual income (in $000's) and Ŷ is the estimated weekly food expenditure.
(c) Economic interpretation of slope: On average, for every extra $1000 of annual income,
expect weekly food expenditure to rise by $1.96.
(d) Value of intercept: The intercept suggests that a household with no income will spend
$153.90 per week on food. However, zero income is a long way from the values in the data
set, so this estimate is not likely to be reliable. (any other plausible economic explanations?)
(e) Weekly expenditure on food if
(i) Income = $60,000. Weekly expenditure estimated to be
153.90 + 60 × 1.96 = $271.50 (or $271.39 if you use the unrounded values straight from the
output).
Since this income level is within the range of the data, the estimate is regarded as reliable.
(ii) Income = $150,000. Weekly expenditure estimated to be
153.90 + 150 × 1.96 = $447.90 (or $447.63 if you use the unrounded values straight from the
output). Since this income level is outside the range of the data, the estimate is not
reliable.
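The two estimates above can be reproduced from the rounded regression line; a checking sketch:

```python
# Sketch: Y-hat = 153.90 + 1.96*X, where X is annual income in $000s and
# the result is estimated weekly food expenditure in dollars.
def weekly_food_spend(income_thousands):
    return 153.90 + 1.96 * income_thousands

e60 = weekly_food_spend(60)    # $271.50 -> inside the data range, reliable
e150 = weekly_food_spend(150)  # $447.90 -> outside the data range, not reliable
```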
(f) Hypothesis test:
H0: β1 = 0
H1: β1 ≠ 0, where β1 is the slope of the linear relationship.
Two-tail p-value = 1.1047E-10 (from the regression output). If α = 5%:
since 1.1047E-10 < 0.05, we CAN reject the null hypothesis.
We CAN reject H 0 at the 5% level of significance. The sample DOES provide enough
evidence against H 0 . That is, a significant, linear relationship DOES exist between the
Weekly expenditure and Income.
The null hypothesis would be rejected at all significance levels greater than about
1.1047E-10. (In other words, at all reasonable significance levels it is virtually certain
that there is a linear relationship between these two variables.)
Exercise 3
(a)
[Scatter plot: Hours of internet use (0–12) against Years of education completed (5–19)]
Most of the data seems to be scattered around a line with positive slope, however at most years
of education completed, there are some data points for zero internet usage. This may represent
those who are not connected to the internet. So a linear regression model is worth trying, but
it will really only apply to those who have access to the internet. However, following the
instructions in the question, we continue the analysis using all the data.
(b)
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.3308
R Square             0.1094
Adjusted R Square    0.1050
Standard Error       4.4539
Observations         200

ANOVA
             df     SS          MS         F         Significance F
Regression     1    482.7345    482.7345   24.3345   0.00000171
Residual     198    3927.8205   19.8375
Total        199    4410.555
Exercise 4
(a)
[Scatter plot "Sales Versus Price": sales ($000) (0–18) against Price (14–20)]
(b)
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9107
R Square             0.8294
Adjusted R Square    0.8081
Standard Error       1.6418
Observations         10

ANOVA
             df    SS         MS         F         Significance F
Regression    1    104.8364   104.8364   38.8938   0.0002
Residual      8    21.5636    2.6955
Total         9    126.4

            Coefficients   Standard Error   t Stat    P-value      Lower 95%   Upper 95%
Intercept   49.2909        6.2576           7.8770    4.8814E-05   34.8608     63.7210
x           -2.2545        0.3615           -6.2365   0.00025      -3.0882     -1.4209
[Graph: fitted regression line, Sales ($000's) (0–12) against Price of product ($) (14–20)]
(c) Interpretation of slope: For every extra dollar in price the sales will drop on
average by $2,254.50.
(e) The p-value for the test for a significant linear relationship is 0.00025. This value
is well below 0.5% (or 0.005) and therefore we CAN reject the null hypothesis.
We CAN reject H0 at the 0.5% level of significance. The sample DOES provide enough
evidence against H0. That is, a significant linear relationship DOES exist between the Sales
and Price.
You will have Formative Assessment Task (FAT II) during this period.
Students revise Weeks 8 & 9 lecture material & respective tutorials
REMEMBER: PRACTICE WITH “MyStudyPlan” for EXAM
Part A:
1.
(a) If a contingency table has 5 row categories and 6 column categories, how many degrees of
freedom are there for the χ² test for independence?
(b) What is the critical value for the test of independence for the categories represented in the
table at the 1% level of significance?
(c) And at the 5% level of significance?
(d) If the χ² value calculated for the test is greater than the critical value, what is your
conclusion?
2. Recall Elecmart.xlsx data, Pivot table of Spent vs Gender and Time – focus on count. Find
expected frequencies.
3.
Consider the following data in the contingency table (TABLE B). Conduct a test of independence
at the 5% level for the L and M categories using Table B.

TABLE B (observed)                TABLE B (expected – to be completed)
        M1    M2    Total                 M1    M2    Total
L1      15     4     19           L1                   19
L2      28    19     47           L2                   47
Total   43    23     66           Total   43    23     66

TABLE A (observed)                TABLE A (expected)
        M1    M2    Total                 M1      M2      Total
L1      30     8     38           L1      24.76   13.24    38
L2      56    38     94           L2      61.24   32.76    94
Total   86    46    132           Total   86      46      132
(c) Conduct a test of independence at the 5% level of significance for the L and M categories
using Table A.
Find the missing values in the table below. Follow the 5-step procedure in hypothesis testing. Find the
missing words or circle the words appropriate in the context of the interpretation.

Cell     fo      fe       (fo − fe)²/fe
L1M1
L1M2     8
L2M1     56
L2M2             32.76
Total    132     132

Since every cell in the contingency table has expected/observed frequencies larger than ………, this
confirms that the χ² distribution is/is not appropriate.
Step 1: Hypotheses
Reject H 0 if……………
Since ………. > …….. we CAN/CANNOT reject H 0
Step 5: Conclusion
We can/cannot reject H 0 at …….% level of significance. The sample DOES/DOES NOT provide
enough evidence to show that variables ……………. and …………. are independent/dependent.
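As a cross-check on the Table A working, the expected frequencies fe = (row total × column total)/n and the χ² statistic can be computed in a few lines (the critical value 3.841 is the standard χ² value for df = (2−1)(2−1) = 1 at α = 0.05):

```python
# Sketch for Table A: expected frequencies and chi-square statistic.
observed = [[30, 8],
            [56, 38]]
row_totals = [sum(row) for row in observed]        # [38, 94]
col_totals = [sum(col) for col in zip(*observed)]  # [86, 46]
n = sum(row_totals)                                # 132

expected = [[r * c / n for c in col_totals] for r in row_totals]
# expected ~ [[24.76, 13.24], [61.24, 32.76]], matching the table above

chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))    # ~4.47, above 3.841
```

Since the statistic exceeds the 5% critical value, the null hypothesis of independence is rejected for Table A.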
5. Correctly predicting the direction of change in foreign currency exchange rates can be lucrative.
216 investors were asked to predict the direction of change over a certain period, and the actual
direction was later recorded. The results are given in the following table.

                     Predicted Down   Predicted Up   TOTAL
Actual Down          65               64
Actual Up            39               48
TOTAL                                                216
TIME SERIES
6. In each of the following 4 time series plots explain the choice of additive/multiplicative
seasonality with trend/no trend. In the case of present trend, state whether it is linear/quadratic and
positive/negative. For each time series plot, suggest a relevant model for forecasting with correct
components.
Additive – because the variance (the difference between the highest and lowest values) in Yt seems
to be constant with respect to time
Seasonality – because there is an evident repetitive pattern: around every 12th time
period the value of Yt is lowest, and around every 4th or 5th time period the value of Yt is highest
No trend – because overall the level of Yt fluctuates around a constant value (Yt does not increase
or decrease with respect to time)
Suggested model for forecasting:
Additive model in the form Yt = St + It, where St represents the seasonal component and It the
irregular (random) component
8. For the following graphs of time series, comment on which components appear to be present.
Is it possible to decide whether the components should be combined additively or multiplicatively?
State the model that you suggest using for forecasting each time series. State, with an explanation,
which component is not present in the time series.
(a)
[Time series plot: Yt (0–150) against time (0–30)]
(b)
[Time series plot: Unemployment (%) in Logosia (0–20) against time in quarters
since beginning of 2001 (0–30)]
(c)
[Time series plot: GDP ($millions, current) (0–800,000) against years 1988–2004]
Instructions will be given by your Teacher
For the χ² test of independence, if you need a critical value for a certain number of degrees of
freedom, you need …
For example, in lecture week 10, the cross-classification of the job status (having or not having a
job) with the exam status (HD or not HD) has degrees of freedom (2−1)(2−1) = 1 [for two columns
and two rows], the level of significance α = 0.05, and the calculated value of the χ² test statistic is 4.444.
(a) [The critical value, correct to three decimal places, is 3.841.]
(b) The p-value, using =CHISQ.DIST.RT(4.444,1), is 0.035024.
             High income   Low income   Total
Sportpack    123           154          277
Moviepack    118           111          229
Total        241           265          506
(a) Is there evidence of a relationship between income level and subscriber option? (Use
α = 0.10 )
(b) Calculate the p-value and interpret its meaning.
________________________________________________________________________
________________________________________________________________________
____________________________________________________________
ii. Some of the necessary calculations are provided below. Complete the tables on this
page.
Expected frequencies (partial):
             High income   Low income
Sportpack
Moviepack                  119.9308
iv. Is the frequency count in all cells of the contingency table ideal to conduct this test?
Explain (answer Yes or No is not sufficient).
_______________________________________________________________________
_______________________________________________________________________
______________________________________________________________
(a) At the 0.01 level of significance, is there evidence of a significant relationship between
family role and type of preferred communication?
(b) What is your answer to (a) if you use the 0.05 level of significance?
The quarterly sales of a department store chain were recorded for the past four years from 2002 to
2005. These data are available on Moodle in Excel Exercises Week 11, Exercise1-3.xlsx worksheet
Ex3.
(i) Graph the time series. (You will need to create an appropriate column for the time variable.)
To be able to answer both questions, we advise you to follow step by step procedure and
answer the relevant questions
Recommended working for part (a)
i. State the null and alternative hypotheses:
H0: Subscriber option is independent of the income level (a relationship between income level and
subscriber option does not exist)
H1: Subscriber option is dependent on the income level (a relationship between income level and
subscriber option exists)
ii. Some of the necessary calculations are provided below. Complete the tables on this
page (calculated values are bolded).
Test statistic: χ² = Σ (fo − fe)²/fe, level of significance: α = 0.1
We cannot reject H0 at the 10% level of significance. The sample DOES NOT provide enough
evidence to show that the subscriber option and the income level are dependent.
iv. Is the frequency count in all cells of the contingency table ideal to conduct this test?
Explain (answer Yes or No is not sufficient).
We require that all expected frequencies fe ≥ 5 for the χ² test to be valid. All expected
frequencies in table ii. are larger than 5; hence the frequency count in all cells is ideal
for conducting this test.
=CHISQ.DIST.RT(2.5507,1) gives 0.110245, which is the p-value. The smallest value of alpha
leading to rejection of the null hypothesis is therefore approximately 0.11 (i.e. 11%).
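The 2.5507 test statistic quoted above can be reproduced from the observed counts; a checking sketch:

```python
# Sketch: chi-square statistic for the subscriber-option vs income table.
observed = [[123, 154],   # Sportpack: high income, low income
            [118, 111]]   # Moviepack: high income, low income
row_totals = [sum(row) for row in observed]        # [277, 229]
col_totals = [sum(col) for col in zip(*observed)]  # [241, 265]
n = sum(row_totals)                                # 506

expected = [[r * c / n for c in col_totals] for r in row_totals]
# expected[1][1] ~ 119.9308, the Moviepack/low-income value given earlier

chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))    # ~2.5507
```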
Exercise 2:
We recommend that you set and follow a similar structure to Exercise 1 to be able to answer part
(a) and (b)
(a) The calculated value of the test statistic is 234.6986, which is larger than the critical value
21.666. This leads to rejection of the null hypothesis; hence at the 1% level of significance we can
conclude that there is a relationship between the household role and the type of preferred
communication.
(b) If the level of significance is increased to 0.05, the critical value decreases to 16.919. This
does not change the answer from part (a). The extremely small
p-value, =CHISQ.DIST.RT(234.6986,9) = 1.6843 × 10⁻⁴⁵, confirms that it
is essentially impossible not to reject the null hypothesis of independence between the type
of preferred communication and the household role; hence at any reasonable level of significance
we can conclude that there is a relationship between the household role and the type of preferred
communication.
(i)
[Time series plot: quarterly sales ($million) (0–40) against quarters since 2002 (0–16)]
(ii) There is an evident trend component (linear, positive). There is also a seasonal (quarterly)
component, which is overpowered by a strong random component. If the random component were
removed or reduced in strength, the quarterly pattern (quarterly seasonal component) would be
dominant. Removal of the random component is not required in MCD2080.
Part A:
1. Two forecasting procedures were applied to the series in Excel Ex29 in order to forecast the sales
for the four quarters of 2006. The forecasts and the actual sales are given in the following table (Sales
and forecasts in $million):
Year Quarter Sales Forecast 1 Forecast 2
2006 1 30 31.2 28.6
2 31 38.0 33.7
3 40 42.7 36.1
4 49 54.2 44.4
(a) For each forecasting procedure, calculate the Mean Absolute Deviation (MAD) and the Mean Square Error (MSE).
Show working. Summarize the results in the table below.
Forecast 1 (method 1) Forecast 2 (method 2)
MAD
MSE
(b) Based on these measures, which is the more accurate forecasting method? Explain.
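The MAD and MSE calculations above can be sketched in Python, so you can check your by-hand working against the printed values:

```python
# Sketch: MAD and MSE for the two forecast sets in the table.
actual    = [30, 31, 40, 49]
forecast1 = [31.2, 38.0, 42.7, 54.2]
forecast2 = [28.6, 33.7, 36.1, 44.4]

def mad(actual, forecast):
    # Mean Absolute Deviation: average of |actual - forecast|
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    # Mean Square Error: average of (actual - forecast)^2
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

mad1, mse1 = mad(actual, forecast1), mse(actual, forecast1)  # 4.025, 21.1925
mad2, mse2 = mad(actual, forecast2), mse(actual, forecast2)  # 3.15, 11.405
# Forecast 2 has the smaller MAD and MSE, so it is the more accurate method.
```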
2. Recently the economic welfare of North Carolina has been in the spotlight with the Democratic
Convention being held there in preparation for the 2012 Presidential election.
(a) Here is the full story of the unemployment rate in North Carolina each quarter for the last 10 years.
Note that the time series starts in the second year of the Bush presidency, and t = 29 corresponds
to January 2009, the beginning of the Obama presidency. (The quarterly figures are provided in
January, April, July and October.) 1
1 The data used in this question is derived from the US Bureau of Labor Statistics website,
http://www.bls.gov/home.htm , accessed 7 September 2012.
MCD2080 Tutorial Questions and Computing Exercise – Week 11 Page 1
3. “Shop and Run” sports kit store intends to measure the seasonal effect on its sales based on the last
three years’ data. The seasonal indices for each quarter of each of these three years have been provided
in the table below.
Use this data to calculate seasonal indices for Summer, Autumn, Winter and Spring, correct to three
decimal places.
Part B:
4. (Q2 continued)
(b) Let’s look at the period of steady decline in the unemployment rate from t = 1 (January 2002) to t
= 26 (April 2008) highlighted by the rectangle.
We could model this steady decline with a linear downward trend and a seasonal component.
The trend line was determined based on the data from t = 1 to t = 26, using linear regression, and found to
be Tt = 6.878 − 0.094t
Based on this trend line, and assuming a multiplicative model, seasonal indices were calculated and found
to be:
S1 (Jan)  1.06
S2 (Apr)  0.97
S3 (Jul)  1.04
S4 (Oct)  0.93
(i) Interpret each seasonal index, use template wording provided. Circle correct choice and find
missing words in the text (be sure that you know how to get required %)
1st quarter (Jan) = 1.06 this indicates that on average the ………………. in North Carolina is
…….. % above/below the …………………………….. projection.
2nd quarter (Apr) = 0.97 this indicates that on average the ………………. in North Carolina is
…….. % above/below the …………………………….. projection.
3rd quarter (Jul) = 1.04 this indicates that on average the ………………. in North Carolina is
…….. % above/below the …………………………….. projection.
4th quarter (Oct) = 0.93 this indicates that on average the ………………. in North Carolina is
…….. % above/below the …………………………….. projection.
(ii) Using this (linear trend and quarterly seasonality) model, state the forecasting model first and
find the forecasts of the unemployment rate for the last two quarters of the Bush presidency (July
2008, t = ? and October 2008, t = ?) and the first quarter of the Obama presidency (January 2009,
t = ?).
(For comparison only, the actual values were 6.7%, 6.9% and 9.5%.)
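A sketch of part (b)(ii), assuming t = 27 (July 2008), t = 28 (October 2008) and t = 29 (January 2009) — the last of these is stated earlier in the question, and the data are quarterly. The multiplicative forecast is Ft = Tt × St:

```python
# Sketch: multiplicative trend-plus-seasonal forecasts for t = 27, 28, 29.
seasonal = {1: 1.06, 2: 0.97, 3: 1.04, 4: 0.93}  # Jan, Apr, Jul, Oct indices

def forecast(t):
    trend = 6.878 - 0.094 * t   # T_t from the fitted trend line
    quarter = (t - 1) % 4 + 1   # quarter within the year (t = 1 is Jan 2002)
    return trend * seasonal[quarter]

f_jul08, f_oct08, f_jan09 = forecast(27), forecast(28), forecast(29)
# ~4.51%, ~3.95% and ~4.40% -- far below the actual 6.7%, 6.9% and 9.5%,
# since the model only captures the pre-2008 downward trend.
```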
(c) Assuming (is this assumption reasonable?) the seasonal indices you calculated using data from 2002
to 2008 are still valid in 2012, calculate the deseasonalised (or seasonally adjusted) unemployment
rate in North Carolina for the first three quarters of 2012. The raw data is given in the following
table, which you should complete.
Time                    Unemployment rate   Seasonally adjusted unemployment rate
t = 41, January 2012    10.5
t = 42, April 2012      9.1
t = 43, July 2012       9.8
Comment on the underlying trend in the North Carolina unemployment figures in 2012.
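A sketch for part (c): the deseasonalised (seasonally adjusted) value is the raw rate divided by the seasonal index for that quarter, using the indices from part (b):

```python
# Sketch: deseasonalised rate = actual rate / seasonal index for the quarter.
seasonal = {1: 1.06, 2: 0.97, 3: 1.04, 4: 0.93}  # Jan, Apr, Jul, Oct indices

raw = {41: 10.5, 42: 9.1, 43: 9.8}  # t: raw unemployment rate in 2012

def deseasonalise(t, rate):
    quarter = (t - 1) % 4 + 1       # t = 1 is January 2002, quarterly data
    return rate / seasonal[quarter]

adjusted = {t: round(deseasonalise(t, r), 2) for t, r in raw.items()}
# {41: 9.91, 42: 9.38, 43: 9.42}; interpreting the underlying trend is
# left to the question.
```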
The data used in this question is derived from the US Bureau of Labor Statistics website,
http://www.bls.gov/home.htm , accessed 7 September 2012.
5. Based on data from 2000 to 2005, a tourism expert developed a model of room occupancy rates in
Australia which involved a linear trend ŷt = 9807.6 + 88.43t, where t is measured in quarters since
the beginning of 2000, and seasonal indices as shown in the following table:
Quarter SI
1 0.9940
2 0.9493
3 1.0283
4 1.0284
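The forecasting mechanics for a multiplicative model like this — the trend value for period t multiplied by the seasonal index of the matching quarter — can be sketched as follows. The t = 25 example is our own illustration, not part of the question:

```python
# Multiplicative trend-and-seasonality forecast: F_t = (b0 + b1*t) * SI_q
b0, b1 = 9807.6, 88.43                              # linear trend from the question
si = {1: 0.9940, 2: 0.9493, 3: 1.0283, 4: 1.0284}   # quarterly seasonal indices

def forecast(t):
    quarter = (t - 1) % 4 + 1     # t = 1 is taken to be Q1 of 2000
    return (b0 + b1 * t) * si[quarter]

# e.g. the first quarter of 2006 corresponds to t = 25
print(round(forecast(25), 1))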
6.
Manufacturing of Australian passenger vehicles has been of particular interest in recent times with many
vehicles driven in Australia manufactured overseas. This has forced the closure of some manufacturing
plants in Australia resulting in many employees losing their jobs. The graph below shows the number of
Australian passenger vehicles manufactured monthly between January 2006 and December 2013.
[Graph: Australian passenger vehicles manufactured monthly, January 2006 – December 2013;
vertical axis from 10,000 to 60,000 vehicles]
(a) Discuss what components are present in the series, and what evidence you see in the graph for each
of them.
(b) The data was analysed by fitting a straight line to all the data from 2006 to the end of 2013 and
calculating seasonal indices based on this regression line. The estimated trend line is
T̂t = 50817.35 − 31.908t, where t is time in months, with t = 1 corresponding to January 2006. Based
on this trend line, and assuming a multiplicative model, the seasonal indices were calculated and
found to be:
(i) Using the trend and seasonal components, forecast the number of vehicles
manufactured for April 2014.
(ii) Provide an interpretation of the seasonal index for May as seen in the table.
(c) Forecasts were also obtained for the first three months of 2014. Use your answer in (b) to complete
the table below (including the total) and calculate the mean absolute deviation for the forecasts of
the Australian manufactured passenger vehicles. (Some of the necessary calculations are provided
in the table.)
(d) An alternative forecasting method is also used and is found to have a mean absolute deviation of
2647.443. Use the value calculated in (c), along with this information, to determine whether the
method used in (b) or the alternative method is the best for forecasting. Explain briefly.
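The seasonal-index table referred to in (b) has not survived in this copy, so a complete numerical answer cannot be reproduced here; the mechanics of (b)(i) can still be sketched, though. April 2014 corresponds to t = 100 when t = 1 is January 2006, and the forecast is the trend value times April's seasonal index. The index value 0.95 below is a placeholder, not the value from the original table:

```python
# Forecast = trend value * seasonal index (multiplicative model).
# Trend from the question; the April index 0.95 is a PLACEHOLDER,
# since the original seasonal-index table is missing from this copy.
def trend(t):
    return 50817.35 - 31.908 * t

t_apr_2014 = (2014 - 2006) * 12 + 4   # t = 1 is January 2006, so April 2014 is t = 100
si_april = 0.95                        # placeholder seasonal index
forecast = trend(t_apr_2014) * si_april
print(t_apr_2014, round(forecast, 1))
```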
Revenue ($million)
YEAR
QUARTER 2005 2006 2007 2008 2009
1 16 14 17 18 21
2 25 27 31 29 30
3 31 32 40 45 52
4 24 23 27 24 32
(a) Use Excel to plot the time series and comment on the components that appear to be present in
the series.
(b) Regression analysis produced the trend line ŷt = 20.2 + 0.732t, where t is the time in
quarters, with t = 1 in Quarter 1 of 2005.
S1 0.646
S2 1.045
S3 1.405
S4 0.904
Use this information to forecast revenues for the four quarters of 2010.
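One way to organise this forecast is a loop over the quarters: 2010 corresponds to t = 21 to 24, and each trend value is multiplied by the matching seasonal index. This is a sketch of the calculation, not a required method:

```python
# Forecast revenue for the four quarters of 2010:
# trend y_hat = 20.2 + 0.732*t (t = 1 in Q1 2005), times the seasonal index.
si = {1: 0.646, 2: 1.045, 3: 1.405, 4: 0.904}

forecasts = {}
for quarter in range(1, 5):
    t = 20 + quarter                     # 2010 is the sixth year: t = 21..24
    forecasts[quarter] = round((20.2 + 0.732 * t) * si[quarter], 2)
print(forecasts)
```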
You will have Formative Assessment Task (III) during this period.
Students revise the week 10 lecture material and the respective tutorial.
This will be delivered on Learning Catalytics.
Instructions will be given by your teacher.
Exercise 1
For this question, you will need to use worksheet Ex1 of the Excel document Exercise1-3.xls in
Computing Exercises week 12.
For the “Turnover in hospitality” example discussed in the Week 10 lecture, evaluate the two forecast
methods using the following steps:
1. Restrict attention to the first 20 data points relevant to years 1983 to 2002 (highlighted by black
colour in the Excel document), and estimate the two models
Model 1: y = β0 + β1t + ε
Model 2: y = β0 + β1t + β2t² + ε
Follow these shortcut instructions:
• Create the scatter plot by selecting the time and turnover variables (refer to the computing
lab in week 10)
• Select all points in your scatter plot by left-clicking
• While all points are selected, right-click and select Add Trendline from the drop-down menu
• Make your choice of function: select Linear (the default) for Model 1 or 2nd order Polynomial
for Model 2 (one at a time), and tick the box Display Equation on chart in the Trendline
Options
• The requested functions, in the form y = 3560.5 + 700.52x (Model 1) and
y = −0.2981x² + 706.78x + 3537.6 (Model 2), will appear on the chart
2. You need to recognise and use the values of β̂ 0 and β̂1 obtained for Model 1, and of β̂ 0 , β̂1 and
β̂ 2 obtained for Model 2, for calculating the values yˆi corresponding to 2003 to 2007, relevant to
the last 5 data points (highlighted by red colour in the Excel document). Use your Excel
function-creating skills from the computing lab in week 1, or follow the instructions provided.
Put these values in
columns E and F of the worksheet. For example, suppose the intercept and slope for Model 1 are
in cells L17 and L18 respectively. Then the entry in E22 would be
=$L$17+$L$18*B22
and could be dragged down to calculate the remaining years.
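If you want to verify the Excel drag-down away from the worksheet, the same fitted values can be computed directly from the chart equations. A sketch, assuming t runs from 21 to 25 for the years 2003 to 2007 (t = 1 being 1983, the first highlighted data point):

```python
# Fitted values for 2003-2007 (t = 21..25), mirroring the Excel drag-down.
# Coefficients are read straight off the chart equations from step 1.
def model1(t):
    return 3560.5 + 700.52 * t

def model2(t):
    return -0.2981 * t**2 + 706.78 * t + 3537.6

for t in range(21, 26):
    print(t, round(model1(t), 2), round(model2(t), 2))
```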
3. Now in columns G and H, calculate the sum of absolute forecast errors for Model 1 and Model 2
respectively; and in columns I and J, calculate the sum of squared forecast errors. (Absolute value
is the function ABS in Excel.)
4. Summarise the results of these calculations by completing the table below, and state, with reasons,
which model you choose.
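Steps 3 and 4 amount to two error summaries per model. A minimal sketch is below; the `actual` values are placeholders standing in for column D of the worksheet, so only the structure of the calculation carries over:

```python
# Sum of absolute and squared forecast errors (steps 3 and 4).
# The `actual` values are ILLUSTRATIVE ONLY - use column D of the
# worksheet in practice. `fitted` holds the Model 1 values for t = 21..25.
def error_sums(actual, fitted):
    abs_errors = [abs(f - a) for a, f in zip(actual, fitted)]
    sq_errors = [e**2 for e in abs_errors]
    return sum(abs_errors), sum(sq_errors)

actual = [18000.0, 18500.0, 19000.0, 19500.0, 20000.0]       # placeholder data
fitted = [18271.42, 18971.94, 19672.46, 20372.98, 21073.50]  # Model 1 values
sae, sse = error_sums(actual, fitted)
print(round(sae, 2), round(sse, 2))
```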
Model 1 Model 2
Estimated equation
Exercise 2
For this question, you will need to use worksheet Ex2 of the Excel document Exercise1-3.xls in
Computing Exercises week 12. Exports are an important component of the exchange rate and,
domestically, an important indicator of employment and profitability in certain industries. The value
of Australian exports has increased over the 26-year period described in the following table.
(a) Plot the time series.
(b) Estimate a linear trend line.
(c) Does this trend line capture the long-term behaviour of the series?
(d) Predict the trend value of exports for the year 2004.
Revenue ($million)
Year
Quarter 2001 2002 2003 2004 2005
1 16 14 17 18 21
2 25 27 31 29 30
3 31 32 40 45 52
4 24 23 27 24 32
(c) This trend line was used for calculating the seasonal indices listed below. Based on these seasonal
indices, describe the seasonal pattern of the time series.
Quarter SI
1 0.646
2 1.045
3 1.405
4 0.904
(d) Using the seasonal indices and the trend line, forecast the revenues for the four quarters of 2006.
(e) Given that the actual revenues for 2006 were observed as:
2006
quarter revenue
1 23
2 35
3 50
4 32
calculate the Mean Absolute Deviation (MAD) and Mean Square Error (MSE) for the forecast.
(f) If instead we multiply the trend-line values by seasonal indices obtained from the moving average the
MAD value for the resulting 2006 forecast is 2.005, and the MSE value is 5.081. What do you conclude
about the relative merits of the two forecasts?
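The MAD and MSE calculation in (e) can be cross-checked with a short script. Forecasts use the regression coefficients from (b) (t = 21 to 24 for 2006) and the seasonal indices from (c); because the script carries full precision, its results may differ slightly from hand calculations that round intermediate values:

```python
# MAD and MSE for the 2006 forecasts (part (e)).
# Trend y_hat = 20.2105 + 0.73233*t (t = 1 in Q1 2001), multiplicative SIs.
si = [0.646, 1.045, 1.405, 0.904]
actual = [23, 35, 50, 32]          # observed 2006 revenues

errors = []
for q in range(4):
    t = 21 + q                     # 2006 quarters are t = 21..24
    f = (20.2105 + 0.73233 * t) * si[q]
    errors.append(actual[q] - f)

mad = sum(abs(e) for e in errors) / 4
mse = sum(e**2 for e in errors) / 4
print(round(mad, 4), round(mse, 4))
```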
2.
Year    Model 1     Model 2
2003    18271.42    18248.52
2004    18971.94    18942.48
2005    19672.46    19635.85
2006    20372.98    20328.61
2007    21073.50    21020.79
3.
a. In G22, type =ABS(E22-D22), and drag the formula down
b. In H22, type =ABS(F22-D22), and drag the formula down
c. In I22, type =G22^2, and drag the formula down
d. In J22, type =H22^2, and drag the formula down
e. Then sum each of the columns G, H, I, J to get the values in the table below
4.
                                  Model 1                   Model 2
Estimated equation                ŷ = 3560.5 + 700.52t      ŷ = −0.2981t² + 706.78t + 3537.6
Sum of absolute forecast errors   16798.31                  16984.63
Sum of squared forecast errors    63059962.34               64426811.28
The sum of absolute forecast errors, and the sum of squared forecast errors are both lower for the
linear model (Model 1) than for the quadratic model (Model 2). Therefore I would choose Model
1.
[Note that we could also calculate MAD and MSFE by dividing each of the numbers in the table
by 5. This would not make any difference to the comparison.]
5.
At first sight, the answer may seem surprising, because Model 2 fitted the in-sample values better.
But this had to be the case: Model 1 is a special case of Model 2, so fitting a nonzero value of
β2 can only improve on the Model 1 in-sample result. If the time series is really linear, then the
quadratic term is being fitted to small random errors, and its forecasts can rapidly move away from
the actual values in the next few years.
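This overfitting argument can be illustrated with simulated data: generate a genuinely linear series plus noise, fit both models on the first 20 points, and compare the errors. This is an illustrative simulation, not the tutorial data, and it assumes NumPy is available:

```python
import numpy as np

# Simulate a genuinely linear series with noise, fit a line and a
# quadratic to the first 20 points, then compare squared errors on
# the remaining 5 points (illustrative simulation only).
rng = np.random.default_rng(0)
t = np.arange(1, 26)
y = 3500 + 700 * t + rng.normal(0, 300, size=t.size)

t_in, y_in = t[:20], y[:20]
t_out, y_out = t[20:], y[20:]

lin = np.polyfit(t_in, y_in, 1)
quad = np.polyfit(t_in, y_in, 2)

sse_in_lin = np.sum((np.polyval(lin, t_in) - y_in) ** 2)
sse_in_quad = np.sum((np.polyval(quad, t_in) - y_in) ** 2)
sse_out_lin = np.sum((np.polyval(lin, t_out) - y_out) ** 2)
sse_out_quad = np.sum((np.polyval(quad, t_out) - y_out) ** 2)

# In-sample the quadratic can never do worse (the line is a special case
# of it); out of sample it often does worse, because beta2 chased noise.
print("in-sample: quadratic SSE <= linear SSE?", sse_in_quad <= sse_in_lin)
print("out-of-sample SSE (linear, quadratic):", sse_out_lin, sse_out_quad)
```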
(a) [Graph: time series plot of Australian annual exports ($million), 1970–2005;
vertical axis 0–140,000]
(b)
Regression Statistics
Multiple R          0.933652
R Square            0.871706
Adjusted R Square   0.867124
Standard Error      10619.13
Observations        30
ANOVA
df SS MS F Significance F
Regression 1 2.15E+10 2.15E+10 190.248 5.21E-14
Residual 28 3.16E+09 1.13E+08
Total 29 2.46E+10
            Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%
Intercept   3002.021       3976.577         0.754926   0.456597   -5143.63    11147.67
t           3089.579       223.9955         13.79304   5.21E-14   2630.745    3548.413
[Graph: Australian annual exports ($million) with fitted linear trend, 1970–2005]
(c) It is clear that this trend line does not capture the long-term behaviour of the series. Rather
than being randomly above and below the trend line, the early values are above it, from 1990 to
almost 2000 they are below it, and recent values are again above the trend line. This indicates that
an appropriate model would be nonlinear; the shape of the time series graph looks approximately
exponential.
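When the trend looks exponential, a standard fix is to regress ln(y) on t and back-transform. The sketch below demonstrates this on constructed data (illustrative values only, not the export series from the worksheet):

```python
import math

# Fit an exponential trend y = a * exp(b*t) by least squares on ln(y).
# The data below are ILLUSTRATIVE, exactly exponential by construction,
# so the fit should recover a = 5000 and b = 0.08.
t_vals = list(range(1, 11))
y_vals = [5000 * math.exp(0.08 * t) for t in t_vals]

n = len(t_vals)
log_y = [math.log(y) for y in y_vals]
t_bar = sum(t_vals) / n
ly_bar = sum(log_y) / n
b = sum((t - t_bar) * (ly - ly_bar) for t, ly in zip(t_vals, log_y)) / \
    sum((t - t_bar) ** 2 for t in t_vals)
a = math.exp(ly_bar - b * t_bar)
print(round(a, 1), round(b, 3))
```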
(a) [Graph: time series plot of quarterly revenue ($million) against time in quarters since the
beginning of 2001; vertical axis 0–60]
(b)
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.4525
R Square 0.2047
Adjusted R Square 0.1606
Standard Error 8.77229
Observations 20
ANOVA
             df   SS         MS         F             Significance F
Regression   1    356.6451   356.6451   4.634580645   0.045146
Residual     18   1385.155   76.95305
Total        19   1741.8

             Coefficients   Standard Error   t Stat   P-value       Lower 95%   Upper 95%
Intercept    20.2105        4.0750           4.9596   0.000101325   11.64926    28.77179
t            0.73233        0.34017          2.1528   0.04514568    0.01765     1.447011
(c)
Interpretation of Seasonal Indices:
1st quarter = 0.646: this indicates that on average the ice cream revenue is 35.4% below the
trend line projection.
2nd quarter = 1.045: this indicates that on average the ice cream revenue is 4.5% above the
trend line projection.
3rd quarter = 1.405: this indicates that on average the ice cream revenue is 40.5% above the
trend line projection.
4th quarter = 0.904: this indicates that on average the ice cream revenue is 9.6% below the
trend line projection.
(d)
Quarter   t   Ŷt   SI   Ft
(e)
Quarter   Yt   Ft   Yt − Ft   |Yt − Ft|   (Yt − Ft)²
Total                         7.16636     17.5882526

MAD = (Σ|Yt − Ft|) / 4 = 7.16636 / 4 = 1.7916
MSE = (Σ(Yt − Ft)²) / 4 = 17.5883 / 4 = 4.3971
(both sums run over the four quarters, t = 1 to 4)
(f)
Calculated values
Method 1: MAD for the trend-line SI forecast = 1.8
Method 1: MSE for the trend-line SI forecast = 4.4
Provided values
Method 2: MAD for the moving-average SI forecast = 2.005
Method 2: MSE for the moving-average SI forecast = 5.081
The trend and seasonal forecast is better, because both the MAD and the MSE for the trend-line SI
forecast are smaller than those for the moving-average forecast.
Sampling distributions

X̄ ~ N(μX̄, σX̄²) where μX̄ = μ and σX̄ = σ/√n          Z = (X̄ − μX̄)/(σ/√n) ~ N(0, 1)

p ~ N(π, σp²) where σp = √(π(1 − π)/n), if nπ ≥ 5 and n(1 − π) ≥ 5

Estimation

x̄ − z(α/2)·σ/√n < μ < x̄ + z(α/2)·σ/√n        i.e.  x̄ ± z(α/2)·σ/√n

x̄ − t(n−1, α/2)·s/√n < μ < x̄ + t(n−1, α/2)·s/√n        i.e.  x̄ ± t(n−1, α/2)·s/√n

Test statistics

z = (x̄ − μ0)/(σ/√n)        t = (x̄ − μ0)/(s/√n)        z = (p − π0)/√(π0(1 − π0)/n)

t = (b1 − c)/se(b1) ~ t(n−2)

Mean Absolute Deviation        MAD = (Σ(t=1 to n) |Yt − Ft|) / n

Mean Square Forecast Error     MSFE = (Σ(t=1 to n) (Yt − Ft)²) / n
Cumulative probabilities (z ≤ 0)
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
–0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
–0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
–0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
–0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
–0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
–0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
–0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
–0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
–0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
–0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
–1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
–1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
–1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
–1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
–1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
–1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
–1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
–1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
–1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
–1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
–2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
–2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
–2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
–2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
–2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
–2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
–2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
–2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
–2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
–2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
–3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
–3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
–3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
–3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
–3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
Cumulative probabilities (z ≥ 0)
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3.0 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
        0.995   0.99    0.975   0.95    0.90    0.10    0.05    0.025   0.01    0.005
df: 1   0.000   0.000   0.001   0.004   0.016   2.706   3.841   5.024   6.635   7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
                              Population                           Sample
Mean                          μ = Σ(i=1 to N) xi / N               x̄ = Σ(i=1 to n) xi / n
Variance                      σ² = Σ(i=1 to N) (xi − μ)² / N       s² = Σ(i=1 to n) (xi − x̄)² / (n − 1)
Location of pth percentile    Lp = (n + 1)·p/100
Standard Deviation            σ = √σ²                              s = √s²
Coefficient of Variation      CV = σ/μ × 100%                      CV = s/x̄ × 100%
Probability distributions

Discrete

xi, i = 1, …, k is the list of possible values that the variable can take.

Expected value    μ = E(X) = Σ(i=1 to k) xi·p(xi)

Variance          σ² = Var(X) = Σ(i=1 to k) (xi − μ)²·p(xi)

Binomial
Table 4a: Binomial Distribution: P(X = x)
n x p 0.05 0.10 0.20 0.25 0.30 0.40 0.50 0.60 0.70 0.75 0.80 0.90 0.95
5 0 0.7738 0.5905 0.3277 0.2373 0.1681 0.0778 0.0313 0.0102 0.0024 0.0010 0.0003 0.0000 0.0000
1 0.2036 0.3281 0.4096 0.3955 0.3602 0.2592 0.1563 0.0768 0.0284 0.0146 0.0064 0.0005 0.0000
2 0.0214 0.0729 0.2048 0.2637 0.3087 0.3456 0.3125 0.2304 0.1323 0.0879 0.0512 0.0081 0.0011
3 0.0011 0.0081 0.0512 0.0879 0.1323 0.2304 0.3125 0.3456 0.3087 0.2637 0.2048 0.0729 0.0214
4 0.0000 0.0005 0.0064 0.0146 0.0284 0.0768 0.1563 0.2592 0.3602 0.3955 0.4096 0.3281 0.2036
5 0.0000 0.0000 0.0003 0.0010 0.0024 0.0102 0.0313 0.0778 0.1681 0.2373 0.3277 0.5905 0.7738
6 0 0.7351 0.5314 0.2621 0.1780 0.1176 0.0467 0.0156 0.0041 0.0007 0.0002 0.0001 0.0000 0.0000
1 0.2321 0.3543 0.3932 0.3560 0.3025 0.1866 0.0938 0.0369 0.0102 0.0044 0.0015 0.0001 0.0000
2 0.0305 0.0984 0.2458 0.2966 0.3241 0.3110 0.2344 0.1382 0.0595 0.0330 0.0154 0.0012 0.0001
3 0.0021 0.0146 0.0819 0.1318 0.1852 0.2765 0.3125 0.2765 0.1852 0.1318 0.0819 0.0146 0.0021
4 0.0001 0.0012 0.0154 0.0330 0.0595 0.1382 0.2344 0.3110 0.3241 0.2966 0.2458 0.0984 0.0305
5 0.0000 0.0001 0.0015 0.0044 0.0102 0.0369 0.0938 0.1866 0.3025 0.3560 0.3932 0.3543 0.2321
6 0.0000 0.0000 0.0001 0.0002 0.0007 0.0041 0.0156 0.0467 0.1176 0.1780 0.2621 0.5314 0.7351
7 0 0.6983 0.4783 0.2097 0.1335 0.0824 0.0280 0.0078 0.0016 0.0002 0.0001 0.0000 0.0000 0.0000
1 0.2573 0.3720 0.3670 0.3115 0.2471 0.1306 0.0547 0.0172 0.0036 0.0013 0.0004 0.0000 0.0000
2 0.0406 0.1240 0.2753 0.3115 0.3177 0.2613 0.1641 0.0774 0.0250 0.0115 0.0043 0.0002 0.0000
3 0.0036 0.0230 0.1147 0.1730 0.2269 0.2903 0.2734 0.1935 0.0972 0.0577 0.0287 0.0026 0.0002
4 0.0002 0.0026 0.0287 0.0577 0.0972 0.1935 0.2734 0.2903 0.2269 0.1730 0.1147 0.0230 0.0036
5 0.0000 0.0002 0.0043 0.0115 0.0250 0.0774 0.1641 0.2613 0.3177 0.3115 0.2753 0.1240 0.0406
6 0.0000 0.0000 0.0004 0.0013 0.0036 0.0172 0.0547 0.1306 0.2471 0.3115 0.3670 0.3720 0.2573
7 0.0000 0.0000 0.0000 0.0001 0.0002 0.0016 0.0078 0.0280 0.0824 0.1335 0.2097 0.4783 0.6983
8 0 0.6634 0.4305 0.1678 0.1001 0.0576 0.0168 0.0039 0.0007 0.0001 0.0000 0.0000 0.0000 0.0000
1 0.2793 0.3826 0.3355 0.2670 0.1977 0.0896 0.0313 0.0079 0.0012 0.0004 0.0001 0.0000 0.0000
2 0.0515 0.1488 0.2936 0.3115 0.2965 0.2090 0.1094 0.0413 0.0100 0.0038 0.0011 0.0000 0.0000
3 0.0054 0.0331 0.1468 0.2076 0.2541 0.2787 0.2188 0.1239 0.0467 0.0231 0.0092 0.0004 0.0000
4 0.0004 0.0046 0.0459 0.0865 0.1361 0.2322 0.2734 0.2322 0.1361 0.0865 0.0459 0.0046 0.0004
5 0.0000 0.0004 0.0092 0.0231 0.0467 0.1239 0.2188 0.2787 0.2541 0.2076 0.1468 0.0331 0.0054
6 0.0000 0.0000 0.0011 0.0038 0.0100 0.0413 0.1094 0.2090 0.2965 0.3115 0.2936 0.1488 0.0515
7 0.0000 0.0000 0.0001 0.0004 0.0012 0.0079 0.0313 0.0896 0.1977 0.2670 0.3355 0.3826 0.2793
8 0.0000 0.0000 0.0000 0.0000 0.0001 0.0007 0.0039 0.0168 0.0576 0.1001 0.1678 0.4305 0.6634
9 0 0.6302 0.3874 0.1342 0.0751 0.0404 0.0101 0.0020 0.0003 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.2985 0.3874 0.3020 0.2253 0.1556 0.0605 0.0176 0.0035 0.0004 0.0001 0.0000 0.0000 0.0000
2 0.0629 0.1722 0.3020 0.3003 0.2668 0.1612 0.0703 0.0212 0.0039 0.0012 0.0003 0.0000 0.0000
3 0.0077 0.0446 0.1762 0.2336 0.2668 0.2508 0.1641 0.0743 0.0210 0.0087 0.0028 0.0001 0.0000
4 0.0006 0.0074 0.0661 0.1168 0.1715 0.2508 0.2461 0.1672 0.0735 0.0389 0.0165 0.0008 0.0000
5 0.0000 0.0008 0.0165 0.0389 0.0735 0.1672 0.2461 0.2508 0.1715 0.1168 0.0661 0.0074 0.0006
6 0.0000 0.0001 0.0028 0.0087 0.0210 0.0743 0.1641 0.2508 0.2668 0.2336 0.1762 0.0446 0.0077
7 0.0000 0.0000 0.0003 0.0012 0.0039 0.0212 0.0703 0.1612 0.2668 0.3003 0.3020 0.1722 0.0629
8 0.0000 0.0000 0.0000 0.0001 0.0004 0.0035 0.0176 0.0605 0.1556 0.2253 0.3020 0.3874 0.2985
9 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0020 0.0101 0.0404 0.0751 0.1342 0.3874 0.6302
10 0 0.5987 0.3487 0.1074 0.0563 0.0282 0.0060 0.0010 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.3151 0.3874 0.2684 0.1877 0.1211 0.0403 0.0098 0.0016 0.0001 0.0000 0.0000 0.0000 0.0000
2 0.0746 0.1937 0.3020 0.2816 0.2335 0.1209 0.0439 0.0106 0.0014 0.0004 0.0001 0.0000 0.0000
3 0.0105 0.0574 0.2013 0.2503 0.2668 0.2150 0.1172 0.0425 0.0090 0.0031 0.0008 0.0000 0.0000
4 0.0010 0.0112 0.0881 0.1460 0.2001 0.2508 0.2051 0.1115 0.0368 0.0162 0.0055 0.0001 0.0000
5 0.0001 0.0015 0.0264 0.0584 0.1029 0.2007 0.2461 0.2007 0.1029 0.0584 0.0264 0.0015 0.0001
6 0.0000 0.0001 0.0055 0.0162 0.0368 0.1115 0.2051 0.2508 0.2001 0.1460 0.0881 0.0112 0.0010
7 0.0000 0.0000 0.0008 0.0031 0.0090 0.0425 0.1172 0.2150 0.2668 0.2503 0.2013 0.0574 0.0105
8 0.0000 0.0000 0.0001 0.0004 0.0014 0.0106 0.0439 0.1209 0.2335 0.2816 0.3020 0.1937 0.0746
9 0.0000 0.0000 0.0000 0.0000 0.0001 0.0016 0.0098 0.0403 0.1211 0.1877 0.2684 0.3874 0.3151
10 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0010 0.0060 0.0282 0.0563 0.1074 0.3487 0.5987
Table 4a: Binomial Distribution: P(X = x) (ctd)
n x p 0.05 0.10 0.20 0.25 0.30 0.40 0.50 0.60 0.70 0.75 0.80 0.90 0.95
12 0 0.5404 0.2824 0.0687 0.0317 0.0138 0.0022 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.3413 0.3766 0.2062 0.1267 0.0712 0.0174 0.0029 0.0003 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.0988 0.2301 0.2835 0.2323 0.1678 0.0639 0.0161 0.0025 0.0002 0.0000 0.0000 0.0000 0.0000
3 0.0173 0.0852 0.2362 0.2581 0.2397 0.1419 0.0537 0.0125 0.0015 0.0004 0.0001 0.0000 0.0000
4 0.0021 0.0213 0.1329 0.1936 0.2311 0.2128 0.1208 0.0420 0.0078 0.0024 0.0005 0.0000 0.0000
5 0.0002 0.0038 0.0532 0.1032 0.1585 0.2270 0.1934 0.1009 0.0291 0.0115 0.0033 0.0000 0.0000
6 0.0000 0.0005 0.0155 0.0401 0.0792 0.1766 0.2256 0.1766 0.0792 0.0401 0.0155 0.0005 0.0000
7 0.0000 0.0000 0.0033 0.0115 0.0291 0.1009 0.1934 0.2270 0.1585 0.1032 0.0532 0.0038 0.0002
8 0.0000 0.0000 0.0005 0.0024 0.0078 0.0420 0.1208 0.2128 0.2311 0.1936 0.1329 0.0213 0.0021
9 0.0000 0.0000 0.0001 0.0004 0.0015 0.0125 0.0537 0.1419 0.2397 0.2581 0.2362 0.0852 0.0173
10 0.0000 0.0000 0.0000 0.0000 0.0002 0.0025 0.0161 0.0639 0.1678 0.2323 0.2835 0.2301 0.0988
11 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0029 0.0174 0.0712 0.1267 0.2062 0.3766 0.3413
12 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0022 0.0138 0.0317 0.0687 0.2824 0.5404
15 0 0.4633 0.2059 0.0352 0.0134 0.0047 0.0005 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.3658 0.3432 0.1319 0.0668 0.0305 0.0047 0.0005 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.1348 0.2669 0.2309 0.1559 0.0916 0.0219 0.0032 0.0003 0.0000 0.0000 0.0000 0.0000 0.0000
3 0.0307 0.1285 0.2501 0.2252 0.1700 0.0634 0.0139 0.0016 0.0001 0.0000 0.0000 0.0000 0.0000
4 0.0049 0.0428 0.1876 0.2252 0.2186 0.1268 0.0417 0.0074 0.0006 0.0001 0.0000 0.0000 0.0000
5 0.0006 0.0105 0.1032 0.1651 0.2061 0.1859 0.0916 0.0245 0.0030 0.0007 0.0001 0.0000 0.0000
6 0.0000 0.0019 0.0430 0.0917 0.1472 0.2066 0.1527 0.0612 0.0116 0.0034 0.0007 0.0000 0.0000
7 0.0000 0.0003 0.0138 0.0393 0.0811 0.1771 0.1964 0.1181 0.0348 0.0131 0.0035 0.0000 0.0000
8 0.0000 0.0000 0.0035 0.0131 0.0348 0.1181 0.1964 0.1771 0.0811 0.0393 0.0138 0.0003 0.0000
9 0.0000 0.0000 0.0007 0.0034 0.0116 0.0612 0.1527 0.2066 0.1472 0.0917 0.0430 0.0019 0.0000
10 0.0000 0.0000 0.0001 0.0007 0.0030 0.0245 0.0916 0.1859 0.2061 0.1651 0.1032 0.0105 0.0006
11 0.0000 0.0000 0.0000 0.0001 0.0006 0.0074 0.0417 0.1268 0.2186 0.2252 0.1876 0.0428 0.0049
12 0.0000 0.0000 0.0000 0.0000 0.0001 0.0016 0.0139 0.0634 0.1700 0.2252 0.2501 0.1285 0.0307
13 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0032 0.0219 0.0916 0.1559 0.2309 0.2669 0.1348
14 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0005 0.0047 0.0305 0.0668 0.1319 0.3432 0.3658
15 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0005 0.0047 0.0134 0.0352 0.2059 0.4633
20 0 0.3585 0.1216 0.0115 0.0032 0.0008 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.3774 0.2702 0.0576 0.0211 0.0068 0.0005 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.1887 0.2852 0.1369 0.0669 0.0278 0.0031 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
3 0.0596 0.1901 0.2054 0.1339 0.0716 0.0123 0.0011 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
4 0.0133 0.0898 0.2182 0.1897 0.1304 0.0350 0.0046 0.0003 0.0000 0.0000 0.0000 0.0000 0.0000
5 0.0022 0.0319 0.1746 0.2023 0.1789 0.0746 0.0148 0.0013 0.0000 0.0000 0.0000 0.0000 0.0000
6 0.0003 0.0089 0.1091 0.1686 0.1916 0.1244 0.0370 0.0049 0.0002 0.0000 0.0000 0.0000 0.0000
7 0.0000 0.0020 0.0545 0.1124 0.1643 0.1659 0.0739 0.0146 0.0010 0.0002 0.0000 0.0000 0.0000
8 0.0000 0.0004 0.0222 0.0609 0.1144 0.1797 0.1201 0.0355 0.0039 0.0008 0.0001 0.0000 0.0000
9 0.0000 0.0001 0.0074 0.0271 0.0654 0.1597 0.1602 0.0710 0.0120 0.0030 0.0005 0.0000 0.0000
10 0.0000 0.0000 0.0020 0.0099 0.0308 0.1171 0.1762 0.1171 0.0308 0.0099 0.0020 0.0000 0.0000
11 0.0000 0.0000 0.0005 0.0030 0.0120 0.0710 0.1602 0.1597 0.0654 0.0271 0.0074 0.0001 0.0000
12 0.0000 0.0000 0.0001 0.0008 0.0039 0.0355 0.1201 0.1797 0.1144 0.0609 0.0222 0.0004 0.0000
13 0.0000 0.0000 0.0000 0.0002 0.0010 0.0146 0.0739 0.1659 0.1643 0.1124 0.0545 0.0020 0.0000
14 0.0000 0.0000 0.0000 0.0000 0.0002 0.0049 0.0370 0.1244 0.1916 0.1686 0.1091 0.0089 0.0003
15 0.0000 0.0000 0.0000 0.0000 0.0000 0.0013 0.0148 0.0746 0.1789 0.2023 0.1746 0.0319 0.0022
16 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 0.0046 0.0350 0.1304 0.1897 0.2182 0.0898 0.0133
17 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0011 0.0123 0.0716 0.1339 0.2054 0.1901 0.0596
18 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 0.0031 0.0278 0.0669 0.1369 0.2852 0.1887
19 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0005 0.0068 0.0211 0.0576 0.2702 0.3774
20 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0008 0.0032 0.0115 0.1216 0.3585
Table 4b: Cumulative Binomial Distribution: P(X ≤ x)
n x p 0.05 0.10 0.20 0.25 0.30 0.40 0.50 0.60 0.70 0.75 0.80 0.90 0.95
5 0 0.7738 0.5905 0.3277 0.2373 0.1681 0.0778 0.0313 0.0102 0.0024 0.0010 0.0003 0.0000 0.0000
1 0.9774 0.9185 0.7373 0.6328 0.5282 0.3370 0.1875 0.0870 0.0308 0.0156 0.0067 0.0005 0.0000
2 0.9988 0.9914 0.9421 0.8965 0.8369 0.6826 0.5000 0.3174 0.1631 0.1035 0.0579 0.0086 0.0012
3 1.0000 0.9995 0.9933 0.9844 0.9692 0.9130 0.8125 0.6630 0.4718 0.3672 0.2627 0.0815 0.0226
4 1.0000 1.0000 0.9997 0.9990 0.9976 0.9898 0.9688 0.9222 0.8319 0.7627 0.6723 0.4095 0.2262
5 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
6 0 0.7351 0.5314 0.2621 0.1780 0.1176 0.0467 0.0156 0.0041 0.0007 0.0002 0.0001 0.0000 0.0000
1 0.9672 0.8857 0.6554 0.5339 0.4202 0.2333 0.1094 0.0410 0.0109 0.0046 0.0016 0.0001 0.0000
2 0.9978 0.9842 0.9011 0.8306 0.7443 0.5443 0.3438 0.1792 0.0705 0.0376 0.0170 0.0013 0.0001
3 0.9999 0.9987 0.9830 0.9624 0.9295 0.8208 0.6563 0.4557 0.2557 0.1694 0.0989 0.0159 0.0022
4 1.0000 0.9999 0.9984 0.9954 0.9891 0.9590 0.8906 0.7667 0.5798 0.4661 0.3446 0.1143 0.0328
5 1.0000 1.0000 0.9999 0.9998 0.9993 0.9959 0.9844 0.9533 0.8824 0.8220 0.7379 0.4686 0.2649
6 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
7 0 0.6983 0.4783 0.2097 0.1335 0.0824 0.0280 0.0078 0.0016 0.0002 0.0001 0.0000 0.0000 0.0000
1 0.9556 0.8503 0.5767 0.4449 0.3294 0.1586 0.0625 0.0188 0.0038 0.0013 0.0004 0.0000 0.0000
2 0.9962 0.9743 0.8520 0.7564 0.6471 0.4199 0.2266 0.0963 0.0288 0.0129 0.0047 0.0002 0.0000
3 0.9998 0.9973 0.9667 0.9294 0.8740 0.7102 0.5000 0.2898 0.1260 0.0706 0.0333 0.0027 0.0002
4 1.0000 0.9998 0.9953 0.9871 0.9712 0.9037 0.7734 0.5801 0.3529 0.2436 0.1480 0.0257 0.0038
5 1.0000 1.0000 0.9996 0.9987 0.9962 0.9812 0.9375 0.8414 0.6706 0.5551 0.4233 0.1497 0.0444
6 1.0000 1.0000 1.0000 0.9999 0.9998 0.9984 0.9922 0.9720 0.9176 0.8665 0.7903 0.5217 0.3017
7 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
8 0 0.6634 0.4305 0.1678 0.1001 0.0576 0.0168 0.0039 0.0007 0.0001 0.0000 0.0000 0.0000 0.0000
1 0.9428 0.8131 0.5033 0.3671 0.2553 0.1064 0.0352 0.0085 0.0013 0.0004 0.0001 0.0000 0.0000
2 0.9942 0.9619 0.7969 0.6785 0.5518 0.3154 0.1445 0.0498 0.0113 0.0042 0.0012 0.0000 0.0000
3 0.9996 0.9950 0.9437 0.8862 0.8059 0.5941 0.3633 0.1737 0.0580 0.0273 0.0104 0.0004 0.0000
4 1.0000 0.9996 0.9896 0.9727 0.9420 0.8263 0.6367 0.4059 0.1941 0.1138 0.0563 0.0050 0.0004
5 1.0000 1.0000 0.9988 0.9958 0.9887 0.9502 0.8555 0.6846 0.4482 0.3215 0.2031 0.0381 0.0058
6 1.0000 1.0000 0.9999 0.9996 0.9987 0.9915 0.9648 0.8936 0.7447 0.6329 0.4967 0.1869 0.0572
7 1.0000 1.0000 1.0000 1.0000 0.9999 0.9993 0.9961 0.9832 0.9424 0.8999 0.8322 0.5695 0.3366
8 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
9 0 0.6302 0.3874 0.1342 0.0751 0.0404 0.0101 0.0020 0.0003 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.9288 0.7748 0.4362 0.3003 0.1960 0.0705 0.0195 0.0038 0.0004 0.0001 0.0000 0.0000 0.0000
2 0.9916 0.9470 0.7382 0.6007 0.4628 0.2318 0.0898 0.0250 0.0043 0.0013 0.0003 0.0000 0.0000
3 0.9994 0.9917 0.9144 0.8343 0.7297 0.4826 0.2539 0.0994 0.0253 0.0100 0.0031 0.0001 0.0000
4 1.0000 0.9991 0.9804 0.9511 0.9012 0.7334 0.5000 0.2666 0.0988 0.0489 0.0196 0.0009 0.0000
5 1.0000 0.9999 0.9969 0.9900 0.9747 0.9006 0.7461 0.5174 0.2703 0.1657 0.0856 0.0083 0.0006
6 1.0000 1.0000 0.9997 0.9987 0.9957 0.9750 0.9102 0.7682 0.5372 0.3993 0.2618 0.0530 0.0084
7 1.0000 1.0000 1.0000 0.9999 0.9996 0.9962 0.9805 0.9295 0.8040 0.6997 0.5638 0.2252 0.0712
8 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9980 0.9899 0.9596 0.9249 0.8658 0.6126 0.3698
9 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
10 0 0.5987 0.3487 0.1074 0.0563 0.0282 0.0060 0.0010 0.0001 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.9139 0.7361 0.3758 0.2440 0.1493 0.0464 0.0107 0.0017 0.0001 0.0000 0.0000 0.0000 0.0000
2 0.9885 0.9298 0.6778 0.5256 0.3828 0.1673 0.0547 0.0123 0.0016 0.0004 0.0001 0.0000 0.0000
3 0.9990 0.9872 0.8791 0.7759 0.6496 0.3823 0.1719 0.0548 0.0106 0.0035 0.0009 0.0000 0.0000
4 0.9999 0.9984 0.9672 0.9219 0.8497 0.6331 0.3770 0.1662 0.0473 0.0197 0.0064 0.0001 0.0000
5 1.0000 0.9999 0.9936 0.9803 0.9527 0.8338 0.6230 0.3669 0.1503 0.0781 0.0328 0.0016 0.0001
6 1.0000 1.0000 0.9991 0.9965 0.9894 0.9452 0.8281 0.6177 0.3504 0.2241 0.1209 0.0128 0.0010
7 1.0000 1.0000 0.9999 0.9996 0.9984 0.9877 0.9453 0.8327 0.6172 0.4744 0.3222 0.0702 0.0115
8 1.0000 1.0000 1.0000 1.0000 0.9999 0.9983 0.9893 0.9536 0.8507 0.7560 0.6242 0.2639 0.0861
9 1.0000 1.0000 1.0000 1.0000 1.0000 0.9999 0.9990 0.9940 0.9718 0.9437 0.8926 0.6513 0.4013
10 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
Table 4b: Cumulative Binomial Distribution: P(X ≤ x) (ctd)
n  x  p = 0.05 0.10 0.20 0.25 0.30 0.40 0.50 0.60 0.70 0.75 0.80 0.90 0.95
12 0 0.5404 0.2824 0.0687 0.0317 0.0138 0.0022 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.8816 0.6590 0.2749 0.1584 0.0850 0.0196 0.0032 0.0003 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.9804 0.8891 0.5583 0.3907 0.2528 0.0834 0.0193 0.0028 0.0002 0.0000 0.0000 0.0000 0.0000
3 0.9978 0.9744 0.7946 0.6488 0.4925 0.2253 0.0730 0.0153 0.0017 0.0004 0.0001 0.0000 0.0000
4 0.9998 0.9957 0.9274 0.8424 0.7237 0.4382 0.1938 0.0573 0.0095 0.0028 0.0006 0.0000 0.0000
5 1.0000 0.9995 0.9806 0.9456 0.8822 0.6652 0.3872 0.1582 0.0386 0.0143 0.0039 0.0001 0.0000
6 1.0000 0.9999 0.9961 0.9857 0.9614 0.8418 0.6128 0.3348 0.1178 0.0544 0.0194 0.0005 0.0000
7 1.0000 1.0000 0.9994 0.9972 0.9905 0.9427 0.8062 0.5618 0.2763 0.1576 0.0726 0.0043 0.0002
8 1.0000 1.0000 0.9999 0.9996 0.9983 0.9847 0.9270 0.7747 0.5075 0.3512 0.2054 0.0256 0.0022
9 1.0000 1.0000 1.0000 1.0000 0.9998 0.9972 0.9807 0.9166 0.7472 0.6093 0.4417 0.1109 0.0196
10 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9968 0.9804 0.9150 0.8416 0.7251 0.3410 0.1184
11 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9978 0.9862 0.9683 0.9313 0.7176 0.4596
12 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
15 0 0.4633 0.2059 0.0352 0.0134 0.0047 0.0005 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.8290 0.5490 0.1671 0.0802 0.0353 0.0052 0.0005 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.9638 0.8159 0.3980 0.2361 0.1268 0.0271 0.0037 0.0003 0.0000 0.0000 0.0000 0.0000 0.0000
3 0.9945 0.9444 0.6482 0.4613 0.2969 0.0905 0.0176 0.0019 0.0001 0.0000 0.0000 0.0000 0.0000
4 0.9994 0.9873 0.8358 0.6865 0.5155 0.2173 0.0592 0.0093 0.0007 0.0001 0.0000 0.0000 0.0000
5 0.9999 0.9978 0.9389 0.8516 0.7216 0.4032 0.1509 0.0338 0.0037 0.0008 0.0001 0.0000 0.0000
6 1.0000 0.9997 0.9819 0.9434 0.8689 0.6098 0.3036 0.0950 0.0152 0.0042 0.0008 0.0000 0.0000
7 1.0000 1.0000 0.9958 0.9827 0.9500 0.7869 0.5000 0.2131 0.0500 0.0173 0.0042 0.0000 0.0000
8 1.0000 1.0000 0.9992 0.9958 0.9848 0.9050 0.6964 0.3902 0.1311 0.0566 0.0181 0.0003 0.0000
9 1.0000 1.0000 0.9999 0.9992 0.9963 0.9662 0.8491 0.5968 0.2784 0.1484 0.0611 0.0022 0.0001
10 1.0000 1.0000 1.0000 0.9999 0.9993 0.9907 0.9408 0.7827 0.4845 0.3135 0.1642 0.0127 0.0006
11 1.0000 1.0000 1.0000 1.0000 0.9999 0.9981 0.9824 0.9095 0.7031 0.5387 0.3518 0.0556 0.0055
12 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9963 0.9729 0.8732 0.7639 0.6020 0.1841 0.0362
13 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9995 0.9948 0.9647 0.9198 0.8329 0.4510 0.1710
14 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9995 0.9953 0.9866 0.9648 0.7941 0.5367
15 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
20 0 0.3585 0.1216 0.0115 0.0032 0.0008 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 0.7358 0.3917 0.0692 0.0243 0.0076 0.0005 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
2 0.9245 0.6769 0.2061 0.0913 0.0355 0.0036 0.0002 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
3 0.9841 0.8670 0.4114 0.2252 0.1071 0.0160 0.0013 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
4 0.9974 0.9568 0.6296 0.4148 0.2375 0.0510 0.0059 0.0003 0.0000 0.0000 0.0000 0.0000 0.0000
5 0.9997 0.9887 0.8042 0.6172 0.4164 0.1256 0.0207 0.0016 0.0000 0.0000 0.0000 0.0000 0.0000
6 1.0000 0.9976 0.9133 0.7858 0.6080 0.2500 0.0577 0.0065 0.0003 0.0000 0.0000 0.0000 0.0000
7 1.0000 0.9996 0.9679 0.8982 0.7723 0.4159 0.1316 0.0210 0.0013 0.0002 0.0000 0.0000 0.0000
8 1.0000 0.9999 0.9900 0.9591 0.8867 0.5956 0.2517 0.0565 0.0051 0.0009 0.0001 0.0000 0.0000
9 1.0000 1.0000 0.9974 0.9861 0.9520 0.7553 0.4119 0.1275 0.0171 0.0039 0.0006 0.0000 0.0000
10 1.0000 1.0000 0.9994 0.9961 0.9829 0.8725 0.5881 0.2447 0.0480 0.0139 0.0026 0.0000 0.0000
11 1.0000 1.0000 0.9999 0.9991 0.9949 0.9435 0.7483 0.4044 0.1133 0.0409 0.0100 0.0001 0.0000
12 1.0000 1.0000 1.0000 0.9998 0.9987 0.9790 0.8684 0.5841 0.2277 0.1018 0.0321 0.0004 0.0000
13 1.0000 1.0000 1.0000 1.0000 0.9997 0.9935 0.9423 0.7500 0.3920 0.2142 0.0867 0.0024 0.0000
14 1.0000 1.0000 1.0000 1.0000 1.0000 0.9984 0.9793 0.8744 0.5836 0.3828 0.1958 0.0113 0.0003
15 1.0000 1.0000 1.0000 1.0000 1.0000 0.9997 0.9941 0.9490 0.7625 0.5852 0.3704 0.0432 0.0026
16 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9987 0.9840 0.8929 0.7748 0.5886 0.1330 0.0159
17 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9998 0.9964 0.9645 0.9087 0.7939 0.3231 0.0755
18 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9995 0.9924 0.9757 0.9308 0.6083 0.2642
19 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 0.9992 0.9968 0.9885 0.8784 0.6415
20 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
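The table entries above can be reproduced directly from the binomial formula, P(X ≤ x) = Σₖ₌₀ˣ C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ. As a sketch (not part of the unit materials), the following Python function computes a cumulative binomial probability and cross-checks two entries from the tables:

```python
from math import comb

def binom_cdf(n: int, x: int, p: float) -> float:
    """Cumulative binomial probability P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Cross-check against two table entries (rounded to 4 decimal places):
print(round(binom_cdf(20, 10, 0.5), 4))  # n = 20, x = 10, p = 0.50 -> 0.5881
print(round(binom_cdf(10, 4, 0.3), 4))   # n = 10, x =  4, p = 0.30 -> 0.8497
```

This is useful for checking a table lookup, or for values of n and p that the printed tables do not cover.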
Notes on the Faculty-approved HP 10bII+ Financial Calculator
Decimal places
By default, the calculator displays only 2 decimal places.
To change this, press SHIFT then DISP, followed by a digit (for example, 6 for 6 decimal places).
For a convenient display of up to the maximum number of digits, press SHIFT then DISP,
followed by the decimal point key.