
VCE FURTHER MATH 3/4

HEADSTART LECTURE
JANUARY 2020
Presented by Josh Hamilton
LECTURE PLAN
BLOCK 1 (25 minutes): OVERVIEW & STUDY TIPS
Get into good habits early and know what’s in store for the year
BREAK 1: 5 minutes + 5 minutes for questions

BLOCK 2 (45 minutes): DATA ANALYSIS 1
Content! Fairly boring compulsory content!
BREAK 2: 5 minutes + 10 minutes for questions

BLOCK 3 (35 minutes): DATA ANALYSIS 2
Less boring but still compulsory content!
QUESTIONS: 10 minutes
LECTURE PLAN
Note:
• Lecture slides are available in the resource tab below, so don’t stress if you don’t get everything down!

• Feel free to screenshot and use any of the content I present.

• You can ask questions in the Slido tab below. Feel free to use it; I have allocated time to answer the most upvoted questions.

• We’ll be going pretty fast through some boring content, because there’s a lot to cover. Everyone is at different levels, but I’ll do my best to cater to everyone!

3
BLOCK 1:
OVERVIEW & STUDY TIPS

4
COURSE OVERVIEW

CORE: Data Analysis, Recursion and Financial
MODULES x 2: Matrices, Networks, Geometry, Graphs & Linear Relations
SACs: 33% of Final Mark

EXAM 1: Multiple Choice
EXAM 2: Short Answer
Exams: 66% of Final Mark

5
WHAT DO I NEED TO KNOW?

Read the
study design!

6
YEAR 12
• First off congrats on surviving 12 years of school - only 1
more to go

• Many people describe year 12 as a “marathon” – I’d say


think of it more like a sprint.
– While year 12 might be the most mentally and emotionally taxing
year of schooling yet it is also the shortest.
– You only have three terms and then exams
• Try and stay focused and determined throughout year 12 – because it
goes by so quickly.
• At the end, you have the longest summer break of your life so you want
to put yourself in the best position to enjoy it and not worry about uni
offers.

7
MOTIVATION IN YEAR 12

[Figure: motivation across the year, with three labels: “Most students are keen – new year new me kinda vibes”, “Most students during term 3”, and “By the time exams come around most people start studying way more”.]

TAKE AWAY LESSON:
If you can stay motivated and not burn out in the middle of the year you will put yourself in a good position for your exams. (Term 3 is the hardest term.)

8
DEALING WITH YOUR OWN EXPECTATIONS

• Know what your end game is, being successful in VCE does
not mean you need to get a 90+ ATAR. VCE is just a
steppingstone to get where you want to go, and the most
important thing is that you get there.

• For Further, if you are aiming for the higher end of grades
(e.g. 40+) I would say aim for a 45. This is because there is
very little to distinguish between a 45 and a 50 – it comes
down to whether the questions play to your strengths on the
day.

9
DEALING WITH YOUR OWN EXPECTATIONS
– Change your study habits because what you have been
doing may not have worked thus far.
– Be prepared to make sacrifices throughout the year to
ensure you get the results you want (but maintain a
healthy balance) – sometimes it might take saying that
you can’t go out on a Friday or Saturday night so you can
get your work done. #covid
– Be strict on yourself when you think you are drifting and
lowering your expectations as you think they may not be
attainable and try to stay on course.
– Set small goals to make your larger goal seem less
daunting – my first goal for year 12 was to get an A+ for
my first SAC for every subject.

10
RESOURCES FOR FURTHER

We’ll go through how to use these resources…


• Textbook/Study Guides

• Calculator

• Summary Book

• Teachers, tutors

• A couple of others as well

11
TEXTBOOK/STUDY GUIDES

• Complete all textbook exercises – even if your teacher says


only to do left hand side or something like that – do ALL the
questions!!!

• Review sections – Identify your weak points in a specific


topic

• Cut things out of your textbooks and study guides

• External study guides


– ATAR Notes Course Guides/Topic Tests

12
CALCULATOR
• Calculator guides in textbook – copy them out into your
notes in case you go blank in a SAC or exam!

• Your CAS is your best friend, make sure you know it well
– For my trigonometry friends, you WILL love your CAS because bearings
are a thing and can be a massive pain – so please learn how to use
it well!

• Shortcuts through menu screens (e.g. menu – 3 – 1 for the


solve function)

• 2 types of calculator, beware! – the TI-Nspire and the Casio ClassPad. Use the one your school makes you get!

13
SUMMARY BOOK
• There are two (maybe three) models for a summary book
that students follow:
– Student A who puts literally everything he/she sees in their
textbooks/study guides and class notes. At the end of the year their
reference book is THICC but they have the peace of mind that
everything's in there.
• The negatives:
– It will be a pain to look through during SACs or Exams and might waste
valuable time.
– Might consume a lot of your study time making it – if using this method you
NEED to stay up to date!
• The positives:
– You have the peace of mind that you have all the content and can use it
when you get stuck.
– Creating a comprehensive summary book is a good tool to revise content

14
SUMMARY BOOK

• The second type of summary book:


– Student B who would put together a minimal summary book with
only the essential formulas – usually just a couple of pages printed
off from a summary sheet found online.
– The positives:
• You spend more time doing practice questions – SUPER IMPORTANT!
• Don’t waste as much time in an Exam/SAC flicking through your
reference book
– The negatives:
• You're on your own in an Exam/SAC if you get stuck – all you have are
formulas and only the key pieces of information.

15
SUMMARY BOOK
MY ADVICE -
DISCLAIMER: take what I say with a grain of salt as this is from
my experiences and you might be different.

One of the Golden Rules is that you should create your own
reference book (don’t buy one) – you learn so much by
understanding the formulas and not just memorizing how to
answer specific questions.

Try and find a middle ground between Student A and B – try


and cover everything you think you need to know BUT be very
concise and conscious of the amount of pages you are using.

16
SUMMARY BOOK
The essentials to consider when thinking about your own
summary book….
• Approach with a given mindset – minimalistic,
comprehensive or concise.

• Many different models of summary books, have a look at


some to decide which is best for you!

• Theory/Practice Worked Examples (in my opinion are


essential)

• Calculator instructions are essential

• Practice exam and SAC style questions you struggled with

17
TEACHERS/TUTORS
• (most of) your teachers aren’t out to get you!
– Utilise their help – they have seen 100’s and 1000’s of students go
through VCE they know what differentiates a 35 from a 40.
– The more you show you are keen and hardworking the more likely
your teachers will put time aside to help you at the end when it gets
super congested for them!

• TuteSmart – we offer a tutoring program as well which you


can check out if you’re keen – We have some Promo
content to come with this!!

• School resources – your school will have practice


exams/SACs make sure you do these!!!!

18
OTHER TIPS
• Don’t touch last year’s exam until the end of the year

• Don’t stress too much about practice exams just yet

• Keep up in class, ask for help if you fall behind

• Dedicate time to Further!

• Try different methods of study

• Don’t be the kid on ATAR calc every class putting every possible combination of study scores in – the results are generally extremely inaccurate anyway
19
QUESTIONS
BREAK 1 – 5 Minutes

CODE: JOSHVCE
DATA ANALYSIS
The biggest part of the course
→ Worth the most marks
→ Most important?

UNIVARIATE DATA DISTRIBUTIONS
BIVARIATE DATA DISTRIBUTIONS
MODELLING LINEAR ASSOCIATIONS
TIME SERIES DATA

23
UNIVARIATE DATA DISTRIBUTIONS
TYPES OF DATA

DATA: “facts and statistics collected together for reference or analysis”
Variables: what we measure to collect data!

How many variables?
1 → Univariate
2 → Bivariate

What type of variables?
Names → Categorical
Values / Measurements → Numerical

24
UNIVARIATE DATA DISTRIBUTIONS
TYPES OF DATA – TYPES OF VARIABLES
What type of variables?

Names Values / Measurements

Categorical Numerical

• We can break down categorical and


numerical variables into multiple categories!

25
UNIVARIATE DATA DISTRIBUTIONS
TYPES OF DATA – TYPES OF VARIABLES

Categorical Data
Data is divided into categories
e.g. Eye colour, football team

Nominal Categorical
No sensible way to sort the categories!
e.g. Hair colour: Does it make sense to order blonde, brown, red and black hair? No!

Ordinal Categorical
Categories have a natural order, but the interval between them is not specific.
e.g. How satisfied are you?
Very Satisfied
Somewhat Satisfied
Neutral
Somewhat Dissatisfied
Very Dissatisfied

26
UNIVARIATE DATA DISTRIBUTIONS
TYPES OF DATA – TYPES OF VARIABLES

Numerical Data
Data that you measure or count
e.g. Height, number of students

Numerical Discrete
Data you can count, can only take on a finite set of values.
e.g. Number of people at this lecture! I could count all of you, and I would get a distinct number. Even if I wanted, I couldn’t get a more ‘accurate’ value.

Numerical Continuous
Data that you measure, can take any value (infinite possibilities)
e.g. How much does a $2 coin weigh?
7 grams
6.6 grams
6.60 grams
6.601 grams

27
UNIVARIATE DATA DISTRIBUTIONS
TYPES OF DATA – TYPES OF VARIABLES
What type of variables?

Names → Categorical
- No order → Nominal
- Order → Ordinal

Values / Measurements → Numerical
- Count → Discrete
- Measure → Continuous

28
UNIVARIATE DATA DISTRIBUTIONS
TYPES OF DATA – CHECKLIST

Is the variable categorical or numerical?

Can it be manipulated? i.e. does it make sense to find the mean, median, mode, range, multiply it, add it?
YES → Numerical
NO → Categorical

• It makes sense to subtract two heights from each other,


heights are numerical.

• It doesn’t make sense to subtract two eye colours from each other, eye colours are categorical.
29
UNIVARIATE DATA DISTRIBUTIONS
TYPES OF DATA – CHECKLIST

Is the variable categorical or numerical?

Can it be counted or measured?


YES NO

Numerical Categorical

Warning: Although numbers usually mean that a variable


is numerical, it doesn’t always! Categorical
variables can contain numbers too!

30
UNIVARIATE DATA DISTRIBUTIONS
TYPES OF DATA – CHECKLIST

Is the variable categorical or numerical?


Is the number being used as a name?

NO YES

Numerical Categorical

• For example, numbers on uniforms are used to identify players, they act
as names, and therefore they are categorical.
• Post codes, house numbers and ratings on number scales (e.g. rate out
of 5 stars) are other common categorical variables that use numbers!

31
UNIVARIATE DATA DISTRIBUTIONS
PRACTICE QUESTIONS
Assign the following variables as numerical, categorical, and whether they
are discrete/continuous or ordinal/nominal:

- How often do you study (often, sometimes, rarely)

- The temperature in degrees Celsius

- The cost to fill a car with a tank of petrol

- Shoe size (6, 8, 10)

- Colour of a pencil (red, green, blue)

- Floor levels in a building (1, 2, 3, 4)

- The number of pages in a book

32
UNIVARIATE DATA DISTRIBUTIONS
PRACTICE QUESTIONS
Assign the following variables as numerical, categorical, and whether they
are discrete/continuous or ordinal/nominal:

- How often do you study (often, sometimes, rarely) Categorical ordinal

- The temperature in degrees Celsius Numerical continuous

- The cost to fill a car with a tank of petrol Numerical discrete

- Shoe size (6, 8, 10) Categorical ordinal

- Colour of a pencil (red, green, blue) Categorical nominal

- Floor levels in a building (1, 2, 3, 4) Categorical ordinal

- The number of pages in a book Numerical discrete

33
UNIVARIATE DATA DISTRIBUTIONS
PRACTICE QUESTION

VCAA – 2016 Further Math Exam 1 - Data Analysis - Question 2


UNIVARIATE DATA DISTRIBUTIONS
TYPES OF DATA – NUMBER OF VARIABLES

Univariate

- One variable (uni meaning one)

- Only one thing changes or is manipulated

Colour Number
Red 3
Black 10
White 8
Silver 5

37
TYPES OF DATA – NUMBER OF VARIABLES

BIVARIATE DATA DISTRIBUTIONS


Bivariate

- Two Variables (bi meaning two)

- There are two things that change or are manipulated

          Walk   Bike   Car    Bus
Year 7    5%     12%    71%    12%
Year 8    7%     11%    68%    14%
Year 9    9%     13%    63%    15%
Year 10   9%     17%    59%    15%

- Bivariate data is super interesting, more on this later…


38
UNIVARIATE DATA DISTRIBUTIONS
DISPLAYING UNIVARIATE DATA

What type of graph do I use? (Univariate Data)

Categorical: Frequency tables, Percentage frequency tables, Bar charts
Numerical: Frequency tables, Dot plots, Box plots, Stem and leaf plots, Histograms

39
40
UNIVARIATE DATA DISTRIBUTIONS
FREQUENCY TABLES Categorical

Provide the variable and the frequency

Eg. Preferred social media platform?

Social media platform Frequency


Facebook 26
Twitter 19
Instagram 20
Snapchat 17
MySpace 1
Total 83

41
UNIVARIATE DATA DISTRIBUTIONS
PERCENTAGE FREQUENCY TABLES Categorical

Frequency may also be displayed as percentage frequency

Eg. Preferred social media platform?

Social media platform % Frequency


Facebook 31.33%
Twitter 22.89%
Instagram 24.10%
Snapchat 20.48%
MySpace 1.20%
Total 100%
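
If you want to check a percentage frequency table yourself, here is a minimal Python sketch using the social media counts from the frequency table above. The function name `percentage_frequencies` is just made up for this illustration; in a SAC or exam you would do this on your CAS.

```python
# Frequencies copied from the social media frequency table.
counts = {
    "Facebook": 26,
    "Twitter": 19,
    "Instagram": 20,
    "Snapchat": 17,
    "MySpace": 1,
}


def percentage_frequencies(freqs):
    """Convert raw frequencies to percentage frequencies (2 decimal places)."""
    total = sum(freqs.values())
    return {name: round(100 * count / total, 2) for name, count in freqs.items()}


percentages = percentage_frequencies(counts)
for platform, pct in percentages.items():
    print(f"{platform:<10} {pct:.2f}%")

# The mode (modal category) is the category with the highest frequency.
mode = max(counts, key=counts.get)
print("Mode:", mode)
```

Each percentage is just frequency / total x 100, so the column should always add to (roughly) 100%.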

42
UNIVARIATE DATA DISTRIBUTIONS
BAR CHARTS Categorical

- Variable on the x-axis, frequency on the y-axis


- Label axes, must rule lines
- Bars of equal width with space between them
Pets at home
150

100
Frequency
Pets at home
50

0
Cats Dogs Birds Fish
Type of pet

43
UNIVARIATE DATA DISTRIBUTIONS
DESCRIBING CATEGORICAL DISTRIBUTIONS

• Explain/Describe paragraphs pop up every now and then

• Have a template in your summaries

• When answering these kinds of questions, you must:


– Summarise the context
– Identify the mode (also called modal category, dominant category)
– Quote its frequency
– Quote other frequencies of interest

44
UNIVARIATE DATA DISTRIBUTIONS
EXAMPLE QUESTION
Q: Comment on the data shown in the frequency table below.
Climate Frequency % Frequency
Hot 6 26.1%
Mild 14 60.9%
Cold 3 13.0%
Total 23 100%

The climate types of 23 countries were classified as being


“cold”, “mild” or “hot”. The majority of the countries, 60.9%,
were found to have a mild climate. Of the remaining countries,
26.1% were found to have a hot climate, while 13% were
found to have a cold climate.

45
UNIVARIATE DATA DISTRIBUTIONS
DESCRIBING CATEGORICAL DISTRIBUTIONS
Breaking down the sample response to the climate question:

Context: “The climate types of 23 countries were classified as being “cold”, “mild” or “hot”.”
Mode: “The majority of the countries, 60.9%, were found to have a mild climate.”
Note other frequencies: “Of the remaining countries, 26.1% were found to have a hot climate, while 13% were found to have a cold climate.”
48
UNIVARIATE DATA DISTRIBUTIONS
FREQUENCY TABLES Numerical

- Variable and frequency displayed


- Frequency can be real or percentage frequency
- Data with a large spread may be displayed as “grouped” data

Age Frequency
17 10
18 13
19 2

Age Frequency
0-20 18
20-40 43
40-60 25

49
UNIVARIATE DATA DISTRIBUTIONS
DOT PLOTS Numerical

• Used for discrete numerical data

• Only the x-axis is used, displaying the variable

• Number of dots represents frequency

[Figure: dot plot on an axis from 1 to 5.]

50
UNIVARIATE DATA DISTRIBUTIONS
DOT PLOTS Numerical

Common question:

- Finding the median


- Count the total
- Find the dot denoting the middle point of the data

[Figure: dot plot on an axis from 1 to 5.]
52
UNIVARIATE DATA DISTRIBUTIONS
STEM AND LEAF PLOTS Numerical

• There are two basic parts:


– Stem: the first digit(s)
– Leaf: the last digit(s)
• For example, the number 41 may be shown as follows:

4 | 1

Stem: 4 (essentially representing 40)
Leaf: 1 (essentially representing 1)

53
UNIVARIATE DATA DISTRIBUTIONS
STEM AND LEAF PLOTS Numerical

Warning: Always include a key/legend with your stem plot.

Stem | Leaf
   1 | 0 2
   2 | 2 3 8 9
   3 | 9 9
   4 | 4 4 7
   5 |
   6 | 0 1 9

Key: 1 | 0 = 10

54
UNIVARIATE DATA DISTRIBUTIONS
STEM AND LEAF PLOTS Numerical

Warning: Always include a key/legend with your stem plot.

Stem | Leaf
   1 | 0 2
   2 | 2 3 8 9
   3 | 9 9
   4 | 4 4 7
   5 |
   6 | 0 1 9

Key: 1 | 0 = 100

55
UNIVARIATE DATA DISTRIBUTIONS
STEM AND LEAF PLOTS Numerical

Warning: Always include a key/legend with your stem plot.

- The leaf is ordered


- Can split the stem up in half or fifths if plot is bunched
Stem | Leaf
   1 | 0 2
   2 | 2 3 8 9
   3 | 9 9
   4 | 4 4 7
   5 |
   6 | 0 1 9

Key: 1 | 0 = 10
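
To see how a stem plot is built from raw values, here is a rough Python sketch. It assumes two-digit whole numbers with the key 1 | 0 = 10, and the list of values is simply read back off the stem plot above. The function name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict

# Values read back off the stem plot (key: 1 | 0 = 10).
values = [10, 12, 22, 23, 28, 29, 39, 39, 44, 44, 47, 60, 61, 69]


def stem_and_leaf(data):
    """Return the rows of a stem plot for two-digit whole numbers."""
    stems = defaultdict(list)
    for value in sorted(data):          # sorting keeps the leaves ordered
        stems[value // 10].append(value % 10)

    rows = []
    # Include every stem between the smallest and largest, even empty ones.
    for stem in range(min(stems), max(stems) + 1):
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


rows = stem_and_leaf(values)
print("\n".join(rows))
print("Key: 1 | 0 = 10")
```

Notice the empty stem (5) still gets a row, which is what lets you see gaps in the distribution.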

56
UNIVARIATE DATA DISTRIBUTIONS
HISTOGRAMS Numerical

- Similar to bar charts, but no space between columns


- Variable on x-axis, frequency on y-axis
- Useful to identify key features of data that we use in
descriptions (shape, spread, centre, outliers)

57
UNIVARIATE DATA DISTRIBUTIONS
HISTOGRAMS - LOG SCALES
• Sometimes, the data’s range is too large to display on a
regular histogram

• In these cases, we use log scale histograms as a solution!

• wtf is a log scale?

Normal scale: Constant addition between marks (+10 each step)

0 → 10 → 20 → 30 → 40

Log scale: Constant multiplication between marks (x10 each step)

1 → 10 → 100 → 1000 → 10,000

58
UNIVARIATE DATA DISTRIBUTIONS
HISTOGRAMS - LOG SCALES

Properties of logs:

- If a number is greater than 1 its log is greater than 0

- If a number is greater than 0 but less than 1 its log is


negative

- If a number is 0 its log is undefined, and you can’t have logs


of negative numbers!

Warning: When displaying logs on an axis we only use their order of magnitude ($10^2$ becomes 2), though we must label the axis as log(variable).
59
UNIVARIATE DATA DISTRIBUTIONS
HISTOGRAMS - LOG SCALES

Eg.
[Figure: histogram with Log(variable) from 1 to 5 on the x-axis and Frequency from 0 to 5 on the y-axis.]

60
UNIVARIATE DATA DISTRIBUTIONS
HISTOGRAMS - LOG SCALES

A handy log guide:

Number:       0.01       0.1        1       10      100     1000    10000   100000  1000000
Power of 10:  $10^{-2}$  $10^{-1}$  $10^0$  $10^1$  $10^2$  $10^3$  $10^4$  $10^5$  $10^6$

61
UNIVARIATE DATA DISTRIBUTIONS
HISTOGRAMS - LOG SCALES

Use calculator!!

- To find the log of a number
  - Eg. What is the log of 150?
  - $\log_{10}(150) = 2.176$

- To find the number of a log
  - Eg. Find the number whose log is 1.683
  - $10^{1.683} = 48.1948$
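
The two calculator steps above can also be checked with a couple of lines of Python. This is only a sketch to show that the two operations undo each other; it uses the standard `math` module.

```python
import math

# Number -> log: what power of 10 gives 150?
log_value = math.log10(150)
print(round(log_value, 3))

# Log -> number: raise 10 to the power of the log value.
number = 10 ** 1.683
print(round(number, 4))

# Going there and back returns the original number (up to rounding).
round_trip = 10 ** math.log10(150)
print(round(round_trip, 6))
```

So "finding the number of a log" is nothing more than raising 10 to that power.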

62
ANOTHER PRACTICE QUESTION

VCAA 2016 Exam 1 – Question 7


UNIVARIATE DATA DISTRIBUTIONS
THE 5 FIGURE SUMMARY

Includes:

- The minimum value


- The value of quartile 1 (Q1)
- The median
- The value of quartile 3 (Q3)
- The maximum value

• We can work this out by hand or on the calculator,


depending on what set of data you have either one may be
quicker!

65
UNIVARIATE DATA DISTRIBUTIONS
THE 5 FIGURE SUMMARY
How can we find the 5 number summary by hand?

2 5 7 8 9 13 14 15 16 20 21 25 37 41

We can see that we have 14 numbers here, so the median is the average of the 7th and 8th values. Q1 is the median of the lower 7 values and Q3 is the median of the upper 7 values.

Minimum: 2
Q1: 8
Median: 14.5
Q3: 21
Max: 41
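
The by-hand method can be sketched in Python as follows. One assumption to flag: this version finds Q1 and Q3 as the medians of the lower and upper halves, leaving the median itself out of both halves when there is an odd number of values. Other software can use slightly different quartile rules, so treat this as an illustration of the by-hand steps rather than a universal definition. The function names are made up for this example.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]               # values below the median position
    upper = data[half + (n % 2):]     # values above it (skip the median if n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])


data = [2, 5, 7, 8, 9, 13, 14, 15, 16, 20, 21, 25, 37, 41]
summary = five_number_summary(data)
print(summary)
```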

71
UNIVARIATE DATA DISTRIBUTIONS
THE 5 FIGURE SUMMARY
How can we find the 5 number summary from a dot plot?

[Figure: dot plot on an axis from 1 to 5.]

72
UNIVARIATE DATA DISTRIBUTIONS
THE 5 FIGURE SUMMARY
How can we find the 5 number summary from a stem plot?

Stem Leaf
1 0 2
2 2 3 8 9
3 9 9
4 4 4 7
5
6 0 1 9

73
UNIVARIATE DATA DISTRIBUTIONS
THE 5 FIGURE SUMMARY
What can we do with these details?

- IQR
- IQR = Q3 – Q1

- Outlier calculations - Extremely common exam question!

- Lower fence value


- Q1 – 1.5 x IQR

- Upper fence value


- Q3 + 1.5 x IQR
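
Here is a short Python sketch of the fence calculation, applied to the data set used for the 5 number summary (Q1 = 8, Q3 = 21). The function name `outlier_fences` is made up for this example.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [2, 5, 7, 8, 9, 13, 14, 15, 16, 20, 21, 25, 37, 41]
q1, q3 = 8, 21                      # from the 5 number summary of this data

lower_fence, upper_fence = outlier_fences(q1, q3)
# Anything below the lower fence or above the upper fence is an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("IQR:", q3 - q1)
print("Lower fence:", lower_fence)
print("Upper fence:", upper_fence)
print("Outliers:", outliers)
```

For this data the IQR is 13, so the fences sit at -11.5 and 40.5, which makes 41 an outlier even though it does not look extreme at first glance.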

74
UNIVARIATE DATA DISTRIBUTIONS
BOX PLOTS Numerical

Visual display of the 5 number summary

[Figure: box plot drawn above an axis from 1 to 26, with Min, Q1, Median, Q3 and Max labelled, and an outlier marked beyond each whisker.]

78
UNIVARIATE DATA DISTRIBUTIONS
ANALYSING / DESCRIBING DATA

• Must discuss: shape, centre, spread and outliers

Outliers: Are there any present? If so, what are they? Also note
if there are no outliers.

Centre: Note the mean or the median, perhaps both


– Mean, median, mode

Spread: What is the range of the data


– IQR, range, standard deviation

79
UNIVARIATE DATA DISTRIBUTIONS
DESCRIBING HISTOGRAMS
Shape:

Positively Skewed Negatively Skewed

Approximately symmetrical Bimodal

80
UNIVARIATE DATA DISTRIBUTIONS
DESCRIBING BOX PLOTS

Once again, we look at outliers, centre, spread and shape

For box plots, shape is displayed as:

[Figure: three box plots on axes from 1 to 26, labelled Positively Skewed, Negatively Skewed and Approximately Symmetrical.]
81
UNIVARIATE DATA DISTRIBUTIONS
NORMAL DISTRIBUTION

mean = median = mode

Symmetrical

50% 50%

Approaches zero
on both sides

82
UNIVARIATE DATA DISTRIBUTIONS
STANDARD DEVIATION

Describes the spread of data values around the mean

e.g. mean = 5, standard deviation $s_x$ = 2

- 1 S.D. above mean = 7

- 2 S.D. above mean = 9

- 1 S.D. below mean = 3

83
UNIVARIATE DATA DISTRIBUTIONS
68-95-99.7% RULE
• Around 68% of the data values lie within one standard
deviation of the mean.
• Around 95% of the data values lie within two standard
deviations of the mean.
• Around 99.7% of the data values lie within three
standard deviations of the mean.

85
UNIVARIATE DATA DISTRIBUTIONS
PRACTICE QUESTIONS

Q1. A class of 24 students receives their science test results,


with a mean of 32 and a standard deviation of 2.
How many students received a score between 28 and 32?

32– 28 = 4
4/2 = 2
2 std deviations under the mean
34% + 13.5% = 47.5%

24 x 0.475 = 11.4
11 students
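
The working above can be sketched in Python. This is only an illustration of the 68-95-99.7% rule for scores that sit a whole number of standard deviations from the mean (between -3 and 3); the band percentages 34%, 13.5% and 2.35% come from splitting 68%, 95% and 99.7% into their one-sided pieces. The names `BAND` and `percent_between` are made up for this example.

```python
# Approximate % of data in each one-sided band of the normal distribution:
# band 1 = mean to 1 SD, band 2 = 1 SD to 2 SD, band 3 = 2 SD to 3 SD.
BAND = {1: 34.0, 2: 13.5, 3: 2.35}


def percent_between(z_low, z_high):
    """Approximate % of data between two whole-number z-scores (-3 to 3)."""
    total = 0.0
    for k in range(z_low, z_high):          # each step covers k SD to k + 1 SD
        band = k + 1 if k >= 0 else -k      # bands are symmetric about the mean
        total += BAND[band]
    return total


mean, sd, class_size = 32, 2, 24
z_low = round((28 - mean) / sd)             # 28 is 2 SD below the mean
z_high = round((32 - mean) / sd)            # 32 is the mean itself

percent = percent_between(z_low, z_high)
students = class_size * percent / 100
print(percent, students)
```

Changing the upper score to 34 (one standard deviation above the mean) adds another 34% band, giving 81.5%.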

86
UNIVARIATE DATA DISTRIBUTIONS
STANDARD DEVIATION

$z = \dfrac{x - \bar{x}}{s}$

$z$ = z-score
$x$ = actual score
$\bar{x}$ = mean
$s$ = standard deviation

87
UNIVARIATE DATA DISTRIBUTIONS
STANDARD DEVIATION

$x = \bar{x} + (z \times s)$
Actual score = Mean + (z-score × standard deviation)
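
As a quick check that the two formulas undo each other, here is a small Python sketch. It uses the science test numbers from this lecture (mean 32, standard deviation 2) and a score of 35; the function names are made up for this example.

```python
def z_score(x, mean, sd):
    """Standardised score: how many standard deviations x is from the mean."""
    return (x - mean) / sd


def actual_score(z, mean, sd):
    """Convert a z-score back into an actual score."""
    return mean + z * sd


z = z_score(35, mean=32, sd=2)
print(z)

# Converting back should return the original score.
x = actual_score(z, mean=32, sd=2)
print(x)
```

A positive z-score means the score is above the mean, and a negative one means it is below.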

88
UNIVARIATE DATA DISTRIBUTIONS
PRACTICE QUESTIONS

Q1. A class of 24 students receives their science test results,


with a mean of 32 and a standard deviation of 2.
How many students received a score between 28 and 34?

Q2. Ben achieved a result of 35, what is his standardised


score?

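Q1. 28 is 2 standard deviations below the mean and 34 is 1 standard deviation above it, so 13.5% + 34% + 34% = 81.5% of students. 24 x 0.815 = 19.56, so about 20 students.

Q2. z = (35 – 32) / 2 = 1.5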

89
UNIVARIATE DATA SUMMARY
Types of data
- Univariate / Bivariate
- Categorical / Numerical

Displaying data
- Categorical: Frequency tables, Percentage frequency tables, Bar charts
- Numerical: Frequency tables, Dot/Box plots, Stem and leaf plots, Histograms

Analysing data
- 5 Figure Summary
- 68-95-99.7% rule
- z-scores
- Shape, centres, spread
90
92
WHY BIVARIATE DATA?

BIVARIATE DATA DISTRIBUTIONS


– Univariate data is great at telling us the what
• What is the average height of people in this room?
• What is the most popular colour?
• What is the average temperature in Melbourne?

– Bivariate data allows us to compare data, and focus on


the why
• What is the relationship between age and height?
• Does gender play a role in someone’s favourite colour?
• How do the average temperatures in all major Australian
cities compare?

93
BIVARIATE DATA DISTRIBUTIONS
EXPLANATORY VS RESPONSE VARIABLES

• When we’ve got more than one variable, we give the


variables different names.

• Science kids, you’ll know these as independent and


dependent variables.

• Here, we call them explanatory and response variables.

94
BIVARIATE DATA DISTRIBUTIONS
EXPLANATORY VS RESPONSE VARIABLES

Explanatory variable
• Also known as IV or EV
• The variable that you expect, when changed, will ‘explain’ to some extent the change in another variable.
• Plotted on x axis
• e.g. Age

Response variable
• Also known as DV or RV
• The variable that you think will be changed ‘as a response’ to a changing EV.
• Plotted on y axis
• e.g. Shoe size

95
BIVARIATE DATA DISTRIBUTIONS
DISPLAYING BIVARIATE DATA

Type of Data

Response Variable | Explanatory Variable       | Graph
Categorical       | Categorical                | Segmented Bar Chart or Two Way Frequency Table
Numerical         | Categorical                | Parallel Box Plots
Numerical         | Categorical (2 categories) | Back to Back Stem and Leaf Plots
Numerical         | Numerical                  | Scatterplots

99
BIVARIATE DATA DISTRIBUTIONS
SEGMENTED BAR CHARTS Categorical Categorical

- Variable on x-axis, frequency on y-axis
- Can be in terms of raw number or percentage
- Note: make sure you include a key!

[Figure: segmented bar chart titled “Number of days at temperature levels”. x-axis: Year (2010, 2011, 2012); y-axis: % frequency (0% to 100%); key: Cold, Mild, Hot.]

101
BIVARIATE DATA DISTRIBUTIONS
TWO WAY FREQUENCY TABLE Categorical Categorical

Two-Way Frequency Table

- RV = Rows

- EV = Columns

Attitude   Year Level
           11      12
For        36%     81%
Against    64%     19%
Total      100%    100%

102
BIVARIATE DATA DISTRIBUTIONS
TWO WAY FREQUENCY TABLE Categorical Categorical

• What can we see from this?

• There seems to be an association

• If there were no association, we would expect the percentages to be similar for both year levels, but they’re not!

Attitude   Year Level
           11      12
For        36%     81%
Against    64%     19%
Total      100%    100%

103
BIVARIATE DATA DISTRIBUTIONS
BACK TO BACK STEM & LEAF Numerical Categorical

Back-to-back stem and leaf plots

Blue eyes | Stem | Brown eyes
        8 |   5  |
        1 |   6  | 0 2 4 6 9
    8 3 0 |   7  | 1 3 5
  5 4 2 0 |   8  | 5
    9 6 2 |   9  | 9
          |  10  | 0 0 0

Key: 1 | 0 = 10

104
BIVARIATE DATA DISTRIBUTIONS
PARALLEL DOT PLOTS Numerical Categorical

Parallel Dot Plots

- One categorical variable, one numerical variable
- Allows for easy comparison of distributions’ shape

[Figure: parallel dot plots for Boys and Girls, each on an axis from 9 to 17.]
105
BIVARIATE DATA DISTRIBUTIONS
COMPARING DISTRIBUTIONS

Can be asked to compare data sets looking at graphs

- Box Plots
- Histograms
- Dot Plots
- Back to Back Stem and Leaf Plots

We look at:

- Centre
- Spread
- Shape

106
BIVARIATE DATA DISTRIBUTIONS
COMPARING DISTRIBUTIONS

Comparing Box Plots

The distribution of boys’ scores on the test is negatively skewed, whilst the girls’ score distribution is positively skewed. There are no outliers. The median score for boys is higher (M = 23) than for girls (M = 9.5). The IQR is smaller for boys (IQR = 10) than for girls (IQR = 12). The range of scores for boys and girls is equal (Range = 19).

[Figure: parallel box plots for Boys and Girls on a Score axis from 1 to 26.]

107
BIVARIATE DATA DISTRIBUTIONS
SCATTERPLOTS Numerical Numerical

- Used HEAPS in the real world, super useful!


- Make sure you know how to plot these on your calculator.
[Figure: scatterplot with the x-axis from 0 to 25 and the y-axis (Y) from 0 to 25.]

108
BIVARIATE DATA DISTRIBUTIONS
SCATTERPLOTS Numerical Numerical

[Figure: scatterplot with the x-axis from 0 to 25 and the y-axis (Y) from 0 to 25.]

When describing scatterplots we MUST mention three things:

1. Strength
2. Direction
3. Form
109
BIVARIATE DATA DISTRIBUTIONS
PEARSON’S CORRELATION COEFFICIENT

• Otherwise known as the r value

• Measures the strength of a linear relationship

• We generally assume that a linear relationship is present (in


some cases it isn’t, but we’ll get to that)

• Always find the value of r using your calculator; it isn’t practical to do by hand.

110
BIVARIATE DATA DISTRIBUTIONS
PEARSON’S CORRELATION COEFFICIENT

– Strong positive: r = 0.75 to 0.99


– Moderate positive: r = 0.5 to 0.74
– Weak positive: r = 0.25 to 0.49
– No association: r = -0.24 to 0.24
– Weak negative: r = -0.25 to -0.49
– Moderate negative: r = -0.5 to -0.74
– Strong negative: r = -0.75 to -0.99

• Size of r value → STRENGTH of the association

• Sign in front of r value → DIRECTION of the association

Warning: r values are only meaningful for data sets with a linear form
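
To show what your calculator is doing behind the scenes, here is a rough Python sketch that computes r from its standard formula and then labels it using the strength bands listed above. The five data points are made up purely for illustration, and the function names are hypothetical. The bands on the slide leave small gaps (e.g. between 0.24 and 0.25), so this sketch simply uses 0.25, 0.5 and 0.75 as cut-offs.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r):
    """Label r using the strength bands from the lecture."""
    size = abs(r)
    if size < 0.25:
        return "no association"
    strength = "weak" if size < 0.5 else "moderate" if size < 0.75 else "strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(round(r, 4), describe(r))
```

The size of r gives the strength label and the sign gives the direction, exactly as in the two dot points above.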

111
BIVARIATE DATA DISTRIBUTIONS
DIRECTION

• Positive or negative

• A direction suggests that there is an association between two variables

Positive Negative

112
BIVARIATE DATA DISTRIBUTIONS
FORM

• Linear, non-linear or no association

Linear: Data follows a relatively straight line.

Non-linear: Data does not occur in a straight line pattern, but does follow a curved pattern.

No association: Data points are randomly spread and do not appear to be associated.

113
BIVARIATE DATA DISTRIBUTIONS
INTERPRETING r

• If asked to interpret the value of the correlation coefficient, use the following template sentences. Be sure to put these in your bound reference!

Linear, positive and strong: It can be concluded that the y variable should increase as the x variable increases.
Linear, positive and moderate: There is some evidence to suggest that the y variable should increase as the x variable increases.
Linear, positive and weak: There is limited evidence to suggest that the y variable should increase as the x variable increases.

Linear, negative and strong: It can be concluded that the y variable should decrease as the x variable increases.
Linear, negative and moderate: There is some evidence to suggest that the y variable should decrease as the x variable increases.
Linear, negative and weak: There is limited evidence to suggest that the y variable should decrease as the x variable increases.

114
BIVARIATE DATA DISTRIBUTIONS
NON – CAUSAL EXPLANATIONS

Question: Does a high r value mean one thing caused another?

Let’s play a little game… correlation or causation?

115
BIVARIATE DATA DISTRIBUTIONS
CORRELATION OR CAUSATION?


• What about… number of shark attacks at beaches and the temperature???

118
BIVARIATE DATA DISTRIBUTIONS
CORRELATION OR CAUSATION?

• It is generally very hard to say one thing causes another; we live in a complicated world.
• For example, we can say that gravity causes a ball to fall, but in general, almost everything is correlational.

119
BIVARIATE DATA DISTRIBUTIONS
CORRELATION AND CAUSATION

• Correlation is where there is an association between two variables

• Causation is where the association is meaningful because one variable causes the other: cause and effect (x → y)

• There are explanations for correlation other than causation

120
BIVARIATE DATA DISTRIBUTIONS
NON-CAUSAL EXPLANATIONS
• Common response: both variables (x and y) are linked to a third, shared variable (a lurking variable)

• Confounding variable: too many factors to accurately tell what’s impacting the variable

121
BIVARIATE DATA DISTRIBUTIONS
NON-CAUSAL EXPLANATIONS
• Coincidence: simply by chance

122
BIVARIATE DATA DISTRIBUTIONS
POPULATION AND SAMPLING

Key Terms

- Population: All possible elements of a data set. The group from which a sample is drawn.

- Sample: The part of a population used for a data set

- Random Sample: A sample selected from a population without bias, e.g. selecting names out of a hat. Every member of the population has an equal chance of being selected.

123
BIVARIATE DATA DISTRIBUTIONS
POPULATION AND SAMPLING

• Population parameters: statistical measures regarding the population (used to be descriptive of that population). These values are fixed.
– Population mean µ
– Population standard deviation σ

• Sample statistics: statistical measures regarding the sample (used to be descriptive of that sample). Change from sample to sample.
– Sample mean 𝑥̅
– Sample standard deviation 𝑠
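To see the "fixed versus changes from sample to sample" idea in action, here is a small sketch using Python's standard `statistics` module. The population of 1000 scores is randomly generated and entirely hypothetical; only the behaviour matters.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population: 1000 made-up test scores
population = [random.gauss(60, 12) for _ in range(1000)]

mu = statistics.mean(population)       # population mean (fixed)
sigma = statistics.pstdev(population)  # population standard deviation (fixed)
print(f"Population: mu = {mu:.2f}, sigma = {sigma:.2f}")

sample_means = []
for i in range(3):
    sample = random.sample(population, 30)  # random sample of 30
    x_bar = statistics.mean(sample)         # sample mean
    s = statistics.stdev(sample)            # sample standard deviation
    sample_means.append(x_bar)
    print(f"Sample {i + 1}: x_bar = {x_bar:.2f}, s = {s:.2f}")
```

Each sample gives slightly different statistics, but they all sit reasonably close to the population parameters.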

124
BIVARIATE DATA DISTRIBUTIONS
POPULATION AND SAMPLING

Why use a sample?

• If the population is large, it can be incredibly difficult to survey every individual in that population.

• Population surveys also usually require $$$$$$$$$$$$$

• Samples generally provide an approximation of the population parameters.

125
BIVARIATE DATA SUMMARY
Type of Data

Explanatory Variable   Response Variable   Graph
Categorical            Categorical         Segmented Bar Chart or Two Way Frequency Table
Categorical            Numerical           Parallel Box Plots
Categorical            Numerical           Back to Back Stem and Leaf Plots
Numerical              Numerical           Scatterplots

- Displaying bivariate data
- Describing bivariate distributions: SDF (strength, direction, form)
- Explanatory variable (x) and response variable (y)
- Population vs sample statistics: µ, σ vs. 𝑥̅, 𝑠
- Variables and causality

126
CODE: JOSHVCE

QUESTIONS
BREAK 2 – 5 Minutes
DATA ANALYSIS
The biggest part of the course
→ Worth the most marks
→ Most important?

- Univariate data distributions
- Bivariate data distributions
- Modelling linear associations
- Time series data

131
MODELLING DATA DISTRIBUTIONS
MODELLING DATA
• Bivariate data, and in particular bivariate data with two numerical variables, is extremely useful!

• This is because we can use it to construct models: mathematical equations that allow us to predict the values of data points we didn’t even measure.

132
MODELLING DATA DISTRIBUTIONS
LEAST SQUARES LINE OF BEST FIT

- How tf do we come up with a line of best fit?!?!

Few things:
- We take the residual = vertical distance between the actual data point and the line of best fit.
- We then make sure our line of best fit minimises the sum of the squares of the residuals (hence "least squares regression").
- Works best if there are no outliers
133
MODELLING DATA DISTRIBUTIONS
LEAST SQUARES LINE OF BEST FIT

𝑦 = 𝑎 + 𝑏𝑥

- When we have a scatterplot, we can find a linear equation that allows us to predict 𝑦 from 𝑥.

- 𝑥 is the explanatory variable

- 𝑦 is the response variable

- Must ensure you are entering these variables in the correct order!

- Classic VCAA trick to give you the y variable before the x.


134
MODELLING DATA DISTRIBUTIONS
LEAST SQUARES LINE OF BEST FIT
• General form of a least squares line of best fit is:
𝑦 = 𝑎 + 𝑏𝑥

• Two scenarios:
1. If you’re given the raw data, use your CAS!
2. If you’re given statistics, use the following formulas!

Where:
• The slope/gradient of the line is b = r × (s_y / s_x)
• The y intercept of the line is a = ȳ − b x̄

And:
• r is the Pearson correlation coefficient
• s_y and s_x are the sample standard deviations of y and x respectively.
• x̄ and ȳ are the sample means of x and y
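For scenario 2, the two formulas can be sketched as a tiny function. The summary statistics below are hypothetical numbers chosen only to show the calculation; the function name is made up for this example.

```python
def line_from_statistics(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for y = a + b*x from summary statistics."""
    b = r * s_y / s_x      # slope: b = r * (s_y / s_x)
    a = y_bar - b * x_bar  # intercept: a = y_bar - b * x_bar
    return a, b

# Hypothetical summary statistics, for illustration only
a, b = line_from_statistics(r=0.8, s_x=5.0, s_y=10.0, x_bar=20.0, y_bar=50.0)
print(f"y = {a:.1f} + {b:.1f}x")
```

Note the order: the slope has to be found first, because the intercept formula uses it.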

135
MODELLING DATA DISTRIBUTIONS
EXAMPLE

Height 160 cm 163 cm 165 cm 169 cm 174 cm 180 cm 185 cm 191 cm
Weight 60 kg 63 kg 70 kg 67 kg 72 kg 75 kg 71 kg 77 kg

• Find the least squares regression line that allows height to be predicted based on weight.
- Put data in the CAS (lists and spreadsheet)
- Place the weight on the x axis and height on the y axis (weight is the explanatory variable here)
- Find regression equation
- Height = 58.022 + 1.66 × Weight

• Find the least squares regression line that allows weight to be predicted based on height.
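If you want to check the CAS answer, the same regression can be sketched in Python with the data from the table. The helper `least_squares` is written for this example (it is not a library function); swapping the order of the two lists gives the line for the second question.

```python
import statistics

height = [160, 163, 165, 169, 174, 180, 185, 191]  # cm
weight = [60, 63, 70, 67, 72, 75, 71, 77]          # kg

def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Predict height from weight: weight is x (explanatory), height is y (response)
a, b = least_squares(weight, height)
print(f"Height = {a:.3f} + {b:.2f} x Weight")

# Predict weight from height: swap the roles of the two variables
a2, b2 = least_squares(height, weight)
print(f"Weight = {a2:.3f} + {b2:.3f} x Height")
```

The two lines are not rearrangements of each other, which is why entering the variables in the correct order matters.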

136
MODELLING DATA DISTRIBUTIONS
INTERPRETING REGRESSION LINES

• We can interpret the regression line, 𝑦 = 𝑎 + 𝑏𝑥, by saying:

• The y intercept is 𝑎. This means the 𝑦 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 is 𝑎 𝑢𝑛𝑖𝑡𝑠 when the 𝑥 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 is zero 𝑢𝑛𝑖𝑡𝑠.
• The slope is 𝑏. This means that the 𝑦 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒 increases/decreases by 𝑏 𝑢𝑛𝑖𝑡𝑠 for every 1 𝑢𝑛𝑖𝑡 increase in the 𝑥 𝑣𝑎𝑟𝑖𝑎𝑏𝑙𝑒.

• Use the word ‘increases’ when b is positive, and ‘decreases’ when b is negative.
• Replace the italicised placeholders (shown in red on the slides) to fit the context of the question.

137
MODELLING DATA DISTRIBUTIONS
COEFFICIENT OF DETERMINATION

- Used where we can reasonably believe there is causation

- Tells us the extent to which the variation in y is explained by the variation in x

- r²

Explanation: The coefficient of determination tells us that r² × 100% of the variation in the response variable is explained by the variation in the explanatory variable

Warning: r² will always come out of the calculator positive, so taking its square root does not give the sign of r. We can tell whether r is truly positive or negative by observing the scatterplot or the gradient
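A short sketch of both ideas: turning r into the percentage for the template sentence, and recovering the sign of r from the gradient when you are only given r². The function names and the numbers are made up for illustration.

```python
import math

def explained_percent(r):
    """Percentage of variation in y explained by variation in x."""
    return r ** 2 * 100

def r_from_r_squared(r_squared, slope):
    """r squared loses the sign, so take the sign from the slope."""
    size = math.sqrt(r_squared)
    return size if slope >= 0 else -size

print(explained_percent(-0.8))                 # about 64 (per cent)
print(r_from_r_squared(0.64, slope=-2.5))      # negative slope, so r is negative
```

A negative and a positive r of the same size give the same r², which is why the scatterplot or gradient is needed to settle the direction.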
138
MODELLING DATA DISTRIBUTIONS
EXTRAPOLATION AND INTERPOLATION

How reliable are these predictions?

“Assess the validity”

Interpolation: The x value you are predicting from is within the data set (a fairly reliable prediction)

Extrapolation: The x value you are predicting from is outside of the data set (an unreliable prediction)
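The check itself is just a comparison against the smallest and largest x values in the data. A minimal sketch, reusing the weights from the earlier height and weight example as the x values (the function name is made up):

```python
def prediction_type(x_value, xs):
    """Classify a prediction by whether x lies inside the range of the data."""
    if min(xs) <= x_value <= max(xs):
        return "interpolation"
    return "extrapolation"

weight = [60, 63, 70, 67, 72, 75, 71, 77]  # x values from the earlier example

print(prediction_type(70, weight))   # inside 60 to 77
print(prediction_type(100, weight))  # outside 60 to 77
```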

139
MODELLING DATA DISTRIBUTIONS
RESIDUALS
• How can we mathematically check if a scatterplot is linear or not?

• We plot a residual plot!

• To find the residual of a specific point, the equation is:
𝑅𝑒𝑠𝑖𝑑𝑢𝑎𝑙 = 𝐴𝑐𝑡𝑢𝑎𝑙 𝑦 𝑣𝑎𝑙𝑢𝑒 − 𝑃𝑟𝑒𝑑𝑖𝑐𝑡𝑒𝑑 𝑦 𝑣𝑎𝑙𝑢𝑒
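As a sketch of the residual formula, here are the residuals for the earlier height and weight data (height predicted from weight). The line is refitted inside the block so the example runs on its own.

```python
import statistics

weight = [60, 63, 70, 67, 72, 75, 71, 77]          # x (explanatory)
height = [160, 163, 165, 169, 174, 180, 185, 191]  # y (response)

# Fit the least squares line y = a + b*x
x_bar = statistics.mean(weight)
y_bar = statistics.mean(height)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, height))
sxx = sum((x - x_bar) ** 2 for x in weight)
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = actual y value - predicted y value
residuals = [y - (a + b * x) for x, y in zip(weight, height)]
for x, res in zip(weight, residuals):
    print(x, round(res, 2))
```

Plotting these residuals against the x values gives the residual plot. For a least squares line the residuals always add to zero, so it is the pattern (or lack of one) that matters.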
140
MODELLING DATA DISTRIBUTIONS
RESIDUAL PLOTS
Interpreting residual plots

Non-linear: residual plot has a clear pattern
- Regression line is NOT a good choice

Linear: data follows a linear trend
- CASE 1: Perfect fit! Every residual is zero
- CASE 2: Good choice. Residuals randomly scattered close to the x-axis

141
MODELLING DATA DISTRIBUTIONS
DATA TRANSFORMATION
Circle of transformations

Essential Further Mathematics Units 3 & 4, 4th Edition, pg 190.

142
MODELLING DATA DISTRIBUTIONS
EXAMPLE
Transforming data: applying an x² transformation

If we applied an x² transformation, we would fit a least squares regression line to:

y     1    4.5   10   16   26
x     1    2     3    4    5
x²    1    4     9    16   25

• Therefore, we would graph y against x² and find the regression equation.

• The regression line would have equation: y = a + bx²
143
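To see what the transformed fit looks like with these numbers, here is a sketch that squares the x values and then fits an ordinary least squares line to y against x². The helper `least_squares` is written for this example, not taken from a library.

```python
import statistics

x = [1, 2, 3, 4, 5]
y = [1, 4.5, 10, 16, 26]

# Apply the x squared transformation
x_squared = [value ** 2 for value in x]  # [1, 4, 9, 16, 25]

def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((p - x_bar) * (q - y_bar) for p, q in zip(xs, ys))
    sxx = sum((p - x_bar) ** 2 for p in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Fit y against x squared, not against x
a, b = least_squares(x_squared, y)
print(f"y = {a:.2f} + {b:.2f} x^2")

# To predict at x = 6, substitute 6 squared into the line
print(a + b * 6 ** 2)
```

When predicting, remember to transform the x value first: the line is in terms of x², not x.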


MODELLING DATA DISTRIBUTIONS
EXAMPLE

[Figure: two scatterplots. Left: y against x, pre-transformation, non-linear. Right: y against x², post-transformation.]

→ Data has been linearised

144
MODELLING DATA SUMMARY

- Interpreting and calculating lines of best fit: 𝑦 = 𝑎 + 𝑏𝑥
- Correlation coefficient 𝑟 and coefficient of determination 𝑟²
- Residual plots
- Transformations: circle of transformations (Essential Further Mathematics Units 3 & 4, 4th Edition, pg 190)
145
DATA ANALYSIS
The biggest part of the course
→ Worth the most marks
→ Most important?

- Univariate data distributions
- Bivariate data distributions
- Modelling linear associations
- Time series data

146
TIME SERIES DATA
TIME SERIES PLOT

The same as a regular scatterplot, except:
- Our explanatory (x) variable is time
- We connect the data points with straight lines

[Figure: time series plot, "Sea Level Rise Over Time"; vertical axis: Sea Level Rise (cm), 0 to 25; horizontal axis: year, 1883 to 2011]
147
TIME SERIES DATA
TIME SERIES PLOT

Time series plots can show a variety of qualitative features, and you need to know the following:
148
TIME SERIES DATA
TREND
Describes what is happening in the long term
- Increasing Trend: Present where there is a positive slope
- Decreasing Trend: Present where there is a negative slope

[Figure: GDP of Australia]
http://www.abs.gov.au/websitedbs/D3310114.nsf/home/Time+Series+Analysis:+The+Basics
149
TIME SERIES DATA
SEASONALITY
- Peaks/troughs at regular intervals related to the calendar (usually seasons of the year, but could be weekly, monthly etc.), i.e. a fixed period
  e.g. same time each year, same time each month
- Typically similar in size
- Some consistency to the peaks/troughs

[Figure: Money Spent at Department Stores in NSW]
http://www.abs.gov.au/websitedbs/D3310114.nsf/home/Time+Series+Analysis:+The+Basics
150
TIME SERIES DATA
CYCLES

- Long term variations that are NOT seasonal
- Cycles do NOT exist within a year, they are only present in time series extending more than a year
- Period is NOT fixed, typically lasts at least 2 years
- Seasonality can exist within cycles

http://robjhyndman.com/hyndsight/cyclicts/
151
TIME SERIES DATA
IRREGULAR FLUCTUATIONS

• These are present in basically every time series we look at

• Any variation that cannot be attributed to cycles, seasonality, trends or structural change is classified as irregular

• Basically, if ever there is a data point that isn’t perfectly in place, we say there are irregular fluctuations (this was every time series I came across)

• If you’ve got no idea just guess this lol

152
TIME SERIES DATA
STRUCTURAL CHANGE

- Where there is a sudden change in the established pattern of a time series plot
- Must be a marked change that is then continued in subsequent data

The time series plot below shows the power bill for a rental house (in kWh) for the 12 months of a year.

[Figure: time series plot; vertical axis: Electricity use (kWh), 0 to 350; horizontal axis: Month, Jan to Dec]

The plot reveals an abrupt change in power usage in June to July. During this period, power use suddenly decreases from around 300 kWh per month to around 175 kWh for the rest of the year.
153
TIME SERIES DATA
OUTLIERS

- Individual data points that stand out from the general body of data

- Generally caused by an important one-off event; a common example is a financial crisis

http://support.minitab.com/en-us/minitab-express/1/help-and-how-to/graphs/time-series-plot/interpret-the-results/key-results/

154
TIME SERIES DATA
SMOOTHING TIME SERIES

• Most of the time, time series plots look pretty messy, and this makes them hard to read.

• To make the trends in the series a little easier to identify, we use a process called smoothing.

• Two methods:

1. Moving-mean smoothing (numerical)

2. Moving-median smoothing (graphical)

155
TIME SERIES DATA
MOVING MEAN SMOOTHING

• Dilutes the effect of large fluctuations

Basically takes into account the data surrounding each point to give a clearer trend flow

Time   y     3-moving mean    Smoothed y
1      7
2      13    (7+13+6)/3       8.67
3      6     (13+6+14)/3      11
4      14    (6+14+6.5)/3     8.83
5      6.5
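The table above can be reproduced with a few lines of Python. This is only a sketch of the 3-moving mean (the function name is made up); the first and last points have no smoothed value because they lack a neighbour on one side.

```python
def moving_mean_3(ys):
    """3-moving mean: average each point with its two neighbours."""
    return [
        round((ys[i - 1] + ys[i] + ys[i + 1]) / 3, 2)
        for i in range(1, len(ys) - 1)
    ]

y = [7, 13, 6, 14, 6.5]
smoothed = moving_mean_3(y)  # smoothed values for times 2, 3 and 4
print(smoothed)              # [8.67, 11.0, 8.83]
```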

156
TIME SERIES DATA
MOVING MEAN SMOOTHING

• Slightly more complicated for even number smoothing: each 4-moving mean falls between two time points, so we centre by averaging two neighbouring means

Time   y     4-moving mean                      Centring                 Smoothed y
1      7
2      13
             (7 + 13 + 6 + 14) / 4 = 10
3      6                                        (10 + 9.88)/2 = 9.94     9.94
             (13 + 6 + 14 + 6.5) / 4 = 9.88
4      14
5      6.5
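The two-step process (4-moving means, then centring) can be sketched like this, using the same five data points. The function name is made up for the example.

```python
def centred_moving_mean_4(ys):
    """4-moving mean with centring."""
    # Step 1: mean of each run of 4 consecutive values
    means = [sum(ys[i:i + 4]) / 4 for i in range(len(ys) - 3)]
    # Step 2: centre by averaging each pair of neighbouring means
    return [(means[i] + means[i + 1]) / 2 for i in range(len(means) - 1)]

y = [7, 13, 6, 14, 6.5]
centred = centred_moving_mean_4(y)  # one smoothed value, lined up with time 3
print(centred)                      # [9.9375], i.e. 9.94 to two decimal places
```

With only five points there is a single smoothed value, which matches the one filled-in row of the table.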

157
TIME SERIES DATA
MOVING MEDIAN SMOOTHING
• Uses the graphical representation of data points to find a smoothed line

Tip: Double check your answers! Math by sight leaves you open to errors
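In the exam this is done by eye on the graph, but one way to double check an answer is to compute the medians numerically. A minimal sketch of 3-median smoothing, assuming you have read the y values off the plot (here the same made-up values as the moving mean tables):

```python
import statistics

def moving_median_3(ys):
    """3-moving median: the middle value of each point and its two neighbours."""
    return [statistics.median(ys[i - 1:i + 2]) for i in range(1, len(ys) - 1)]

y = [7, 13, 6, 14, 6.5]
medians = moving_median_3(y)  # smoothed values for times 2, 3 and 4
print(medians)                # [7, 13, 6.5]
```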

158
TIME SERIES DATA
MOVING MEDIAN SMOOTHING

[Figure: moving median smoothing question from 2015 VCAA Exam 1]
159
TIME SERIES DATA
SEASONAL INDICES

• Another way that we can help make data more easily readable is through the use of seasonal indices.

• Sometimes, we want to compare and make regression lines for time series data (say, sales figures for a business).

• However, seasonality can make it hard to accurately get a linear relationship.

• To overcome the effects of seasonality, we can deseasonalise our data using seasonal indices.

160
TIME SERIES DATA
SEASONAL INDICES

seasonal index = value of season / seasonal average

• If we add all of the seasonal indices in a data set, we get the number of seasons (generally this is 4, as most data looks at a whole year split into quarters)

• Note: season can mean various things:
- Month
- Quarter
- Weather Seasons

161
TIME SERIES DATA
SEASONAL INDICES

Interpreting:

A seasonal index of 1.3 during summer tells us that figures for summer are 30% above average

A seasonal index of 0.87 during winter tells us that figures for winter are 13% below average

163
TIME SERIES DATA
SEASONAL INDICES

Correcting:

To correct for seasonality, our formula is 1 / seasonal index

Warning: People always screw this up, make sure you get it!

Eg. January’s seasonal index is 0.8

To correct for seasonality, we should increase the figures for January by 25% because 1/0.8 = 1.25 (125%)
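Since this is the one people get wrong, here is the calculation as a quick sketch. The function name is made up; the 0.8 is January's index from the example, and the 1.3 is the summer index used earlier when interpreting.

```python
def correction_percent(seasonal_index):
    """Percentage change needed to correct a figure for seasonality."""
    return (1 / seasonal_index - 1) * 100

print(round(correction_percent(0.8), 1))  # positive: increase the figures
print(round(correction_percent(1.3), 1))  # negative: decrease the figures
```

Notice that an index of 0.8 means "20% below average", but the correction is a 25% increase, not 20%. Likewise an index of 1.3 needs a decrease of roughly 23%, not 30%.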

164
TIME SERIES DATA
DESEASONALISATION

- Seasonal variation complicates regression, so we deseasonalise data before fitting a line

- Predictions must however take into account seasonal variation

- So, we reseasonalise data predicted from our equation

165
TIME SERIES DATA
DESEASONALISATION

- To deseasonalise:

- Deseasonalised value = actual value / seasonal index

- To reseasonalise predicted data:

- Actual value = deseasonalised value × seasonal index

166
TIME SERIES DATA
MAKING FORECASTS (LINE OF BEST FIT)

1. Calculate seasonal indices

2. Deseasonalise the response variable

3. Fit a line to the deseasonalised data

4. Make predictions by substituting in values for time to get a deseasonalised prediction

5. Reseasonalise this prediction to get the actual prediction

167
TIME SERIES DATA
EXAMPLE

        Summer ’11   Autumn ’11   Winter ’11   Spring ’11
Sales   120          93           65           108
SI

1. Calculate seasonal indices

120 + 93 + 65 + 108 = 386
386 / 4 = 96.5 (seasonal average)

Summer ’11 SI = 120 / 96.5 = 1.24
Autumn ’11 SI = 93 / 96.5 = 0.96

(checking all answers add to 4)
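Step 1 can be sketched in Python using the sales figures from the table. The dictionary layout is just one convenient way to hold the data.

```python
sales = {"Summer": 120, "Autumn": 93, "Winter": 65, "Spring": 108}

# Seasonal average = total / number of seasons
seasonal_average = sum(sales.values()) / len(sales)  # 96.5

# Seasonal index = value of season / seasonal average
indices = {season: value / seasonal_average for season, value in sales.items()}

for season, index in indices.items():
    print(season, round(index, 2))

# Check: the unrounded indices add to the number of seasons
print(sum(indices.values()))
```

The rounded indices (1.24, 0.96, 0.67, 1.12) add to 3.99 rather than exactly 4, which is just rounding error.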

168
TIME SERIES DATA
EXAMPLE

        Summer ’11   Autumn ’11   Winter ’11   Spring ’11
Sales   120          93           65           108
SI      1.24         0.96         0.67         1.12

2. Deseasonalise the response variable

120/1.24 = 96.77
93/0.96 = 96.88
65/0.67 = 97.01
108/1.12 = 96.43

169
TIME SERIES DATA
EXAMPLE

Quarter        1       2       3       4
Sales          120     93      65      108
SI             1.24    0.96    0.67    1.12
Deseas Sales   96.77   96.88   97.01   96.43

3. Fit a line to the deseasonalised data

- To fit a line, you must number each point on the x axis

- Calculate this line the same way you would a line of best fit

Deseasonalised sales = 97 – 0.089 × quarter number

170
TIME SERIES DATA
EXAMPLE

Quarter        1       2       3       4
Sales          120     93      65      108
SI             1.24    0.96    0.67    1.12
Deseas Sales   96.77   96.88   97.01   96.43

4. Make predictions using regression line

Predict the deseasonalised sales for Spring ’12 (quarter number 8):

97 – 0.089 × 8 = 96.29

171
TIME SERIES DATA
EXAMPLE

Quarter        1       2       3       4
Sales          120     93      65      108
SI             1.24    0.96    0.67    1.12
Deseas Sales   96.77   96.88   97.01   96.43

5. Reseasonalise to get actual prediction

96.29 × 1.12 = 107.84

(deseasonalised value × seasonal index = actual value)
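The five steps can be strung together as one sketch. It assumes the same rounding as the slides (seasonal indices and deseasonalised sales rounded to two decimal places), so the final answer can differ in the second decimal place from a hand calculation that also rounds the regression coefficients.

```python
sales = [120, 93, 65, 108]  # Summer, Autumn, Winter, Spring '11
quarters = [1, 2, 3, 4]

# 1. Calculate seasonal indices (rounded to 2 d.p., as on the slides)
seasonal_average = sum(sales) / len(sales)
indices = [round(value / seasonal_average, 2) for value in sales]

# 2. Deseasonalise the response variable
deseasonalised = [round(value / si, 2) for value, si in zip(sales, indices)]

# 3. Fit a least squares line to the deseasonalised data
x_bar = sum(quarters) / len(quarters)
y_bar = sum(deseasonalised) / len(deseasonalised)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(quarters, deseasonalised))
sxx = sum((x - x_bar) ** 2 for x in quarters)
b = sxy / sxx
a = y_bar - b * x_bar

# 4. Predict deseasonalised sales for Spring '12 (quarter number 8)
deseasonalised_prediction = a + b * 8

# 5. Reseasonalise using the Spring index
forecast = deseasonalised_prediction * indices[3]

print(indices)
print(round(a, 2), round(b, 3))
print(round(forecast, 2))
```

The key point is the order: the line is fitted to the deseasonalised data, and the seasonal index only comes back in at the very last step.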

172
TIME SERIES DATA SUMMARY

- Time series features: trends, seasonality, cycles, irregular fluctuations, structural changes
- Numerical and graphical smoothing
- Seasonal indices and deseasonalisation:
  seasonal index = value of season / seasonal average
  DV = AV / SI
  AV = DV × SI
- Forecasting

173
THAT’S IT!

• Any questions, come ask! I’ll be around afterwards.

• Don’t stress, don’t burn yourself out

• Look after yourselves!

174
QUESTIONS
GOOD LUCK <3

Presented by:
Josh Hamilton
