S1 Note Sample
S1 Notes
(Edexcel)
“Explain briefly why mathematical models can help to improve our understanding of real
world problems”
Simplifies a real world problem; enables us to gain a quicker / cheaper understanding of a real world
problem
“Statistical models can be used to describe real world problems. Explain the process involved
in the formulation of a statistical model.”
• Observe real-world problem
• Devise a statistical model and collect data
• (Experimental) data collected
• Model used to make predictions
• Compare observed outcomes against expected outcomes and test the model
• Statistical concepts are used to test how well the model describes the real-world problem
• Refine model if necessary.
A sample space
A list of all possible outcomes of an experiment
Event
Sub-set of possible outcomes of an experiment.
Normal Distribution
• Bell-shaped curve
• Symmetrical about the mean; mean = mode = median
• About 95% of data lies within 2 standard deviations of the mean
• About 68.3% of data lies within 1 standard deviation of the mean
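As a quick check of the two percentages above, the proportion of a normal distribution lying within k standard deviations of the mean can be computed from the standard identity P(|Z| < k) = erf(k / √2). The following is a minimal Python sketch; the function name `within_k_sd` is only illustrative.

```python
import math

def within_k_sd(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

print(round(within_k_sd(1), 3))  # 0.683
print(round(within_k_sd(2), 3))  # 0.954
```

The second value shows that "95% within 2 standard deviations" is a rounded rule of thumb.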
Independent Events
P(A ∩ B) = P(A) × P(B)
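The condition can be checked by listing a sample space and counting outcomes. The sketch below assumes an illustrative experiment that is not part of these notes: two fair dice, with A = "first die is even" and B = "total is 7".

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely (first die, second die) pairs
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a true/false rule on an outcome."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))

A = lambda o: o[0] % 2 == 0      # illustrative event: first die shows an even number
B = lambda o: o[0] + o[1] == 7   # illustrative event: the total is 7

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))

print(p_a, p_b, p_a_and_b)       # 1/2 1/6 1/12
print(p_a_and_b == p_a * p_b)    # True, so A and B are independent
```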
Continuous
Continuous data can take any value in a given range. So a person’s height is continuous since it
could be any value within set limits.
Categorical
Categorical data is data which is not numerical, for example choice of breakfast cereal.
Or
Score (s)    Frequency
5-9          2
10-14        5
Or
Score (s)    Class boundaries
5-9          4.5 ≤ s < 9.5
10-14        9.5 ≤ s < 14.5
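The step from stated class limits to class boundaries can be sketched in Python. This assumes the scores are recorded to the nearest whole number, so each boundary sits half a unit beyond the stated limit; the function name and the `gap` parameter are illustrative.

```python
def class_boundaries(lower, upper, gap=1):
    """Boundaries of a class stated as lower-upper, for values rounded to the nearest `gap`."""
    half = gap / 2
    return (lower - half, upper + half)

for lo, hi in [(5, 9), (10, 14)]:
    low_b, up_b = class_boundaries(lo, hi)
    print(f"{lo}-{hi}: {low_b} <= s < {up_b}, class width {up_b - low_b}")
```

Note that the class width comes from the boundaries (9.5 − 4.5 = 5), not from the stated limits (9 − 5 = 4).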
The stem and leaf diagram is a very useful way of grouping data whilst retaining the original data.
For example suppose we had the following scores from children in a Maths test:
85, 18, 38, 67, 43, 75, 78, 81, 92, 71, 52, 62, 49, 62, 82, 69, 55, 57, 95, 62
We see that the smallest value is 18 and the largest is 95. The classes of stem and leaf diagrams
must be of equal width and so it would seem sensible to choose classes 10-19, 20-29, etc.
The “stem” in this case represents the tens and the “leaf” represents the units so we have the
following:
Key: 1 | 8 means 18
This diagram tells us the basic shape of the distribution. We can easily see the smallest and largest
values and we can see that the mode is 62. We can also use it to calculate Q1 , Q2 and Q3 .
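The grouping into stems and leaves can be sketched in Python using the 20 test scores listed above. The function name `stem_and_leaf` is illustrative; empty stems are kept so that every class of width 10 appears.

```python
from collections import defaultdict
from statistics import mode

scores = [85, 18, 38, 67, 43, 75, 78, 81, 92, 71,
          52, 62, 49, 62, 82, 69, 55, 57, 95, 62]

def stem_and_leaf(values):
    """Map each tens digit (stem) to its sorted units digits (leaves)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    # Keep empty stems so the classes stay of equal width
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}

diagram = stem_and_leaf(scores)
for stem, leaves in diagram.items():
    print(f"{stem} | {''.join(map(str, leaves))}")
print("Key: 1 | 8 means 18")
print("Mode:", mode(scores))
```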
NB: If we wanted to represent the interval 18-22 on a stem & leaf we could not make 1 the stem
since not all the numbers would begin with 1. What we could do is have a stem of 18 and then make
the leaf the number we add on to the stem. In this case our key would be:
We can use these to compare two samples by using a “back to back stem plot”. In this we put the stems
down the middle, with one set of data on the left and the other on the right. So we might end up
with a diagram as follows:
 Physics |   | Maths
      75 | 1 | 8
       1 | 2 |
     653 | 3 | 78
     421 | 4 | 39
   94310 | 5 | 257
     842 | 6 | 22279
      63 | 7 | 158
      51 | 8 | 125
         | 9 | 25
The key feature of a histogram is that the area of each block is proportional to the frequency.
In order for the area to be equal (or proportional) to the frequency we plot frequency density on the
vertical axis, where
frequency density = frequency / class width
The class width is the width of the interval (i.e. it runs from the lower boundary to the upper boundary).
So the first block runs from 650 to 670 and has height 0.15 etc.
[Diagram: histogram with frequency density (FD) on the vertical axis and Length on the horizontal axis]
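The frequency density calculation can be sketched in Python. The frequencies below are hypothetical, since the original data table is not reproduced here; a frequency of 3 is chosen only because it gives the stated height of 0.15 for the class from 650 to 670, and the second class is invented for illustration.

```python
def frequency_density(frequency, lower_boundary, upper_boundary):
    """Height of a histogram block: frequency divided by class width."""
    class_width = upper_boundary - lower_boundary
    return frequency / class_width

# Hypothetical (lower boundary, upper boundary, frequency) rows
classes = [(650, 670, 3), (670, 700, 9)]

for lower, upper, freq in classes:
    fd = frequency_density(freq, lower, upper)
    # The block's area (height x width) recovers the frequency
    print(f"{lower}-{upper}: width {upper - lower}, FD {fd}, area {fd * (upper - lower)}")
```

Because the classes have different widths, comparing heights alone would be misleading; it is the areas that represent the frequencies.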
NB: If there are gaps between the stated upper limit of one class interval and the lower limit of
the next class interval then we need to fill those gaps as shown below. For example,
Length (m)
15-19    14.5 ≤ x < 19.5
20-24    19.5 ≤ x < 24.5
25-29    24.5 ≤ x < 29.5
So the class width is 5. Not 19 − 15 = 4.
When a question says “give a reason to justify the use of a histogram to represent these data”,
the answer is “Data is continuous”.
NB: Be careful with age: “15-19” would mean 15 ≤ x < 20, since one is 19 until the
moment before one’s 20th birthday.
The shape of the histogram gives us information about the mean and the dispersion.
A box plot is a diagram used to illustrate the dispersion of data. There is a box which runs from the lower
quartile, Q1, to the upper quartile, Q3, with the median, Q2, marked on it. The whiskers then go from
this box to the lowest value in one direction and to the highest value in the other.
We end up with a diagram as follows:
NB: It must have a horizontal axis with a scale on it.
[Diagram: box plot with whiskers running from the Lowest Value to the Highest Value and the box marked at Q1, Q2 and Q3]
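The five values needed to draw the diagram can be sketched in Python using the maths test scores from the stem and leaf example. The quartile rule used here is an assumption, since these notes do not state one: take position kn/4; if it is a whole number, average that value and the next, otherwise round the position up. Other conventions give slightly different quartiles.

```python
import math

scores = [85, 18, 38, 67, 43, 75, 78, 81, 92, 71,
          52, 62, 49, 62, 82, 69, 55, 57, 95, 62]

def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) using the assumed k*n/4 position rule."""
    ordered = sorted(values)
    pos = k * len(ordered) / 4
    if pos == int(pos):
        i = int(pos)
        # Whole-number position: average this value and the next one
        return (ordered[i - 1] + ordered[i]) / 2
    # Otherwise round the position up (positions are 1-based)
    return ordered[math.ceil(pos) - 1]

summary = {
    "lowest": min(scores),
    "Q1": quartile(scores, 1),
    "Q2": quartile(scores, 2),
    "Q3": quartile(scores, 3),
    "highest": max(scores),
}
print(summary)
```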
Skewness and the concept of outliers. Any rule to identify outliers will be specified in the question.
If the question refers to outliers then we should use a refined box plot where we fix the length of the
whisker to, for example, 1.5(Q3 − Q1) at the end where an outlier lies. In this case we would mark
with crosses those outliers which lie beyond the whisker.
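[Diagram: refined box plot with the box marked at Q1, Q2 and Q3, the upper whisker ending at Q3 + 1.5(Q3 − Q1), and an outlier beyond it marked with a cross (×)]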
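The example rule above can be sketched in Python. The data set and its quartiles are hypothetical, chosen only to illustrate the calculation, and the multiplier 1.5 is just the example rule; the question will specify the rule to use.

```python
def outlier_limits(q1, q3, k=1.5):
    """Lower and upper limits: Q1 - k(Q3 - Q1) and Q3 + k(Q3 - Q1)."""
    iqr = q3 - q1
    return (q1 - k * iqr, q3 + k * iqr)

def find_outliers(values, q1, q3, k=1.5):
    """Values lying outside the limits; these are marked with crosses."""
    low, high = outlier_limits(q1, q3, k)
    return [v for v in values if v < low or v > high]

# Hypothetical data with hypothetical quartiles Q1 = 15.5 and Q3 = 21.5
data = [12, 15, 16, 18, 19, 21, 22, 40]
q1, q3 = 15.5, 21.5

print(outlier_limits(q1, q3))       # (6.5, 30.5)
print(find_outliers(data, q1, q3))  # [40]
```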
NB We can tell something about the skewness / symmetry of the distribution from the box plot.
For example,
Similarly,