Applied Mathematics Notes
There are two main types of data. These are qualitative data and quantitative data.
Qualitative Data
Qualitative data tends to be non-numerical because it records characteristics,
descriptions, opinions, feelings, estimates, etc.
Quantitative Data
Quantitative data tends to be numerical because it is measurable; therefore, it will
often have units. Quantitative data can be divided into two subgroups: discrete data and
continuous data.
Discrete Data
Discrete data can take only certain values within a range.
Continuous Data
Continuous data can be any value and is only restricted by the level of accuracy of the
measuring instrument used.
Examples
1. What something smells like – Qualitative
2. The number of hours someone studied – Quantitative, Continuous
3. Your last score on your Math test – Quantitative, Discrete
Important Definitions
Population – All of the elements, persons, animals, or units that fall into a set or group
under analysis.
Statistic – A number that summarizes a piece of information (a numerical datum), for example
how often you do something or how common something is. Statistics are generally calculated
from sample data; examples are the mean ($\bar{x}$), the variance ($\sigma^2$) and the standard deviation ($\sigma$).
Sample Frames
When it has been decided exactly which group you are going to study (the target population)
and how, a comprehensive list of the members of that population must be created. This list is
called the sample frame. Each member on the list is given a number to identify them, which
allows members to be referred to individually. Creating the list also allows easy subdivision into
more manageable pieces if necessary, so an effort is made to list the members in a logical and
systematic way (this also allows members to be located easily). The information that will be
used to find or contact each sample unit is included (telephone numbers, addresses, form, etc.).
Depending on the difficulty of acquiring certain types of information, you may instead list
cluster groups to gather information from.
Examples
1. Population: Students taking applied math unit 1 at Today’s Secondary
Sample Frame: Band C – Appiana Holmes, L6 Picasso
Band C – Dapple Athlete, L6 Einstein
Band D – Leionel Mathemati, L6 Susuki
2. Population: Birds in Barbados
Sample Frame: Doves
Pigeons
Herons
Egrets
Hummingbirds
Finches
Blackbirds
Sparrows
Sampling
Why is sampling necessary? Often a population is far too large for a census to be feasible. The
resources required to tackle such a project would be too great: too many people to survey, too
much time needed to reach them (when you need the information soon), and too large a
workforce needed to engage everyone (you may not have the money to pay them, the premises
to house them, the computers for them to enter data into, etc.). With this in mind, you can pick
a subset which you hope will tell you what you want to learn about the population. You try not
to bias which elements of the population go into your sample, so that you can be confident that
you have not skewed your analysis towards one or two sections of the population. Thus, you
want to give each element an equal chance of being chosen for your sample. This produces what
is known as a random sample.
Any sample in which an effort is made to give each sample unit an equal opportunity to be
chosen is random. When this is not done, a non-random sample results. For example, you may
decide on a sample of convenience, where you choose to poll all of the persons on your street;
this gives persons living outside your area no chance of being chosen.
Sampling Methods
Simple Random
Once you have a sample frame and each of the sample units is numbered, you can
choose a number to pick a sample unit to be included in your sample. The methods used to
choose the random numbers will be explored more in the next section.
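As a rough illustration, simple random sampling from a numbered sample frame can be sketched in Python. The frame of units 1 to 50 and the helper name `simple_random_sample` are hypothetical; the sketch relies on `random.sample`, which draws without replacement, so each unit is equally likely to be picked and none is picked twice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Pick n distinct units from the sample frame, each equally likely (hypothetical helper)."""
    rng = random.Random(seed)      # seeded so the example is repeatable
    return rng.sample(frame, n)    # draws without replacement

# Hypothetical sample frame: units numbered 1 to 50
frame = list(range(1, 51))
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```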
Stratified Random
The problem with simple random sampling is that, although an effort has been made to
choose the members of the sample randomly, the members may still all come from the same
section of the population. For example, a survey of the students of this school may turn out to
contain mostly second form and lower sixth students. If the survey is interested in the students’
opinions about their timetable, this will clearly produce a biased viewpoint. To avoid this
problem, especially where it is important to get the viewpoints or measurements of all
segments of the population, the population is first stratified (i.e. divided into groups with
similar characteristics) and then individual members are chosen randomly from each group. For
example, a sample of student opinions at this school may first stratify students by form level or
gender before picking the students to be surveyed.
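A minimal sketch of stratified random sampling, assuming proportional allocation (the same fraction is sampled from every stratum; the notes do not fix how many to take from each group). The form groups, their sizes and the labels are all hypothetical.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Randomly sample the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = round(len(members) * fraction)   # this stratum's share of the sample
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical school, stratified by form level
strata = {
    "Form 1": [f"F1-{i}" for i in range(1, 121)],
    "Form 2": [f"F2-{i}" for i in range(1, 101)],
    "Lower Six": [f"L6-{i}" for i in range(1, 41)],
}
sample = stratified_sample(strata, 0.1, seed=2)
print({form: len(chosen) for form, chosen in sample.items()})
```

Every form level is guaranteed a place in the sample, which is exactly what simple random sampling could not promise.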
Systematic Random
In this method, every kth element is chosen to be included in the sample. For example, if your
population has 1000 elements and you would like to sample 100 of them, then you would
choose every tenth element (k = 1000 ÷ 100 = 10). The first element is chosen randomly from the
first ten, and then every tenth element after that is chosen. For example, if you chose the third
element, the other positions would be 13, 23, 33, 43, 53, etc.
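The systematic method can be sketched as follows, assuming the population size divides evenly by the sample size (the helper name `systematic_sample` is hypothetical). The random start is chosen from the first k positions, and then every kth position is taken.

```python
import random

def systematic_sample(population_size, n, seed=None):
    """Return 1-based positions of every k-th unit after a random start."""
    k = population_size // n           # sampling interval
    rng = random.Random(seed)
    start = rng.randint(1, k)          # random start within the first k units
    return list(range(start, population_size + 1, k))

positions = systematic_sample(1000, 100, seed=3)
print(positions[:5])

# The worked example from the notes: start at position 3 with k = 10
example = list(range(3, 1001, 10))[:6]
print(example)
```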
Cluster
This method is used when it is difficult to create an exhaustive list of the sample units. If
we were seeking the opinions of churchgoers in Barbados, it may not be practical to create a list
of all members of all church organizations before starting to choose persons for the sample.
Instead, you can create clusters (for example, church denominations) and then sample clusters
using one of the previously discussed methods. Either all of the sample units in each chosen
cluster are studied (one-stage), or units are chosen randomly from within each chosen cluster
(two-stage).
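A sketch of one-stage cluster sampling: whole clusters are picked at random and every unit in a chosen cluster is kept. The denominations and member labels are made up for illustration; a two-stage version would additionally call `rng.sample` on the members of each chosen cluster.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: choose whole clusters at random, keep all their units."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # random choice of cluster names
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters (denominations) with made-up member labels
clusters = {
    "Denomination A": ["A1", "A2", "A3"],
    "Denomination B": ["B1", "B2"],
    "Denomination C": ["C1", "C2", "C3", "C4"],
    "Denomination D": ["D1"],
}
picked = cluster_sample(clusters, 2, seed=4)
print(picked)
```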
Quota
The population is broken down by characteristics, and the proportion of the population
with each characteristic is expected to be the same in the sample. For example, you
would need to know what proportion of the population is male and female, and then, for each
of these subgroups, how many persons are in the various age categories, ethnic groups,
urban/rural areas, etc. A matrix is then created with each of the groups in their proportions,
and you simply have to find people who meet the criteria for each subgroup’s quota. For
example, a survey about life in Barbados may only be able to poll 318 people. The population is
48% male and 52% female, so you aim to poll 0.48 × 318 ≈ 153 males and 0.52 × 318 ≈ 165
females. The population also has 18.29% of people aged 0–14, 13.35% aged 15–24, 44.22% aged
25–54, 12.87% aged 55–64 and 10.88% aged 65 and over. This would mean the matrix could look like:
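One way such a quota matrix might be computed is sketched below. It assumes that gender and age are independent, so each cell is 318 × (gender share) × (age share); the notes do not say how the original matrix was built, so treat this only as an illustration.

```python
# Quota matrix sketch, ASSUMING gender and age are independent:
# each cell = sample size x gender share x age share.
sample_size = 318
gender = {"Male": 0.48, "Female": 0.52}
age = {"0-14": 0.1829, "15-24": 0.1335, "25-54": 0.4422,
       "55-64": 0.1287, "65+": 0.1088}

gender_quota = {g: round(sample_size * p) for g, p in gender.items()}
matrix = {g: {a: round(sample_size * pg * pa) for a, pa in age.items()}
          for g, pg in gender.items()}

print(gender_quota)
for g, row in matrix.items():
    print(g, row)
```

Because the quoted age percentages add up to 99.61% and every cell is rounded, the cells need not add up to exactly 318.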
Cluster Sampling
In cluster sampling, we divide the population into groups or clusters and then select a
random sample of these clusters. We assume that the individual clusters are representative of
the entire population.
Quota Sampling
This method is often used for interviews conducted on the street. The interviewer usually just
wishes to reach a target number of completed surveys.
Lottery Technique
This method is conducted somewhat like a lottery draw, hence the name. All of the
numbers assigned to the sample units are written on pieces of card, chips or balls and placed
into a bag or bowl. The numbers are mixed thoroughly and drawn without looking to determine
which sample units will be used in the sample. This is repeated until the previously determined
number of sample units has been selected.
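The lottery technique can be simulated by shuffling the "chips" and drawing them one at a time without replacement. This is a sketch only; the helper name `lottery_draw` and the frame of 30 units are hypothetical.

```python
import random

def lottery_draw(numbers, n, seed=None):
    """Mix numbered chips in a bowl, then draw n of them one at a time."""
    rng = random.Random(seed)
    bowl = list(numbers)
    rng.shuffle(bowl)                  # mix the chips thoroughly
    drawn = []
    while len(drawn) < n:
        drawn.append(bowl.pop())       # draw without looking, and without replacing
    return drawn

drawn = lottery_draw(range(1, 31), 6, seed=5)
print(drawn)
```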
Random Numbers
(i). Random Number Tables – These are compilations of random numbers given in tables.
The tables can be used to pick numbers that fit your sample frame, to choose which sample
units will be included in the sample.
(ii). Electronic Random Number Generators – Calculators and computer software (like
Excel and Numbers) have a random number function (Ran# on the calculator, or RAND or
RANDBETWEEN on the computer). The Ran# and RAND functions generate a number from 0 up
to, but not including, 1. This number can then be multiplied by the size of the sample frame to
generate a number of the correct size. The RANDBETWEEN function allows you to enter the two
endpoints that you are interested in. For example, RANDBETWEEN(100,500) randomly generates
a whole number from 100 to 500 inclusive. This is another unbiased way of picking sample units
to include in your survey.
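Python's `random` module has rough equivalents of these functions: `random()` behaves like Ran#/RAND (0 ≤ r < 1) and `randint(a, b)` like RANDBETWEEN (both endpoints included). Turning r into a unit number by taking the whole-number part of r × N and adding 1 is one common convention, assumed here; the frame size N = 500 is hypothetical.

```python
import math
import random

rng = random.Random(6)

# Like Ran# / RAND: a number r with 0 <= r < 1
r = rng.random()

# Scale to a unit number 1..N of a sample frame (N = 500 is hypothetical)
N = 500
unit = math.floor(r * N) + 1

# Like RANDBETWEEN(100, 500): a whole number from 100 to 500 inclusive
between = rng.randint(100, 500)
print(r, unit, between)
```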
Mode
The mode is the value that occurs most often. It is very useful for qualitative data,
because the other two measures of central tendency (the median and the mean) cannot be
calculated for non-numerical data. It is a useful average when there is a single most common
value, or perhaps two. However, it is not very helpful in summarizing the data if there are several modes.
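Python's standard `statistics.multimode` (Python 3.8+) returns every most-common value, so it works for qualitative data and shows when a data set has more than one mode. The smell and number data below are hypothetical.

```python
from statistics import multimode

smell_modes = multimode(["sweet", "sour", "sweet", "bitter"])   # qualitative data
number_modes = multimode([2, 3, 3, 5, 5, 7])                    # two modes
print(smell_modes, number_modes)
```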
Median
The median can only be found when the quantitative data set is arranged in numerical
order (either ascending or descending). The median is an average that is not influenced by
extreme values.
The formulas for the position of the median in the ordered data (where n is the sample size) are as follows:
Discrete Data:
Position of median $= \frac{n+1}{2}$
Continuous Data:
Position of median $= \frac{n}{2}$
Since the median is the value in the central position of the ordered data set, there is a single
middle value when the sample size is odd. When the sample size is even there are two middle
values, and the median is the mean of these two values.
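A short sketch of the discrete-data rule: sort the data, use the (n + 1)/2 position when n is odd, and average the two middle values when n is even. The function name and the small data sets are hypothetical.

```python
def median(values):
    """Median of ungrouped data using the (n + 1)/2 position."""
    data = sorted(values)
    n = len(data)
    pos = (n + 1) / 2                  # 1-based position of the median
    if n % 2 == 1:
        return data[int(pos) - 1]
    # Even n: average the two middle values (positions n/2 and n/2 + 1)
    return (data[n // 2 - 1] + data[n // 2]) / 2

print(median([7, 1, 4]))       # odd n: one middle value
print(median([7, 1, 4, 10]))   # even n: mean of the two middle values
```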
Mean
The mean, $\bar{x} = \frac{\sum x}{n}$, is the most commonly used average and is affected by outliers.
The mean alone does not describe a data set; consider the following three sets, each of which has a mean of 7:
a. 7, 7, 7, 7, 7
b. 4, 6, 6.5, 7.2, 11.3
c. -193, -46, 28, 69, 177
Although the means are the same, the spread of each set is different. There is no variability
in set a, while the numbers in set c are much more spread out than those in sets a and b.
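A quick check in Python that the three sets a, b and c really do share the same mean of 7 (floating-point sums are compared with a small tolerance).

```python
sets = {
    "a": [7, 7, 7, 7, 7],
    "b": [4, 6, 6.5, 7.2, 11.3],
    "c": [-193, -46, 28, 69, 177],
}
# Mean = sum of the values / number of values
means = {name: sum(xs) / len(xs) for name, xs in sets.items()}
print(means)
```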
There are various ways of measuring the variability (spread) of a distribution. Three
measures commonly used are:
1. Range
2. Variance
3. Standard Deviation
The Range
The range is based entirely on the extreme values of the distribution and gives a quick snapshot
of the overall spread of the data.
Range = Highest Value – Lowest Value
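Applying Range = Highest Value − Lowest Value to the three sets a, b and c from the mean section shows how differently spread they are, even though their means agree.

```python
sets = {
    "a": [7, 7, 7, 7, 7],
    "b": [4, 6, 6.5, 7.2, 11.3],
    "c": [-193, -46, 28, 69, 177],
}
# Range = highest value - lowest value
ranges = {name: max(xs) - min(xs) for name, xs in sets.items()}
print(ranges)
```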
Standard Deviation
Standard deviation is a very good way of measuring spread and is particularly important in
statistical work. It gives a measure of the spread of the data in relation to the mean of the
distribution; that is, it says how close the actual values are to the mean.
Standard deviation $= \sigma = \sqrt{\frac{\sum (x-\bar{x})^2}{n}}$
or
$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$
The standard deviation is measured in the same units as the data, whereas the variance is
measured in square units. The standard deviation is very useful for comparing distributions: the
lower the standard deviation, the less variation there is and the more consistent the data are.
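The two standard deviation formulas give the same answer. The sketch below checks this on sets a, b and c from the mean section, using division by n as the notes do (the function names are hypothetical).

```python
import math

def sd_definition(xs):
    """sigma = sqrt( sum((x - mean)^2) / n )"""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

def sd_shortcut(xs):
    """sigma = sqrt( sum(x^2) / n - mean^2 )"""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum(x * x for x in xs) / n - mean ** 2)

a = [7, 7, 7, 7, 7]
b = [4, 6, 6.5, 7.2, 11.3]
c = [-193, -46, 28, 69, 177]
for xs in (a, b, c):
    print(round(sd_definition(xs), 3), round(sd_shortcut(xs), 3))
```

Set a has a standard deviation of 0, set b about 2.4 and set c about 123, matching the description of their spreads.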
The Variance
The variance is the square of the standard deviation, $\sigma^2$. It is always measured in square units.
Interquartile Range
Quartiles
The lower and upper quartiles are not influenced by extreme values. They are values which,
together with the median, split the ordered distribution into four equal parts:
1. The lower quartile, Q1, is the median of all of the values below the median.
2. The upper quartile, Q3, is the median of all of the values above the median.
The interquartile range is the difference between Q3 and Q1: IQR = Q3 − Q1.
The semi-interquartile range is half of the interquartile range: (Q3 − Q1) / 2.
The interquartile range measures the spread of the middle 50% of the data once the data have
been ordered.
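A sketch of the quartile rule above (Q1 and Q3 as medians of the values below and above the median, leaving the median itself out when n is odd), applied to the June temperatures listed in the example that follows. Other textbooks and software use slightly different quartile conventions, so results may differ elsewhere.

```python
def median_of(sorted_vals):
    """Median of an already sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def quartiles(values):
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]            # values below the median
    upper = data[(n + 1) // 2 :]      # values above the median
    return median_of(lower), median_of(data), median_of(upper)

temps = [19, 23, 19, 19, 20, 12, 22, 22, 16, 18, 13, 14,
         12, 15, 16, 24, 23, 27, 30, 31, 40, 45, 44]
q1, q2, q3 = quartiles(temps)
iqr = q3 - q1
print(q1, q2, q3, iqr, iqr / 2)   # Q1, median, Q3, IQR, semi-interquartile range
```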
e.g. The maximum temperature, in °C and measured to the nearest whole number, was recorded
each day during June in a particular city. The temperatures were as follows.
19, 23, 19, 19, 20, 12, 22, 22, 16, 18, 13, 14, 12, 15, 16, 24, 23, 27, 30, 31, 40, 45, 44.
Stem | Leaf
1 | 22345668999
2 | 0223347
3 | 01
4 | 045
Key: 3|0 = 30
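A stem-and-leaf display for two-digit data can be built by using the tens digit as the stem and the units digit as the leaf. The sketch below builds it from the listed temperatures; the helper name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value by its tens digit (stem) and list its units digits (leaves)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)

temps = [19, 23, 19, 19, 20, 12, 22, 22, 16, 18, 13, 14,
         12, 15, 16, 24, 23, 27, 30, 31, 40, 45, 44]
plot = stem_and_leaf(temps)
for stem, leaves in plot.items():
    print(stem, "|", "".join(str(leaf) for leaf in leaves))
```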
The Distribution of a Data Set
[Figure: sketches of negatively skewed, positively skewed and symmetric distributions]
Continuous Data
Continuous data is a set that can take any value within a particular range or interval.
For example, the interval 1–5 does not only contain the whole numbers 1, 2, 3, 4 and 5; it also
contains every value in between, such as 2.7. This interval is continuous because its values
cannot be counted.
Age | Lower Class Boundary (LCB) | Upper Class Boundary (UCB) | Range
1 – 5 | 0.5 | 5.5 | 0.5 < x < 5.5
6 – 10 | 5.5 | 10.5 | 5.5 < x < 10.5
11 – 15 | 10.5 | 15.5 | 10.5 < x < 15.5
16 – 20 | 15.5 | 20.5 | 15.5 < x < 20.5
When whole-number data is grouped as above, values between the classes appear to be missing,
such as 5.1 – 5.9 and 10.1 – 10.9. To account for these values, we use the lower and upper class
boundaries (LCB and UCB), which lie halfway between one class and the next, so that the classes
join without gaps.
Notation
Class Width = UCB – LCB
Frequency density $= \frac{\text{frequency}}{\text{class width}}$
The two diagrams used to represent the continuous data are the cumulative frequency curve
(ogive) and the histogram.
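The class boundaries, class widths and frequency densities for the age classes in the table can be computed as below. The frequencies are hypothetical (the notes give none) and are used only to show the frequency density calculation.

```python
classes = [(1, 5), (6, 10), (11, 15), (16, 20)]
freqs = [4, 10, 7, 3]   # HYPOTHETICAL frequencies, for illustration only

rows = []
for (low, high), f in zip(classes, freqs):
    lcb, ucb = low - 0.5, high + 0.5   # boundaries for whole-number data
    width = ucb - lcb                   # class width = UCB - LCB
    density = f / width                 # frequency density = frequency / class width
    rows.append((lcb, ucb, width, density))
    print(f"{low}-{high}: LCB={lcb}, UCB={ucb}, width={width}, density={density}")
```

On a histogram, the heights of the bars are these frequency densities, so the bar areas are proportional to the frequencies.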
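Mode $= L_m + \left(\frac{d_1}{d_1 + d_2}\right) \times C$
Where:
$L_m$ is the lower limit (lower class boundary) of the modal class
$d_1$ is the frequency of the modal class minus the frequency of the class before it
$d_2$ is the frequency of the modal class minus the frequency of the class after it
C is the modal class width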
Median $= L_m + \left(\frac{\frac{\sum f}{2} - f_c}{f_m}\right) \times C$
Where
$L_m$ is the lower limit (lower class boundary) of the median class
$f_c$ is the cumulative frequency of the class before the median class
$f_m$ is the frequency of the median class
C is the median class width
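The two grouped-data formulas can be sketched in Python. This assumes classes of equal width and uses a hypothetical frequency table (class boundaries 0.5, 5.5, 10.5, 15.5 with frequencies 4, 10, 7, 3); the function names are also hypothetical.

```python
def grouped_mode(lcbs, width, freqs):
    """Mode = Lm + d1 / (d1 + d2) x C, for classes of equal width C."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])      # modal class index
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1, d2 = freqs[i] - before, freqs[i] - after
    return lcbs[i] + d1 / (d1 + d2) * width

def grouped_median(lcbs, width, freqs):
    """Median = Lm + ((sum f / 2) - fc) / fm x C."""
    half = sum(freqs) / 2
    cum = 0                                # cumulative frequency before this class
    for lcb, f in zip(lcbs, freqs):
        if cum + f >= half:                # first class reaching n/2 is the median class
            return lcb + (half - cum) / f * width
        cum += f

lcbs = [0.5, 5.5, 10.5, 15.5]
freqs = [4, 10, 7, 3]   # hypothetical frequencies
mode = grouped_mode(lcbs, 5, freqs)
med = grouped_median(lcbs, 5, freqs)
print(round(mode, 2), med)
```

Here the modal class is 5.5 – 10.5 with $d_1 = 6$ and $d_2 = 3$, giving $5.5 + \frac{6}{9} \times 5 \approx 8.83$; the median class is also 5.5 – 10.5, giving $5.5 + \frac{12 - 4}{10} \times 5 = 9.5$.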
With grouped data, the class with the largest frequency density (2.4 in the example) is the modal
class; it has the tallest bar on the histogram.
Trimmed Mean
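The trimmed mean is used to reduce the effect of outliers, since the ordinary mean is affected by
them. A given percentage of the values is removed from each end of the ordered data before the
mean is calculated.
Example:
If you are asked for an 8% trimmed mean of a data set of 25 values, first find how many values
to remove from each end:
$\frac{8}{100} \times \frac{25}{1} = 2$
so you remove two numbers from each end of the ordered list and then calculate the mean of the
remaining 21 values.
Trimmed mean = total of the remaining numbers / (n − 2k), where k is the number of values
removed from each end (here 25 − 4 = 21)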
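A sketch of the trimmed mean as described: k = (percentage × n) / 100 values (rounded down) are removed from each end of the ordered data. The data set of 25 values, containing two large outliers, is hypothetical.

```python
def trimmed_mean(values, percent):
    """Remove percent% of the values from EACH end of the ordered data, then average."""
    data = sorted(values)
    n = len(data)
    k = percent * n // 100             # number of values removed from each end
    kept = data[k : n - k]
    return sum(kept) / len(kept)

# Hypothetical data: 1..23 plus two outliers, 25 values in total
data = list(range(1, 24)) + [100, 200]
print(sum(data) / len(data))      # ordinary mean, pulled up by the outliers
print(trimmed_mean(data, 8))      # 8% trimmed mean: two values removed from each end
```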
Standard Deviation
Ungrouped Data (Discrete)
Calculate the standard deviation of the data set
2, 3, 5, 6, 8
which has a mean of 4.8.
Observation (x) | $x - \bar{x}$ | $(x - \bar{x})^2$
2 | 2 – 4.8 = -2.8 | 7.84
3 | 3 – 4.8 = -1.8 | 3.24
5 | 5 – 4.8 = 0.2 | 0.04
6 | 6 – 4.8 = 1.2 | 1.44
8 | 8 – 4.8 = 3.2 | 10.24
Total | | 22.8
Standard Deviation $= \sqrt{\frac{22.8}{5}} = \sqrt{4.56}$
Standard Deviation = 2.14
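The worked example can be checked step by step in Python, following the same columns as the table.

```python
import math

data = [2, 3, 5, 6, 8]
mean = sum(data) / len(data)                     # 4.8
squared_devs = [(x - mean) ** 2 for x in data]   # the (x - mean)^2 column
total = sum(squared_devs)                        # 22.8
sd = math.sqrt(total / len(data))
print(round(mean, 2), round(total, 2), round(sd, 2))
```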
Grouped Data
An online test was taken by 115 students. The time spent on each question was recorded by the
computer. The following table shows the time taken, in minutes, on the final question.
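$\bar{x} = \frac{\sum fx}{\sum f} = \frac{459.5}{115} \approx 4.00$
Standard deviation $= \sqrt{\frac{\sum fx^2}{\sum f} - \bar{x}^2} = \sqrt{\frac{2314.25}{115} - \left(\frac{459.5}{115}\right)^2} \approx 2.04$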
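The grouped-data calculation only needs the three totals quoted in the example ($\sum f$, $\sum fx$, $\sum fx^2$); the frequency table itself is not reproduced here. The sketch keeps the mean unrounded, since squaring a rounded mean can noticeably change the standard deviation.

```python
import math

sum_f, sum_fx, sum_fx2 = 115, 459.5, 2314.25   # totals quoted in the example
mean = sum_fx / sum_f                           # grouped mean
sd = math.sqrt(sum_fx2 / sum_f - mean ** 2)     # sqrt(sum fx^2 / sum f - mean^2)
print(round(mean, 3), round(sd, 3))
```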
Coding Data
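Note: The value of $\sum (x-a)$ gives you the following information about the mean: $\bar{x} = a + \frac{\sum (x-a)}{n}$, so the mean is greater than a if $\sum (x-a)$ is positive, less than a if it is negative, and equal to a if it is zero.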
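A small illustration of coding with a hypothetical data set and a hypothetical assumed value a = 50: subtract a from every value, average the coded values, then add a back. The result matches the mean calculated directly.

```python
data = [52, 47, 55, 50, 48]   # hypothetical values
a = 50                        # hypothetical value used for coding

coded = [x - a for x in data]            # 2, -3, 5, 0, -2
mean = a + sum(coded) / len(data)        # mean = a + sum(x - a) / n
print(sum(coded), mean, sum(data) / len(data))
```

Here $\sum (x-a) = 2$ is positive, so the mean (50.4) is slightly greater than a.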
Interval Construction
Sturges’ Formula
Number of classes $K = 1 + 3.3 \log_{10}(n)$
Where n is the sample size
Example:
The following data gives the time, in minutes to the nearest minute, that each of the 30
students took to complete a class project.
55 44 53 59 38 56 39 68 58 59
58 35 42 43 47 55 51 63 66 53
56 34 51 42 43 47 64 37 47 45
$K = 1 + 3.3 \log_{10}(30) = 5.8745 \approx 5.9$
Approximately 6 classes are needed.
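Sturges' formula, as given in the notes with a base-10 logarithm, is a one-line calculation; the result is then rounded to a whole number of classes.

```python
import math

def sturges_classes(n):
    """Number of classes K = 1 + 3.3 log10(n), as given in the notes."""
    return 1 + 3.3 * math.log10(n)

k = sturges_classes(30)
print(round(k, 4), round(k))
```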
Subjective Probability
This type of probability is based on past experience, feelings or preferences, such as
weather records or an expert’s opinion.
Theoretical Probability
This is when you use a formula or some other calculation to determine the likelihood of
a random event. It uses the definition of probability, which assumes that all of the outcomes in
the possibility space are equally likely to occur (as in random sampling).
Experimental Probability
In general, if an experiment is repeated n times under exactly the same conditions and
a particular event occurs R times, then the relative frequency $\frac{R}{n}$ is an estimate of the
probability of this event. Note that the accuracy of this estimate tends to increase as n increases:
the larger the value of n, the better the estimate. The experimental probability approaches the
theoretical probability as n becomes very large.
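A simulation can illustrate the relative frequency $\frac{R}{n}$ settling towards the theoretical value. The sketch assumes a fair six-sided die (theoretical probability of a six is 1/6); the seed only makes the run repeatable.

```python
import random

def relative_frequency(trials, seed=None):
    """Estimate P(rolling a six) on a fair die by R / n."""
    rng = random.Random(seed)
    r = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)   # R = number of sixes
    return r / trials

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n, seed=7))

estimate = relative_frequency(100_000, seed=7)
```

The small-n estimates can be far from 1/6 ≈ 0.167, while the 100 000-roll estimate is typically very close.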
Notation
The set of all possible outcomes is called the possibility space (S), and the number of
outcomes in the possibility space is written n(S) or n(ε). The number of outcomes in event A
(the successful outcomes) is denoted n(A). Therefore, when all outcomes are equally likely, the
formula for the probability of A is:
$P(A) = \frac{n(A)}{n(S)}$
The possibility space is also called the universal set, and the event A is a subset of the
universal set.
Rule
P(A) + P(A’) = 1, where A’ (the complement of A) is the event that A does not occur.
Examples
There is a box containing 20 counters numbered 1, 2, 3, …, 20. A counter is selected at
random from the box. Find the probability that the number on the counter is:
i.) A multiple of 5
S = {1, 2, 3, …, 20}, so n(S) = 20
A = {5, 10, 15, 20}, so n(A) = 4
$P(A) = \frac{n(A)}{n(S)} = \frac{4}{20} = \frac{1}{5}$
ii.) Not a multiple of 5
$P(A') = 1 - P(A) = \frac{5}{5} - \frac{1}{5} = \frac{4}{5}$
iii.) Higher than 7
Let B be the event that the number is greater than 7.
B = {8, 9, 10, …, 20}, so n(B) = 13
$P(B) = \frac{13}{20}$
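Listing the outcomes explicitly in Python reproduces the three counter answers exactly, using `fractions.Fraction` so the results stay as fractions.

```python
from fractions import Fraction

S = range(1, 21)                         # counters numbered 1 to 20
A = [x for x in S if x % 5 == 0]         # multiples of 5
B = [x for x in S if x > 7]              # higher than 7

p_A = Fraction(len(A), len(S))           # P(A) = n(A) / n(S)
p_not_A = 1 - p_A                        # P(A') = 1 - P(A)
p_B = Fraction(len(B), len(S))
print(p_A, p_not_A, p_B)
```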
A five-sided spinner has sides numbered 1, 1, 2, 3, 3. The spinner is spun twice. Find the
probability that the spinner will stop at 1 at least once.
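Possibility space diagram (one spin along the bottom, the other up the side; x marks an outcome
containing at least one 1):
3 | x x . . .
3 | x x . . .
2 | x x . . .
1 | x x x x x
1 | x x x x x
    1 1 2 3 3
n(S) = 5 × 5 = 25
Let A be the event that the spinner stops at 1 at least once.
n(A) = 16
$P(A) = \frac{n(A)}{n(S)} = \frac{16}{25}$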
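The possibility space diagram can be generated with `itertools.product`, which lists all 25 equally likely ordered pairs of sides (the two sides numbered 1 are treated as different sides).

```python
from fractions import Fraction
from itertools import product

sides = [1, 1, 2, 3, 3]
outcomes = list(product(sides, repeat=2))          # 25 ordered pairs (first spin, second spin)
at_least_one_1 = [o for o in outcomes if 1 in o]   # outcomes containing a 1
p = Fraction(len(at_least_one_1), len(outcomes))
print(len(outcomes), len(at_least_one_1), p)
```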
A card is dealt from a well-shuffled ordinary pack of 52 playing cards. Find the probability that
the card is:
i. a. The 4 of spades
b. The 4 of spades or any diamond
ii. The first card is placed face up on the table. It is the three of diamonds. What is the
probability that the second card is from a red suit?
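i. a. Let A be the event of a 4 of spades.
n(A) = 1
$P(A) = \frac{1}{52}$
b. Let B be the event of the 4 of spades or any diamond.
n(B) = 13 + 1 = 14
$P(B) = \frac{14}{52} = \frac{7}{26}$
ii. Red suit = hearts or diamonds. After the three of diamonds has been removed, 51 cards
remain, of which 26 − 1 = 25 are red.
Let C be the event of a red suit being chosen.
$P(C) = \frac{25}{51}$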
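A full 52-card deck can be built as (rank, suit) pairs and the three card answers checked by counting. The rank and suit labels are just a convenient encoding chosen for this sketch.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(ranks, suits))     # 52 (rank, suit) cards

p_4s = Fraction(sum(1 for c in deck if c == ("4", "spades")), len(deck))
p_4s_or_d = Fraction(
    sum(1 for c in deck if c == ("4", "spades") or c[1] == "diamonds"), len(deck))

# Part ii: the three of diamonds is already on the table
remaining = [c for c in deck if c != ("3", "diamonds")]
p_red = Fraction(sum(1 for c in remaining if c[1] in ("hearts", "diamonds")), len(remaining))
print(p_4s, p_4s_or_d, p_red)
```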
A table shows the results of all of the driving tests taken at a particular test center during the
first week of September. A person is chosen at random from those who took their driving test
that week.
i. Find the probability that the person passed the driving test.
ii. Find the probability that the person is a female who failed the driving test.
iii. A male is chosen. What is the probability that he passed the driving test?
a. Is red
b. Is a blue round balloon
c. Is not yellow
[Venn diagram: events A and B, showing the regions A ∩ B and A ∪ B]
Addition Rule
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Examples
Two events, x and y, are such that P(x or y) = 0.8, P(x and y) = 0.35 and P(x) = 0.6.
Find P(y’).
Venn diagram regions:
x only: 0.6 – 0.35 = 0.25
x and y: 0.35
y only: 0.8 – 0.6 = 0.2
neither: 1 – 0.8 = 0.2
P(y’) = 0.25 + 0.2 = 0.45
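The same answer follows by rearranging the addition rule to P(y) = P(x ∪ y) − P(x) + P(x ∩ y) and then using P(y') = 1 − P(y). Results are rounded because of floating-point arithmetic.

```python
p_x_or_y, p_x_and_y, p_x = 0.8, 0.35, 0.6

p_y = p_x_or_y - p_x + p_x_and_y      # addition rule rearranged for P(y)
p_not_y = 1 - p_y                     # complement rule
print(round(p_y, 2), round(p_not_y, 2))
```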
Some pupils did a survey on comics. They asked all 100 pupils in their year group whether they
had read particular comics during the past week. They found that 65 had read Whizz,
55 had read Wham, 30 had read both Whizz and Wham, and some in the year group had read
neither. A pupil was selected at random from the year group to answer more questions in the
survey.
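Let W1 be the event that the pupil read Whizz and W2 the event that the pupil read Wham.
i. P(W1 or W2)
$P(W_1 \cup W_2) = P(W_1) + P(W_2) - P(W_1 \cap W_2) = \frac{65}{100} + \frac{55}{100} - \frac{30}{100} = \frac{90}{100}$
ii. P(neither)
$P(W_1 \cup W_2)' = 1 - P(W_1 \cup W_2) = 1 - \frac{90}{100} = \frac{10}{100}$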
Example
In a race where there can be only one winner, the probability that John will win is 0.3, the
probability that Paul will win is 0.2, and the probability that Mark will win is 0.4.
A card is dealt from an ordinary pack of 52 playing cards. Find the probability that the card is:
i. A club or a diamond
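$P(\text{diamond}) = \frac{13}{52}$
$P(\text{club}) = \frac{13}{52}$
$P(\text{diamond} \cap \text{club}) = 0$
$P(\text{diamond} \cup \text{club}) = \frac{13}{52} + \frac{13}{52} = \frac{26}{52} = \frac{1}{2}$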
ii. A club or a king
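$P(\text{club}) = \frac{13}{52}$
$P(\text{king}) = \frac{4}{52}$
$P(\text{club} \cap \text{king}) = \frac{1}{52}$
$P(\text{club} \cup \text{king}) = P(\text{club}) + P(\text{king}) - P(\text{club} \cap \text{king}) = \frac{13}{52} + \frac{4}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$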
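Representing events as Python sets of cards lets the addition rule be checked against a direct count of the union. This is an illustrative sketch; the (rank, suit) encoding is an assumption of the example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]
deck = set(product(ranks, suits))

clubs = {c for c in deck if c[1] == "clubs"}
kings = {c for c in deck if c[0] == "K"}
diamonds = {c for c in deck if c[1] == "diamonds"}

def p(event):
    """P(event) = n(event) / n(S) for equally likely cards."""
    return Fraction(len(event), len(deck))

direct = p(clubs | kings)                              # count the union directly
by_rule = p(clubs) + p(kings) - p(clubs & kings)       # addition rule
print(direct, by_rule, p(diamonds | clubs))
```

Because no card is both a club and a diamond, the intersection term is zero in part i, and the rule reduces to adding the two probabilities.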
Each entry is the row number minus the column number:
6 | 5 4 3 2 1 0
5 | 4 3 2 1 0 -1
4 | 3 2 1 0 -1 -2
3 | 2 1 0 -1 -2 -3
2 | 1 0 -1 -2 -3 -4
1 | 0 -1 -2 -3 -4 -5
    1 2 3 4 5 6
$P(\text{Event A}) = \frac{12}{36} = \frac{1}{3}$
ii.