An Example of Data Analysis: Mathematics Higher Level Internal Assessment Exploration
Candidate Number:
Johnson Du
School Number:
Examination Session: May 2019
Table of Contents
Introduction
Rationale
Aim
Methodology
Approximation
Variables
Modelling
Conclusion
Evaluation
Bibliography
Introduction
Association football is one of the most influential and most-watched team sports in the world.
Among the various skills involved in football, passing could be judged the most basic but also the
hardest. The through-ball is one kind of pass, more advanced than a regular face-to-face pass.
It requires the passer, usually positioned behind the receiver, to pass the ball while the receiver is
running. The ball passes through obstacles and arrives at the feet of the receiver, who is still running
at speed. A stunning through-ball could be the key to winning a match of any kind, even a World Cup
final. However, a good through-ball is so hard to execute that it rarely appears in matches at
my level. I am a member of my school’s varsity football team, and we play tournaments against other
high schools. An accurate through-ball would almost guarantee a goal in our tournaments. The
through-ball is the most time-efficient way to pass a ball, but it is highly risky if the quality of the
pass is poor. The components of a through-ball are simple: one passer, one receiver, one ball and one
obstacle.
Rationale
There is no formula for how to pass a through-ball. Players use their experience and
intuition to pass through-balls as accurately as possible. Skilled professional players have
their own rules of thumb based on experience, but for amateur players it is extremely hard to control
the direction and speed of the ball. However, passes can be analysed mathematically using
vectors and calculus, so that less experienced players can make a better judgement about how to
perform the pass.
Aim
This leads to the question: how can a perfect through-ball be performed? I will create a
mathematical model to represent the best angle at which to perform a through-ball based on the
distance to the receiver. The “best” through-ball is defined as the one that gives the receiver the
shortest possible running distance. By using a drone to film the passes, I can import the videos into
a program called Logger Pro for analysis, and so obtain a function that represents the motion of the
ball (more is explained in the Presentation of Data section). This exploration will allow less
experienced players like me to make a smarter decision about whether or not to attempt a risky
through-ball in a real game situation. It could improve my performance and make me a
better football player.
Methodology
The mathematical model I am creating will give the best angle at which to perform a through-ball
based on the distance between the passer and the receiver. Therefore the independent variable will be
the distance, whereas the dependent variable that needs to be modelled will be the angle. There
are two main components that need to be considered: the path of the football and the path of the
player. In addition, some approximations need to be made; otherwise this exploration would be too
complicated. The model is illustrated in Figure 2.
Figure 1: Image of Phantom 3 SE (Banner-fa7b98ace0a3510dfed4b4d751137a6a)
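1) Path of Ball ($P_{\text{ball}}$):
$$P_{\text{ball}} = \vec{d} + x(t)\,\vec{p}$$
Here $\vec{d}$ is the position vector of the passer, $\vec{p}$ is the direction vector of the pass, and $x(t)$ is the displacement of the ball along its path at time $t$.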
2) Path of Receiver ($P_{\text{receiver}}$):
I also used the vector formula to represent the path of the receiver ($P_{\text{receiver}}$). The receiver
runs along the y-axis, so the position vector will be $(0,0)$. To avoid complexity, I will
approximate the receiver’s velocity by the average sprinting speed of our football team ($v$). Because
the receiver only runs along the y-axis, the direction vector is $(0,v)$ and the parameter is $t$.
Therefore, the representation will be:
$$P_{\text{receiver}} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} 0 \\ v \end{pmatrix}$$
Figure 2: Diagram to Illustrate the Mathematical Model
Approximation:
The Magnus effect, which is the generation of a sidewise force on a spinning cylindrical or
spherical solid (Britannica 2006), will not be considered. If the spin of the ball were considered,
there would be too much unpredictability and complexity when collecting data. I will kick the ball at
its center to minimize the Magnus effect.
The bouncing of the ball will also not be considered. I will try my best to keep the ball
on the ground. However, the ball will still bounce because of the roughness of the pitch and its high
velocity.
I need to approximate the velocity of the ball because I cannot control my strength perfectly.
Therefore, the drone must film my passes multiple times to obtain an average displacement-time
graph of the ball. I will pass the ball with my full strength every time, so there should not be great
disparities between passes.
The velocity of the receiver also needs an approximation. In real game situations, receivers
of through-balls sprint for most of the time as they run for the pass. Therefore I will use our
team’s average sprinting speed, 4.70 m s$^{-1}$ (Sancinelli 2018), for the velocity ($v$) of the receiver.
Variables:
Independent Variable 1:
The independent variable in this exploration, according to the research question, is the
distance between the passer and the receiver. According to the x-y coordinates in Figure 2, we are
only changing the x value of the passer, or in other words, the x component of the position vector
$\vec{d} = (x_d, y_d)$. The independent variable will be represented as $x_d$.
Independent Variable 2:
According to the research question, the value we are finding is the angle between the path
of the ball ($P_{\text{ball}}$) and the x-axis. However, we are actually finding the best angle, and the
definition of “best” depends on the length of the path of the receiver ($P_{\text{receiver}}$). For every
distinct $x_d$, the shorter the length, the better the angle. To represent the distance covered along
$P_{\text{receiver}}$, we need to find the time $t$ at which $P_{\text{ball}}$ and $P_{\text{receiver}}$ intersect.
To represent the angle, we need to use the direction vector $\vec{p} = (x_p, y_p)$. For the direction
vector to have unit length, it must take the form
$$\vec{p} = \left(x_p, \sqrt{1 - x_p^2}\right)$$
$$\theta_p = \arccos x_p \implies x_p = \cos\theta_p$$
$$\vec{p} = \left(\cos\theta_p, \sqrt{1 - \cos^2\theta_p}\right)$$
For every value of $x_d$, we change the value of $\theta_p$ to find the minimal value of $t$.
Dependent Variable:
According to the research question, the dependent variable is the “best” angle. The best angle
is determined by how short a distance the receiver needs to run. Because the path of the receiver is
expressed by $P_{\text{receiver}} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} + t \begin{pmatrix} 0 \\ v \end{pmatrix}$, the smaller $t$ is, the shorter the
distance will be. Therefore we are looking for the shortest $t$ possible. The mathematical expression of
our research question is shown below.
$$t \begin{pmatrix} 0 \\ 4.70 \end{pmatrix} = \begin{pmatrix} x_d \\ 0 \end{pmatrix} + x(t) \begin{pmatrix} \cos\theta_p \\ \sqrt{1 - \cos^2\theta_p} \end{pmatrix}$$
$x_d$ and $\theta_p$ are the independent variables that we change. The only thing left to complete this
expression is the expression for the displacement of the ball, $x(t)$. We will find this expression
by collecting data using the method described in the Methodology section.
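The interception condition above can be sketched in code. This is only an illustrative sketch: the function names are hypothetical, and the displacement function is passed in as an argument because it has not yet been determined at this point in the exploration.

```python
import math

RECEIVER_SPEED = 4.70  # m/s, the team's average sprinting speed quoted in the text


def receiver_position(t, v=RECEIVER_SPEED):
    """Receiver starts at the origin and runs along the y-axis at speed v."""
    return (0.0, v * t)


def ball_position(t, x_d, theta_p, displacement):
    """Ball starts at (x_d, 0) and travels displacement(t) along the unit vector p."""
    cos_theta = math.cos(theta_p)
    # Same direction vector as in the text: (cos(theta), sqrt(1 - cos^2(theta))).
    p = (cos_theta, math.sqrt(1.0 - cos_theta ** 2))
    s = displacement(t)
    return (x_d + s * p[0], s * p[1])


def miss_distance(t, x_d, theta_p, displacement):
    """Distance between ball and receiver at time t; zero means they meet."""
    bx, by = ball_position(t, x_d, theta_p, displacement)
    rx, ry = receiver_position(t)
    return math.hypot(bx - rx, by - ry)
```

A pass reaches the receiver when `miss_distance` is zero for some `t`; the model then looks for the angle that makes that `t` as small as possible.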
Presentation of Data:
The drone took seven clips of my full-strength pass in total. The clips were then imported into the
software Logger Pro for analysis. I placed two cones 10 meters apart on the pitch (Figure 3) to
indicate scale. Logger Pro recognizes this scale and converts the displacement into real-life length.
By clicking on the position of the ball in every frame with the cursor, a graph of displacement over
time is automatically plotted (Figure 4 and Figure 5).
Figure 3: The Video Clip Shot by the Drone
Each clip stops when the ball is completely still, but some passes last longer than others.
Therefore the last displacement recording of each pass is simply extended indefinitely.
One flaw of this method is that the drone is not completely stable in the air. Although the drone has
a shock-absorbing system, the image still vibrates slightly.
Figure 4: Tracing the Football
Figure 5: Plotting the position of Football
According to Figure 5, the software provides both the x displacement (red) and the y
displacement (blue). To obtain the total displacement, the Pythagorean theorem is applied
to the x and y values at every frame. Then, a graph is plotted to show the displacement over
time of the seven passes (Figure 6). The raw data is attached in the appendix.
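The per-frame calculation described above can be written as a short sketch. The function name and the idea of holding the x and y readings in two lists are assumptions made for illustration; the actual calculation was done on the Logger Pro output.

```python
import math


def total_displacement(xs, ys):
    """Combine per-frame x and y displacements into a total displacement."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same number of frames")
    # Pythagorean theorem applied frame by frame.
    return [math.hypot(x, y) for x, y in zip(xs, ys)]
```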
Every trial shows an increasing trend with decreasing slope. All trials start at $(0,0)$
but gradually diverge. This is why several trials were taken: to reduce the influence of anomalous
passes. For easier analysis, we need a graph of the average displacement of all seven passes. To do
this, the average displacement over the seven trials at every single frame is calculated using a
spreadsheet, and the final graph of the average displacement over time of a through-ball pass is
shown in Figure 7.
Figure 6: Displacement of the ball of every trial
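The averaging step was done in a spreadsheet; the sketch below shows one way the same calculation could be expressed, assuming each trial is a list of displacements with one entry per frame. The function name is hypothetical. Shorter trials are padded with their final value, following the earlier choice to extend the last recording once the ball has stopped.

```python
def average_trials(trials):
    """Average several displacement series frame by frame.

    Shorter series are padded by repeating their last value, so that every
    trial contributes to every frame of the average.
    """
    if not trials or any(len(trial) == 0 for trial in trials):
        raise ValueError("every trial needs at least one frame")
    n_frames = max(len(trial) for trial in trials)
    padded = [list(trial) + [trial[-1]] * (n_frames - len(trial)) for trial in trials]
    # zip(*padded) groups the values of all trials that belong to the same frame.
    return [sum(frame) / len(padded) for frame in zip(*padded)]
```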
Data Analysis:
The line of regression, or line of best fit (Rumsey), can conveniently represent the change in
displacement using a mathematical function. To calculate the equation of the regression line, a
series of statistical analyses must be carried out on this scatterplot. The essential statistical
measures are shown in the table below (Table 1):
Measure                 Symbol        Value
Average X               $t$           25.8
Standard Deviation Y    $S_{x(t)}$    2.49
Looking at the graph (Figure 7), the relationship between the two variables is clearly not
linear. A scientific prediction can be made using knowledge from kinematics. When the ball is
moving, it experiences resistance from the ground and the air. Similar to the skydiving scenario,
the amount of friction the ball experiences is not constant because its speed is not constant.
Thus, the deceleration of the ball is modelled as a linear function of time instead of a constant.
$$a(t) = kt + C$$
$$x(t) = \frac{1}{6}k_1 t^3 + \frac{1}{2}k_2 t^2 + k_3 t + C$$
Integrating the acceleration expression twice gives a cubic displacement function. Therefore, the
line of regression is likely to be a cubic regression line, which lets us represent the displacement
of the kicked ball over time with more scientific backing. Calculating a cubic regression line by hand
is overly complicated, and the formula is rarely used; many calculators and software packages can now
compute the equation of any form of regression line automatically. For Figure 7, the cubic regression
equation is
$$x(t) = 0.0428t^3 - 1.1048t^2 + 10.311t + 1.9729$$
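For readers curious about what such software does internally, the following is a minimal sketch of a least-squares polynomial fit using the normal equations. It is not the routine that produced the equation above (that came from software), and the function name is hypothetical; it only illustrates the general technique.

```python
def fit_polynomial(ts, xs, degree=3):
    """Least-squares polynomial fit; returns coefficients in ascending powers."""
    n = degree + 1
    if len(ts) != len(xs) or len(ts) < n:
        raise ValueError("need matching inputs with at least degree + 1 points")
    # Normal equations: A[i][j] = sum(t^(i+j)), b[i] = sum(x * t^i).
    A = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    b = [sum(x * t ** i for t, x in zip(ts, xs)) for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs
```

Given the averaged times and displacements, `fit_polynomial(ts, xs)` returns the constant term first and the cubic coefficient last.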
The difference between the observed value of the dependent variable and the predicted value is
called the residual ("STATISTICA Help | Residuals And Predicted Values"). In this case, the
residual will be the difference between the value described by the regression line and the actual
value of the data.
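As a small sketch of this definition, the residuals against the cubic regression line could be computed as follows. The function names are hypothetical, and the sign convention (observed minus predicted) is an assumption that follows the definition quoted above.

```python
def cubic_model(t):
    """Cubic regression line quoted in the text for the averaged pass."""
    return 0.0428 * t ** 3 - 1.1048 * t ** 2 + 10.311 * t + 1.9729


def residuals(ts, observed, model=cubic_model):
    """Residual = observed value minus the value predicted by the model."""
    return [x - model(t) for t, x in zip(ts, observed)]
```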
Figure 8: Residual Plot of Figure 7
Figure 9: Residual Frequency Histogram of Figure 7
The residuals take both positive and negative values, and we can see from Figure 9 that
the majority of them lie in the range $[-0.75, 0.5]$. Nevertheless, the first seven
data points are not well described by the regression line in Figure 7, as shown in the residual plot
(Figures 8 and 9). Their residuals are greater than 0.5, in contrast with the rest of the residual
values. What can be done is to use a separate expression to describe only the first few data points
that diverge from the regression line in Figure 7. Again using knowledge from kinematics, the friction
force from the air will not act on the ball immediately, so the deceleration will be constant in the
first second. A constant deceleration means a linear velocity, which makes the displacement over time
a quadratic relationship.
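The cubic regression line will be cut off at $(8.6, x(8.6))$ and replaced by a horizontal line with
$y$ value $x(8.6)$. The final representation of the displacement of the football, $x(t)$, is shown in
Figure 11, and the expression is formulated below.
$$x(t) = \begin{cases} -0.3572t^2 + 14.911t & 0 \le t \le 0.4 \\ 0.0428t^3 - 1.1048t^2 + 10.311t + 1.9729 & 0.4 < t < 8.6 \\ 36.16 & t \ge 8.6 \end{cases}$$
Figure 11: Graph of x(t)
Figure 12: Residual Frequency Histogram of x(t)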
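The piecewise displacement can be sketched as a single function. The coefficients and the breakpoints at 0.4 s and 8.6 s are taken from the conditions listed in the Summary of the Modelling section; the function name is hypothetical.

```python
def ball_displacement(t):
    """Piecewise displacement x(t) of the ball in metres, for t in seconds."""
    if t < 0:
        raise ValueError("t must be non-negative")
    if t <= 0.4:
        # Quadratic segment used for the first few data points.
        return -0.3572 * t ** 2 + 14.911 * t
    # Cubic regression line, held at its value for t = 8.6 s from then on.
    t = min(t, 8.6)
    return 0.0428 * t ** 3 - 1.1048 * t ** 2 + 10.311 * t + 1.9729
```

Clamping `t` at 8.6 reproduces the horizontal line with value $x(8.6)$ described above.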
From Figure 12, the residuals are well contained between −0.5 m and 0.5 m. Now we
need to calculate the root mean square error (RMSE), which tells you how concentrated
the data is around the line of best fit (Stephanie). The formula is shown below:
$$\text{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} r_i^2}{N}} \quad (r \text{ stands for residual})$$
The lower the root mean square error the better, because a lower number implies a smaller
error. The RMSE of $x(t)$ is only 0.2617, which suggests that $x(t)$ is a reliable regression line for
describing the displacement of the ball.
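The RMSE formula translates directly into a few lines of code. This is a sketch with a hypothetical function name; the residual list would come from comparing the averaged data with $x(t)$.

```python
import math


def rmse(residuals):
    """Root mean square error of a non-empty list of residuals."""
    if not residuals:
        raise ValueError("at least one residual is required")
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))
```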
Modelling:
The case has three variables. The independent variable $x_d$ stands for the initial distance between
the passer and the receiver. It will be the x-axis of our 3D model, represented by $X$. The second
independent variable $\theta_p$ stands for the angle between the path of the ball and the horizontal
middle line, measured in radians. This will be the y-axis of the 3D model, represented by $Y$. The
dependent variable is $t$, which is the z-axis of the model, represented by $Z$.
The first step is to convert the vector equations into Cartesian form, so that our formula can be
entered into a Cartesian coordinate system. Because our expression is piecewise, we convert each
condition into a single Cartesian equation. The conversion is not complicated: we only need to derive
the expression for the x coordinate of the vector and the expression for the y coordinate of the
vector, and add the two expressions together into one expression.
$$4.7Z = X + (-0.3572Z^2 + 14.911Z)\left(\cos(Y) + \sqrt{1 - \cos^2(Y)}\right)$$
$$4.7Z = 0.0428Z^3\sqrt{1 - \cos^2(Y)} - 1.1048Z^2\sqrt{1 - \cos^2(Y)} + 10.311Z\sqrt{1 - \cos^2(Y)} + 1.9729\sqrt{1 - \cos^2(Y)}$$
$$4.7Z = X + (0.0428Z^3 - 1.1048Z^2 + 10.311Z + 1.9729)\left(\cos(Y) + \sqrt{1 - \cos^2(Y)}\right)$$
$$4.7Z = X + 36.16\left(\cos(Y) + \sqrt{1 - \cos^2(Y)}\right)$$
Figure 13: 3D-Model
Summary
$$4.7Z = X + (-0.3572Z^2 + 14.911Z)\left(\cos(Y) + \sqrt{1 - \cos^2(Y)}\right) \quad [0 \le Z \le 0.4]$$
$$4.7Z = X + (0.0428Z^3 - 1.1048Z^2 + 10.311Z + 1.9729)\left(\cos(Y) + \sqrt{1 - \cos^2(Y)}\right) \quad [0.4 < Z < 8.6]$$
$$4.7Z = X + 36.16\left(\cos(Y) + \sqrt{1 - \cos^2(Y)}\right) \quad [Z \ge 8.6]$$
Figure 14: Line of trough
Figure 13 is the 3D model of our case. Blue represents condition 1, green condition 2, and red
condition 3. We only need to focus on the octant facing us in Figure 13, which is the one
where X, Y and Z are all positive. On observation, the model shows a periodic pattern,
which can be explained by the cosine function in our expression. In real life, this corresponds to
the angle between the path of the ball and the horizontal line having rotated through a full cycle.
As shown in Figure 14, we can easily identify a trough-shaped region in the first octant. The
other shapes all lie outside the first octant, so we can exclude them and keep only this trough.
We are looking for the lowest Z value possible, and the bottom line of this shape is exactly what we
want. The equation of the line labelled in Figure 14 is roughly:
$$Y = -0.047X + \pi \quad [0 \le X \le 22]$$
The equation was obtained directly from the software. Values beyond X = 22 do not follow the
pattern, so the line is cut off there. In real life, passes between players more than 22 m apart are
very rare.
Conclusion:
Let’s restate our aim: for every value of $x_d$, we change the value of $\theta_p$ to find the
minimal value of $t$. Working from the equation of the trough line (Figure 14), which is the line
where the time is shortest, the table below shows $\theta_p$ at selected integer values of $x_d$.
$x_d$ (m)           5      6      7      8      9      10     11     12     13     14     15     20
$\theta_p$ (rad)    2.897  2.848  2.799  2.750  2.701  2.653  2.604  2.555  2.506  2.457  2.408  2.164
Degrees             14     17     20     22     25     28     31     34     36     39     42     56
Table 2: Distance between passer and receiver with corresponding angle of best choice
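The Degrees row appears to be the angle measured from the horizontal line on the receiver's side, that is, 180° minus $\theta_p$. This reading is an inference from the tabulated values rather than something stated in the text. Under that assumption, the conversion can be sketched as follows, with the radian values copied from Table 2 and hypothetical names throughout.

```python
import math

# theta_p in radians for each distance x_d in metres, copied from Table 2.
THETA_P_RADIANS = {
    5: 2.897, 6: 2.848, 7: 2.799, 8: 2.750, 9: 2.701, 10: 2.653,
    11: 2.604, 12: 2.555, 13: 2.506, 14: 2.457, 15: 2.408, 20: 2.164,
}


def angle_from_horizontal_degrees(theta_p):
    """Convert theta_p (radians) to its supplementary angle in degrees."""
    return 180.0 - math.degrees(theta_p)
```

Rounding `angle_from_horizontal_degrees(theta)` to the nearest whole number for each entry gives the values in the Degrees row.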
Evaluation:
Many approximations were made in this scenario (see the Approximation section), so it is
likely that our result deviates considerably from real life. As we can see from Figure 6, the passes
diverge greatly, meaning that it is very hard to perform two similar passes. The model only
presents a reasonable relationship between the passer-receiver distance and the
angle of the pass. The result could be very different if the Magnus effect, the weather, the
uncertainty of the receiver’s path, and all the other uncertainties of a game situation were taken
into account.
Moreover, the method could be improved in many ways:
• The drone is unstable, so plotting the coordinates from video taken by an unstable drone was not
the best method. A large-scale motion detector could do a much better job than a drone.
• As Figure 6 shows, the divergence was very large, meaning that passes could
be very different from each other. A better way to represent the displacement of the ball would be to
take in more physical information, such as the mass of the ball and its initial acceleration. The
whole path of the ball could then be described using kinematics, which would be a lot more accurate
than an average of 7 trials. Alternatively, doing more trials could eliminate more anomalous data, but
the memory of the drone limits the number of takes it can shoot.
• The final equation of the line that describes the X and Y coordinates of the lowest Z values of the
model was also not precise. Due to the limitations of the software, the coordinates could only be
read to 1 decimal place. The relationship between X and Y is actually not linear but a decreasing
exponential. However, the software was unable to produce a line of best fit in exponential form.
Therefore, the linear relationship is invalid when X is greater than about 22.
• The model did not take defenders into account, which is a huge factor in decision making when
playing a football game. Unfortunately it was hard to take the players’ movements into account, as
there were too many uncertainties.
References
phantom-3-se.
www.dummies.com/education/math/statistics/how-to-calculate-a-regression-line/. Accessed 28 Aug 2018.
documentation.statsoft.com/STATISTICAHelp.aspx?path=MultipleRegression/MultipleRegressionAnalysis/Dialogs/MultipleRegressionResidualsandPredictedValues.
Stephanie. "RMSE: Root Mean Square Error". Statistics How To, 2016, http://