Finding The Most Efficient Square Root Algorithm


Downloaded from www.clastify.com by Hania Elgawsaky

MATHS COURSEWORK

Finding the most efficient square root algorithm

By Martim Cardeira


Table of Contents

Plan of investigation
Obtaining data
Algorithm 1: Babylonian method
Algorithm 2: Bakhshali method
Algorithm 3: Exponential identity
Raw and processed results
    Algorithm 1: Babylonian method
    Algorithm 2: Bakhshali method
    Algorithm 3: Exponential identity
    All graphs superimposed
Conclusion
Reflection
Works cited

Plan of investigation

I have chosen this topic as it relates to my interests in physics and computer science. Physics simulation software such as Simscale, MATLAB and BeamNG.Drive (which is more of a game, but still has physics that emulates real life) all operate on a type of computer software known as a "physics engine". These physics engines apply real-life physics to a virtual world and objects of your choice. Thus, the mathematical (physics) formulas and arithmetic used in real life must also be used to calculate the movement and state of your virtual world / objects.

This is where square roots (and algorithms for finding them) are essential; one very basic but crucial example of the necessity of finding square roots in physics is vector addition. In vector addition, we represent the forces acting on an object using arrows which signify magnitude and direction. We "add" those arrows to form a triangle with a missing side, and then use trigonometric tools such as the Pythagorean and cosine formulae (both of which require us to find square roots in order to get an answer) to calculate not only the magnitude (length) but also the angle of the missing side, which is the resultant / net force experienced by the object. Of course, computers have no need to represent these forces as arrows and triangles, especially when there are lots of objects and forces and no room for extra data in memory, but the arithmetic is the same, and thus an algorithm is needed to calculate the square root before the computer can calculate a resultant force.

This is only one example out of many where we may require the square root of a number in a physics engine. As with any piece of software, it is imperative that we optimise the code to be as efficient as possible. In essence, we want our square root algorithm to calculate our answer in the least amount of time possible, so that we can have more calculations per second, or so that our list of calculations can be completed more quickly.

Obtaining data:

First of all, to find out which algorithm is the most efficient, I must test every single algorithm that I deem sensible enough to run (ones that, by visual inspection of the code or the actual maths, do not appear to take too many calculations / operations to be competitive). As a result, I must either code, or find existing code for, these algorithms in the programming language Java, then run each algorithm multiple times with a variety of different numerical inputs while simultaneously running a timer, to find out which algorithm takes the least time on average to make a calculation. I will be using the NetBeans IDE and compiler to achieve this, as it is the program I am most familiar with from my computer science lessons. To facilitate this, I will write a method for each algorithm which will iterate through all the number inputs and write the time taken to a CSV spreadsheet file. In the case of algorithms such as the Babylonian method, we may need to adapt certain variables, such as x₀, our initial guess, to our domain. As we know, the domain for any square-root finding algorithm is any real, positive number. For the sake of simplicity, I will restrict my domain to all integers from 1 to 1000 inclusive. My rationale for this is that the most efficient algorithm will be applied to a physics engine, and so the integers 1 to 1000 will cover vector additions from a range of 1 to 1000 newtons in magnitude. If we go over 1000 newtons, then we can switch to a larger scale, kilonewtons (kN), and use the domain 1 to 1000 again. Naturally, this will make our uncertainty much larger and produce more errors in our physics simulation (our absolute uncertainty is multiplied by 1000); however, our physics simulation is basic and not intended for exorbitantly large-scale simulations. As to why we're not using decimal numbers, I would like to keep my domain short so that it is well within the hardware capabilities of my laptop. As to the effect of using integers instead of decimal numbers (floats), there will be no practical difference, as I will denote my integer inputs as floats (decimal numbers) in my Java programs so that they are handled exactly like decimal numbers and carry out floating-point arithmetic. In short, my use of integers will not affect my data / trends in comparison to decimal inputs.
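A minimal Java sketch of the measurement loop just described might look as follows. The method and file names, the use of `DoubleUnaryOperator` to pass in an algorithm, and the `sink` accumulator are my own illustrative choices, not the coursework's actual code:

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.function.DoubleUnaryOperator;

public class SqrtTimer {
    /** Times one sqrt implementation for S = 1..1000, writing one CSV row per input. */
    static long[] timeAlgorithm(DoubleUnaryOperator sqrt, String csvPath) throws IOException {
        long[] nanos = new long[1001];
        double sink = 0;                                // keeps the JIT from discarding calls
        try (FileWriter csv = new FileWriter(csvPath)) {
            csv.write("S,nanoseconds\n");
            for (int s = 1; s <= 1000; s++) {
                long start = System.nanoTime();
                sink += sqrt.applyAsDouble((double) s); // integer input handled as a float
                nanos[s] = System.nanoTime() - start;
                csv.write(s + "," + nanos[s] + "\n");
            }
        }
        return nanos;
    }

    public static void main(String[] args) throws IOException {
        // Math.sqrt stands in here for whichever of the three methods is under test.
        timeAlgorithm(Math::sqrt, "baseline.csv");
    }
}
```

Each algorithm under test would be passed in place of `Math.sqrt`, and each full run repeated to produce the 10 repeats per method.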

Algorithm 1: Babylonian method

The Babylonian method is likely the first ever algorithm used for approximating √S. This, however, does not mean we should discount it. The earliest account of the Babylonian method in use was 1700 BC (Baez), on a clay tablet. From the artifacts that have been found, it is apparent that the Babylonians were commonly using this square root solving method for trigonometry, more specifically applying it to the Pythagorean identity in order to find the hypotenuse of a right-angled triangle. The most likely reason for this need for trigonometry is architecture and construction. What is interesting is that the Babylonians were commonly using integer inputs (in this case, triangle side lengths), which is exactly what we're doing in our experiment. This perhaps suggests that the Babylonian method may be well suited to integer inputs. On the other hand, the Babylonians did not have access to calculators, so it's likely they used integers solely to simplify the arithmetic, not because the method is suited to integers. Nevertheless, it is an interesting method to consider.


The Babylonian method is an iterative algorithm (at least if the first iteration does not achieve the desired accuracy, which in our case it never will). Here is how the method works. S is the number whose square root we want to find. Let's take x as a guess for √S. If our guess x is an overestimate, then S/x will be an underestimate, or vice versa. The average of these two numbers will provide a closer approximation to √S. We can represent this as:

√S ≈ (S/x + x) / 2.

Of course, this one step will not always produce a very accurate estimate. If we complete this step, we will have an error ε such that S = (x + ε)². If the error produced satisfies our set threshold, we can take the estimate for √S from this first step. If this is not the case (which it never will be in our case), we must begin the iterative process. By expanding S = (x + ε)² and solving for ε we get:

S = (x + ε)² = x² + 2εx + ε²,
S − x² = 2εx + ε²,
S − x² = ε(2x + ε),
ε = (S − x²) / (2x + ε) ≈ (S − x²) / (2x), since ε ≪ x.

With this in mind, we can come up with an improved, more accurate estimate:

x + ε ≈ x + (S − x²)/(2x) = (S + x²)/(2x) = (S/x + x)/2 = x_revised.

If the new value of ε still doesn't satisfy our threshold, we may repeat the step above over and over until it does, each time taking x_revised and plugging it back into the equation as x.
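This iteration can be sketched in Java as follows. The initial guess x₀ = S/2 and the error threshold passed in are illustrative assumptions, not necessarily the exact values used in my test programs:

```java
public class Babylonian {
    /** Approximates √S by repeatedly averaging x with S/x until |x² − S| ≤ tolerance. */
    static double sqrt(double s, double tolerance) {
        double x = s / 2.0;                       // initial guess x0 (an assumption)
        while (Math.abs(x * x - s) > tolerance) {
            x = (s / x + x) / 2.0;                // x_revised = (S/x + x) / 2
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(sqrt(2.0, 1e-12));     // ≈ 1.4142135623...
    }
}
```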


My rationale for this algorithm being efficient is the fact that it uses simple operations for a computer to perform (addition and squaring, which is multiplication), which can be considered "cheap" (or rather, quick) operations, with the exception of division, which takes twice as long as multiplication (which itself takes roughly 4 times as long as addition / subtraction) (Hindriksen). In addition to that, the Babylonian method is quadratically convergent, meaning the number of correct digits of the approximation roughly doubles with every single iteration ("Methods of computing square roots").

However, the greatest strength of this algorithm has to do with the fact that it is well suited to a base 2 number system. If we take the equation (S/x + x)/2 = x_revised, which is repeated every iteration, we'll notice that only two division operations are happening. Of these two division operations, one is simply dividing the numerator of the entire fraction by 2. So, what is the significance of this number 2? Well, computers use binary, which is a base 2 number system. This means every digit of a binary number can represent two values, as the digit can be either 0 or 1. A place-value table helps to illustrate the binary system ("Binary").

If we were to add an extra digit to the right of a binary number (with the value 0), we would double the value of every other digit and essentially double the number. So, in essence, shifting every digit one place to the left will double the number. If we shift every digit one place to the right instead, the opposite will happen: we will halve the number. Note that moving every digit to the right in this case causes the rightmost digit (1) to disappear, making the result half of the previous value, rounded down to the nearest integer. We don't need to worry about this, as the computer will handle the situation using floating-point arithmetic (this problem would only occur with integer arithmetic). All of this goes to show that multiplying by 2 and, more importantly in our case, dividing by 2 are two very quick operations that take the computer very little time, as they are just a matter of shifting all digits one place. These operations are so efficient that, in the case of (S/x + x)/2 = x_revised, we can treat the equation as having only one division operation in terms of processing time. With this perspective, the Babylonian algorithm has only a single, heavily processor-taxing operation per iteration. This sounds very promising in terms of algorithm efficiency.
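The digit-shifting idea can be demonstrated directly with Java's integer shift operators; this is purely an illustrative aside (for a normal double, dividing by 2 likewise keeps the same significand and only lowers the binary exponent by one):

```java
public class ShiftDemo {
    public static void main(String[] args) {
        int n = 0b1011;                  // the binary number 1011, i.e. 11 in decimal
        System.out.println(n << 1);      // 22: shifting every digit left doubles the value
        System.out.println(n >> 1);      // 5: shifting right halves it, rounded down,
                                         //    because the rightmost 1 is discarded
    }
}
```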

Algorithm 2: Bakhshali method

The Bakhshali method is an ancient Indian square root solving method from a time period between the 6th and 12th centuries (Bailey). Information regarding the Bakhshali method is very scarce, and little is known about the rationale behind, or application of, the method. The scholar G. R. Kaye believed the "mathematical content was derivative from Greek sources" (Bailey). Given this possibility, we may consider this method an improvement on, or successor to, previously existing methods.

The Bakhshali method is yet another iterative algorithm. Even though, on first inspection of the formula, the method may seem very taxing on the processor (by the standards that were set in terms of the 'cost' of operations), it should still be considered, as the Bakhshali method is quartically convergent (as opposed to the quadratically convergent Babylonian method), and each of its iterations is therefore equivalent to two iterations of the aforementioned Babylonian method given the same initial guess (Bailey). Yet again, we must make an initial guess; the closer it is to the actual square root, the more accurate the results of the first and each subsequent iteration will be. Let x₀ be our first guess for √S. We must then iterate as follows:

xₙ₊₁ = bₙ − aₙ² / (2bₙ) = (xₙ + aₙ) − aₙ² / (2(xₙ + aₙ)).

Definitions:

aₙ = (S − xₙ²) / (2xₙ),
bₙ = xₙ + aₙ.

(Think of xₙ₊₁ as x_revised from the Babylonian method.)


We can use this to make a rational approximation to the square root. So long as x₀² is close to S, the first iteration of the Bakhshali method can be written, with d = S − x₀², and simplified as follows ("Methods of computing square roots"):

√S ≈ x₀ + d/(2x₀) − d²/(8x₀³ + 4x₀d)
   = (8x₀⁴ + 8x₀²d + d²) / (8x₀³ + 4x₀d)
   = (x₀⁴ + 6x₀²S + S²) / (4x₀³ + 4x₀S)
   = (x₀²(x₀² + 6S) + S²) / (4x₀(x₀² + S)).

As previously touched upon, my main rationale for the Bakhshali method being competitive is that each of its iterations is worth two Babylonian iterations. Despite there being far more operations per iteration, the lower number of iterations needed for an accurate (per our error threshold) value of √S, as a result of its quartic convergence, may prove efficient later on.
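The iteration defined above can be sketched in Java as follows; the initial guess x₀ = S/2 and the tolerance passed in are again illustrative assumptions rather than the exact values from my test programs:

```java
public class Bakhshali {
    /** One Bakhshali step: x_{n+1} = b - a^2/(2b), with a = (S - x^2)/(2x), b = x + a. */
    static double step(double s, double x) {
        double a = (s - x * x) / (2.0 * x);
        double b = x + a;
        return b - (a * a) / (2.0 * b);
    }

    /** Iterates until |x² − S| ≤ tolerance. */
    static double sqrt(double s, double tolerance) {
        double x = s / 2.0;                       // initial guess x0 (an assumption)
        while (Math.abs(x * x - s) > tolerance) {
            x = step(s, x);
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(sqrt(289.0, 1e-12));   // ≈ 17
    }
}
```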
gm

Algorithm 3: Exponential identity

Pocket calculators commonly use exponential identities to calculate the square root of a number. This algorithm is particularly intriguing, as by a quick visual inspection it is quite hard to get a sense of whether or not it will be efficient, or even competitive, amongst the other methods. Considering the use of a natural logarithm, which is very taxing on the processor, the algorithm may initially seem inefficient. However, we must consider that the algorithm is non-iterative. Following the properties of logarithms, we can find the identity for √S:

√S = S^(1/2),
ln √S = ln S^(1/2),
ln √S = (1/2) ln S,
√S = e^(ln √S) = e^((1/2) ln S).
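The identity amounts to a single line of Java, using the StrictMath library's exp and log methods (how the timing harness wraps this call is an implementation detail I leave out here):

```java
public class ExponentialIdentity {
    /** Non-iterative square root: sqrt(S) = e^((1/2) ln S). */
    static double sqrt(double s) {
        return StrictMath.exp(0.5 * StrictMath.log(s));
    }

    public static void main(String[] args) {
        System.out.println(sqrt(2.0));   // ≈ 1.414..., accuracy limited by exp/log rounding
    }
}
```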

The inherent problem with this formula is that the efficiency of our identity is dependent on the efficiency of our logarithm and exponentiation methods. (You may refer to the source code of the Java maths library to find these respective methods.) The methods I will use come from the StrictMath Java library (Blake), a library that is very commonly and extensively used by Java programmers. Since this library is extensively used, I am assuming that its methods are the most efficient available.


My rationale for this method being efficient mostly comes down to the fact that it is non-iterative. Whilst exponentiation and logarithms are extremely taxing on the processor, they only have to be done once. Due to this, I feel that the exponential identity may not prove the most efficient with smaller inputs to begin with, as the other methods will require few iterations for smaller numbers. However, when it comes to larger inputs, I feel that the exponential identity method will excel in terms of efficiency, as the other methods will require a lot of iterations to reach an accurate result. My second reason for thinking this algorithm is efficient is simply contextual: this algorithm is already being used in pocket calculators, devices which operate in binary (base 2). This suggests that not only is the method efficient enough to be used commercially, it is also well suited to our application (computers, which run on binary).

Raw and processed results:

Using Java, I have iterated through each of the three methods for 1 ≤ S ≤ 1000 until reaching 12 digits of precision for each result (slightly less than what MATLAB uses, which is 16 digits of precision ("Increase Precision of Numeric Calculations - MATLAB & Simulink")), and to achieve a high degree of accuracy in my mean average I have done 10 repeats per method. Looking through my results, I found several anomalies that needed to be removed so as not to affect my trendlines, which are semi-automatically generated by Excel (the type of trendline (e.g., polynomial, linear, exponential, etc.) is selected by me; however, the actual trendline and its respective equation are generated automatically). As there were 3 × (1000 × 10) = 30000 results in total, I would have to use a process that would automatically eliminate these anomalies from my mean average, which is the column that I would be graphing for each method. To accomplish this, I used the TRIMMEAN Excel function, which systematically eliminates outliers from an array of values before taking the mean average. I used the function to eliminate the top 20% (2) and the bottom 20% (2) of the values in each array. This meant my mean average would be considerably less accurate, as it would be using an array of 6 results each time instead of 10; however, this was my only good option, as it would be extremely time-consuming to look through 30000 results. I couldn't just iterate each method more times for more results and then apply the TRIMMEAN function, as Excel on my laptop was already lagging (almost beyond reasonable usability) with my current set of results. Finally, I did not record the processing time for any of the methods where S = 1, as a Java error would include the time taken to compile the program (which is not part of the algorithm).
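TRIMMEAN's behaviour here (40% total trimming on 10 repeats, i.e. the 2 largest and 2 smallest values discarded) can be mimicked in Java roughly as follows; this is an illustrative sketch of the processing step, not the spreadsheet itself:

```java
import java.util.Arrays;

public class TrimmedMean {
    /** Mean of the values after discarding `trim` values from each end (cf. Excel TRIMMEAN). */
    static double trimmedMean(long[] times, int trim) {
        long[] sorted = times.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = trim; i < sorted.length - trim; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2 * trim);     // e.g. 6 of the 10 repeats survive
    }

    public static void main(String[] args) {
        long[] repeats = {310, 305, 9000, 298, 301, 307, 299, 12, 303, 304};
        System.out.println(trimmedMean(repeats, 2)); // outliers 9000 and 12 are discarded
    }
}
```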

Algorithm 1: Babylonian method:

This is the graph of processing time in nanoseconds vs S for the Babylonian algorithm. I chose my trendline as a polynomial of order 3, as I could visually deduce two turning points. It seems as though the results are a lot more accurate when S < 350, as after that the deviation from the trendline becomes a lot greater. I have modelled the function for the processing time in nanoseconds given S for the domain { S ∈ ℤ | 0 < S ≤ 1000 }, visible in the top right of the graph. Some of the visible outliers on the graph were the points S = 401, 629, 742 and 864. I re-ran these inputs through my method several times (on the order of 20-30 times) and eventually found that they do comply with the trend of the other points; for some reason, Java does not always report the correct execution time for these inputs, which has resulted in these outliers.

The total time taken for all 1000 inputs with this method is 0.373 ms (3 s.f.).

Algorithm 2: Bakhshali method:

The results of the Bakhshali method were quite peculiar. Due to the shape of the graph, I decided to split the trendline in two, making it a piecewise function. The first part of the graph, where S ≤ 288, exhibits a quite clear exponential relationship between the processing time and S, as is evident from the exponential growth. However, and quite interestingly, when S > 288, the graph exhibits a logarithmic relationship between the time and S. I have modelled the first function for the processing time in nanoseconds given S for the domain { S ∈ ℤ | 0 < S ≤ 288 }, visible in the top right of the graph (the upper function). I have also modelled the second function for the domain { S ∈ ℤ | 288 < S ≤ 1000 } (the lower function). Immediately from visual inspection I can tell that the Bakhshali method will be highly inefficient (for the large majority of the domain 0 < S ≤ 1000) relative to the other two, as is evident from the scale of the y-axis (20 times larger than in the previous graph).

I found the changeover point, S = 289, to be particularly intriguing. Firstly, 289 is not a power of 2, therefore we can't attribute the change in trend to a change in the size of the memory address which stores S. One thing that is interesting is that 289 is a perfect square, being the square of 17. Again, 17 isn't a power of two, so we can't attribute the change in trend to any quirks of the base 2 number system. Perfect squares do have particular significance in our root finding algorithms, as they usually converge very quickly iteration-wise. One possible explanation for this changeover in behaviour may be a sudden decrease in the iterations needed to reach the desired accuracy. To test this, I altered my Bakhshali method code to instead count the number of iterations taken to give a result that falls within my error threshold. I calculated the number of iterations for values of S from 286 to 292 inclusive, which are in and around the changeover point. The results were as follows:

S      Number of iterations
286    32
287    32
288    33
289    1
290    2
291    2
292    3

As expected, the perfect square 289 resulted in very quick convergence. This is to be expected. What is not to be expected, however, is the shockingly low number of iterations required for the next few integer inputs, all of which are imperfect squares. I lack the knowledge to explain why this is the case; however, these results affirm that this change in trend is real and not just a program error.

This discovery led me to notice a pattern: the bottoms of the trails in this graph were all perfect squares (which is to be expected). The intriguing part is that the execution time for each following integer after a perfect square would increase linearly until there was another perfect square, at which point the execution time would again reset back to a near-zero value. The gradient of these linear patterns would increase with each trail prior to S = 289, leading to an overall exponential trend. After the changeover, the opposite would happen: the gradient of the linear patterns would decrease with each trail, leading to a logarithmic overall trend. Whilst I lack an explanation for this phenomenon, it is quite interesting. Still, the Bakhshali method appears quite inefficient at first inspection.

The total time taken for all 1000 inputs with this method is 9.22 ms (3 s.f.).

Algorithm 3: Exponential method:

Here is the graph of the exponential method. The general shape of the graph seems a lot closer to the expected results. I chose a polynomial of order 4 as my trendline, as I could see (though it is very hard to tell) 3 different turning points, one of which (below S = 200) was not meant to be there; however, I could not control how the trendline was generated. Judging visually by the scale of the y-axis, this method seems to be more or less on par with the Babylonian method (especially when you consider that the Babylonian graph had far greater, but very few, outliers which contribute to the increase in scale). I have modelled the function for the processing time in nanoseconds given S for the domain { S ∈ ℤ | 0 < S ≤ 1000 }, visible in the top right of the graph.

The outliers in this graph are exactly like the ones in algorithm 1, meaning that they do actually comply with the trend and are just the result of a Java error.

Perfect squares did not have an effect on execution time in this algorithm; this is to be expected, due to the non-iterative nature of the exponential identity being used and the lack of gradual convergence. There isn't much else to be said about this algorithm; I feel the results are too uncertain to deduce a solid trend. It does seem that this method loosely exhibits linear correlation; however, our set of data is too small to confirm this.

The total time taken for all 1000 inputs with this method is 0.180 ms (3 s.f.).

All graphs superimposed:

Finally, I used Desmos to superimpose the graphs from each image into one, to graphically check which method is the most efficient. I did this by typing the equation for each trendline into Desmos. To see which method is the most efficient, we have to check which line (or in this case, lines) is the lowest on the graph. We can see that the red line, the first part of the piecewise function for the Bakhshali method, is the most efficient from S = 0 to the point where it intersects the purple trendline of the Babylonian method, at S = 69.265. For S > 69.265, the exponential identity method becomes the most efficient one. The yellow trendline of the second part of the Bakhshali piecewise function intercepts the x-axis at the point 295.0. It is briefly the most efficient method until it intersects the trendline of the exponential identity at the point S = 298.46, where the exponential method becomes the most efficient yet again until the end of the domain (S ≤ 1000). It is important to note that we only take integer inputs for S, and the points of intersection of the trendlines give us non-integers. Hence, we should round every point of intersection down to the previous whole number, not the nearest. For example, there is an intersection at the point S = 298.46, meaning that the Bakhshali method is most efficient at S = 298 and the exponential identity is most efficient at S = 299. Following this, the points of method changeover are: S = 69, 295, 298.

Conclusion:

From the processed results, we can graphically deduce that the Bakhshali algorithm is most efficient for 0 < S < 70 and 295 ≤ S < 299. On the other hand, the exponential identity algorithm is most efficient for 70 ≤ S < 295 and for 299 ≤ S ≤ 1000. From these results, we should not consider the Babylonian algorithm for our physics engine, as it is never the most efficient method. We should aim to use the exponential identity and Bakhshali algorithms in the respective domains where they are the most efficient. This can be achieved by using if statements to check which of these domains an input S falls into, and then using the best method to calculate its square root (e.g., if S < 70 → Bakhshali(S)). If statements are relatively quick operations; however, given the small domain where the Bakhshali method is most efficient, the cost of one or more if statements on every single call would increase the processing time unnecessarily compared to just sticking to a single method. For this reason, I have chosen the exponential identity algorithm as the single algorithm to use for solving square roots, as its domain is the largest. To further support this choice, the exponential identity has a total execution time of 0.180 ms (3 s.f.) for all 1000 integer inputs. This is less than half of the total execution time for all 1000 integer inputs with the Babylonian method, 0.373 ms (3 s.f.). Compared to the Bakhshali method, however, the difference is even more drastic, with its total execution time of 9.22 ms (3 s.f.). By this metric of efficiency, the exponential identity comes out as the most efficient method by quite a margin (0.193 ms less than the Babylonian method).
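For completeness, the rejected if-statement dispatch would have looked roughly like this; `bakhshali` and `exponentialIdentity` are hypothetical stand-in names (here backed by library calls purely so the sketch runs), not my actual tested implementations:

```java
public class SqrtDispatch {
    /** Picks the empirically fastest method for each sub-domain of S (1..1000). */
    static double sqrt(double s) {
        // Branch boundaries taken from the trendline intersections.
        if (s < 70 || (s >= 295 && s < 299)) {
            return bakhshali(s);
        }
        return exponentialIdentity(s);   // fastest everywhere else in the domain
    }

    // Hypothetical stand-ins for the two tested implementations.
    static double bakhshali(double s) { return Math.sqrt(s); }
    static double exponentialIdentity(double s) {
        return StrictMath.exp(0.5 * StrictMath.log(s));
    }

    public static void main(String[] args) {
        System.out.println(sqrt(50.0));   // routed to the Bakhshali branch
    }
}
```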

Reflection:

Overall, this experiment was quite unsuccessful. To start with, some of the results for all methods would occasionally have huge uncertainties, such as ±200 ns, when the uncertainties should be 0. This is because each method will mathematically always take a set number of steps (operations per iteration × iterations) to reach an approximation of √S within our error threshold, meaning that, given the static clock rate of my processor, it should take the exact same time, every time, for a particular input. This is not the case in my results, as is evident from the uncertainties I obtained. This tells me that there are lots of 'hidden' variables that would need to be controlled in my experiment, a task that is nigh on impossible. Factors such as the caching of my program mid-execution are impossible to control, and Java itself has some oddities in execution which, yet again, I can't control.

If I were to answer this question again, I would test actual practical application. Testing each method in an actual physics engine, running lots of simulations and deducing which method has the least total processing time / lowest mean processing time would be the best approach. This unfortunately isn't possible for me, as I haven't designed a physics engine (it's quite complicated).

The second-best thing to do would be to use a larger domain for S, continuing into the millions. I found that my rather small set of data was insufficient to deduce a solid trend. This is also the reason that I have deliberately avoided extrapolation in my analysis, as a trend was very difficult to discern in all of the methods.

Unfortunately, my IA topic leaves me with very little theoretical information to go on. It is extremely difficult to theoretically calculate the execution time required for a square root method, as there are thousands of operations (which aren't always equal to each other in terms of CPU time) to keep track of. This leaves me with nothing to compare my practical results against.
Works Cited:

Baez, John. "Babylon and the Square Root of 2 | Azimuth." Azimuth, 2 December 2011, https://johncarlosbaez.wordpress.com/2011/12/02/babylon-and-the-square-root-of-2/. Accessed 15 June 2022.

Bailey, David H. "Ancient Indian Square Roots: An Exercise in Forensic Paleo-Mathematics." David H Bailey, https://www.davidhbailey.com/dhbpapers/india-sqrt.pdf. Accessed 15 June 2022.

"Binary." japanistry.com, https://www.japanistry.com/binary/. Accessed 15 June 2022.

Blake, Eric. "Source for java.lang.StrictMath (GNU Classpath 0.95 Documentation)." developer.classpath.org, https://developer.classpath.org/doc/java/lang/StrictMath-source.html. Accessed 15 June 2022.

Hindriksen, Vincent. "How expensive is an operation on a CPU? - StreamHPC." StreamHPC, 16 July 2012, https://streamhpc.com/blog/2012-07-16/how-expensive-is-an-operation-on-a-cpu/. Accessed 15 June 2022.

"Increase Precision of Numeric Calculations - MATLAB & Simulink." MathWorks, https://www.mathworks.com/help/symbolic/increase-precision-of-numeric-calculations.html. Accessed 15 June 2022.

"Methods of computing square roots." Wikipedia, https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method. Accessed 15 June 2022.
