
Introduction

This section introduces the basics of Microsoft Excel. Excel is a spreadsheet program, and you will learn to manipulate data in which each data
entry sits in a cell. Please download the Excel workbook, as we will reference it throughout these slides.

In every workbook, there are worksheets and you can select a worksheet by clicking on the tab at the bottom of each worksheet. Each
section in these slides will reference a particular sheet.
• Sheet 1 – Basic Excel operations we will encounter like how to sum, add, multiply.
• Sheet 2 and 3 – Regression
• Sheet 4 – Newton and Secant Method
• Sheet 5 – Trapezoidal Rule
• Sheet 6 – Euler’s Method

You are strongly urged to first watch the video Excel for Beginners - The Complete Course (YouTube), which gives a more detailed
overview of how to create a spreadsheet from scratch.

Despite being old software, Excel is still widely used because many data files (from sensors and devices) come in a format that Microsoft Excel
can easily manipulate, whether the data is scientific, engineering or financial.
1. Basic Excel Operations
Referencing Worksheet 1, suppose we are given a table of 40 student scores (each between 0 and 10). We want to know the
following: (a) average or mean of these scores, (b) standard deviation of these scores, (c) median of these scores, (d) maximum score and (e)
minimum score.

You will learn what this means in more detail in Term 3 but for now, the mean is just the average (sum of all the scores divided by the
number of scores).

The standard deviation measures how the data is spread out with respect to the mean.

The median is the middle number after all the data is sorted according to its size.

Opening up Sheet 1, we see that the student scores are located in column A and rows 1 to 40. You may scroll down to confirm this. Each cell
is identified by its column name first and then its row number. Note the column name and the row numbers in which your data is located.

Cells A10, A11 and A12 contain the numbers 4, 5 and 7 respectively. We can also sort the data numerically (if this is required for
presentation).

The result can be seen in column B, rows 1 to 40. Unless you are required to present a sorted version of the data, you do not have to do this.
This sort function can sort alphabetically and numerically.

Excel automatically picks up the format based on the nature of the data entry and will sort accordingly.

Instead of scrolling down to confirm the number of data entries we have, we can use the formula =count(a1:a2000000). The use of an
insanely large number (2 million) ensures that we capture all the data.

Do not worry about the empty cells, because Excel distinguishes between 0 and an empty cell. This formula is situated in cell E3.
So unless you have more than a million data entries, you can just use 2 million. With more data than that, you will probably want to
consider other software. An Excel worksheet holds at most 1,048,576 rows, so if your version of Excel rejects a range ending at row
2000000, use the whole-column reference a:a instead.
We now proceed to find the average (mean), standard deviation, median, maximum and minimum of the given data set.

(a) Average of the data is given by the formula =average(a1:a2000000). You must put the equal sign or Excel will see this as text. You can
see this formula in cell E3 (column E, row 3).

(b) The standard deviation of the data set is given by the formula =stdev(a1:a2000000). The formula is given in cell E5 (column E, row 5).

(c) The median is the middle value of the sorted data. There is no need for you to sort first. The formula for the median is given by
=median(a1:a2000000) and is situated in cell E7 (column E, row 7). Since we have 40 data entries (an even number), the median is the
average of the 20th and 21st values in sorted order; you can confirm this by comparing the result with the scores in cells B20 and B21 of
the sorted column B.

(d) The maximum score is given by the formula =max(a1:a2000000) and is situated in cell E9 (column E, row 9).

(e) The minimum score is given by the formula =min(a1:a2000000) and is situated in cell E11 (column E, row 11).
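
As a cross-check outside Excel, the same five quantities can be sketched in Python with the standard library. The scores below are hypothetical (they are not the worksheet's data), and the variable names are only illustrative. Note that Excel's STDEV is the sample standard deviation, which corresponds to statistics.stdev.

```python
import statistics

# Hypothetical scores between 0 and 10 (not the worksheet's data).
scores = [4, 5, 7, 8, 6, 9, 3, 10, 5, 6]

count = len(scores)                  # like =count(...)
mean = statistics.mean(scores)       # like =average(...)
stdev = statistics.stdev(scores)     # like =stdev(...): sample standard deviation
median = statistics.median(scores)   # like =median(...): no need to sort first
highest = max(scores)                # like =max(...)
lowest = min(scores)                 # like =min(...)

print(count, mean, round(stdev, 4), median, highest, lowest)
```

With an even number of entries, statistics.median averages the two middle values of the sorted data, which is the same convention Excel uses.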

The advantage of using such a large number (2 million) is that you avoid scrolling down to check the number of data entries you have.
The formulae used are intended to help you automate the process by copying and pasting the formula shown.

Note that Excel function names are not case-sensitive, so there is no need to use capital letters when writing a formula.

As you can see, Excel is useful because one does not need to calculate things manually.

Whatever the data set, just note its location (column name and row numbers) and apply the formulas as shown above and in the
worksheet. The choice of where to situate a formula is based on aesthetic considerations, that is, what is easier on the eye. So give yourself
sufficient space.

In this case, the data is given as scores and it is perhaps useful to consider a histogram (or bar chart). The following are the steps:
i. Select the data. Since our data is situated in column A, move the cursor to the header at the top of column A and click.
ii. Click Insert and you should see the recommended charts. Go to the format of interest and click on the histogram.
iii. The histogram automatically appears. To position it, go to the top of the chart, click and hold, and move your mouse.
iv. For the title, move your cursor to where you see Chart Title, click it and type accordingly.
In the present format, you see the categories of 2 to 4.3, 4.3 to 6.6, 6.6 to 8.9 and 8.9 to 11.2 are given according to the value of the
standard deviation.

Should we want to change the categories to say 2 to 4, 4 to 6, 6 to 8 and 8 to 10, click on any column in the histogram and you should see a
Format Data Series emerge on the right.
• Go to Series Options and select horizontal axis.
• You will see that the Automatic option is selected. Go to the Bin Width and select 2 (since the gap we have chosen is 2).

And you will see that the chart is automatically updated.
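
To see what a bin width of 2 does to the counts, here is a minimal Python sketch. The scores are hypothetical, the function name bin_counts is only illustrative, and the edge convention (a value on a shared edge goes into the lower bin) is an assumption about how the chart groups values.

```python
import math

# Hypothetical scores between 0 and 10 (not the worksheet's data).
scores = [2, 3, 4, 5, 5, 6, 7, 7, 8, 9, 10]

def bin_counts(values, start, width, n_bins):
    """Count values in n_bins consecutive bins of equal width.

    Assumed convention: the first bin is [start, start + width] and each
    later bin is (left, left + width], so a value on a shared edge is
    counted in the lower bin.
    """
    counts = [0] * n_bins
    for v in values:
        index = max(0, math.ceil((v - start) / width) - 1)
        if v >= start and index < n_bins:
            counts[index] += 1
    return counts

counts = bin_counts(scores, start=2, width=2, n_bins=4)
for i, c in enumerate(counts):
    print(f"{2 + 2 * i} to {4 + 2 * i}: {c}")
```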

In the next section, we are going to show how we can use Excel to perform polynomial interpolation and regression. These topics will be
explored more deeply in Probability and Statistics in Term 3. So for now, we will focus on how to do this.

But what is the difference between them? Interpolation aims to find a function that passes through all the data points exactly, while
regression aims to find a function that minimizes the error between the data points and the function.
2. Regression
Referencing Worksheet 2, we will talk about how to fit a polynomial to a given set of data. We shall do this using Example 4 of Chapter 4 of
the notes. The table gives the velocity data for the shuttle between liftoff and the jettisoning of the solid rocket boosters.

We want to find an appropriate third-order polynomial that best describes the relationship between the velocity and the time. Excel's
polynomial trendline is a least-squares fit, so the curve need not pass through every data point. In the notes, MATLAB code was used and
now we want to use Excel.

In the first column (column A, rows 1-8), we have values for time and in the second column (column B, rows 1-8), we have
velocity values. Select the two columns A and B, go to Insert and choose the scatter plot.

A chart will then pop up. Now click on the “+” icon, move the cursor to Trendline and click the “>” icon, then select More Options. Under
Trendline Options, choose Polynomial (in this case) and set the order to 3 for the third-order polynomial. Scroll further down and select Display
Equation on Chart. We see that the third-order polynomial is $y = 0.0015x^3 - 0.1155x^2 + 24.982x - 21.269$, which is exactly the same
answer as the one featured in the notes.
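
For readers curious about what a polynomial trendline computes, here is a minimal pure-Python sketch of a least-squares cubic fit via the normal equations. It is an illustration of the technique, not Excel's actual implementation. The function name polyfit and the data are hypothetical: the shuttle velocity table is not reproduced here, so the points are generated from a known cubic and the fit should recover its coefficients.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns [c0, c1, ..., c_degree] for c0 + c1*x + ... + c_degree*x**degree.
    """
    n = degree + 1
    # Normal equations (V^T V) c = V^T y, where V[i][j] = xs[i] ** j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]

    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]

    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs

# Hypothetical data generated from y = 1 + 2x - 0.5x^2 + 0.1x^3.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1 + 2 * x - 0.5 * x ** 2 + 0.1 * x ** 3 for x in xs]

coeffs = polyfit(xs, ys, 3)
print([round(c, 6) for c in coeffs])
```

The normal equations are fine for a low-order fit on a handful of points like this; for higher orders or wider ranges of x they become ill-conditioned, and library routines use more stable methods.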
Referencing Worksheet 3, we will talk about how to fit an exponential function to a given set of data. Here, we are given data for methane
hydrate: temperature (in kelvin) vs pressure (in atmospheres). Together, these data describe the phase diagram for this solid form of
natural gas.

T (K)     P (atm)
273.7     27
280.9     58
285.9     97
286.5     105
286.7     107
290.2     157
295.7     335
301       640
301.6     645
302       765
315.1     2344
320.1     3918

It can be shown that the vapor pressure varies with the absolute temperature according to the Clausius-Clapeyron equation:

$$\ln P = -A\,\frac{1}{T} + B$$

where $A$ and $B$ are constants to be determined through regression. Knowing this, we can prepare the data to better fit the equation above.

So we create a separate column (see column D) containing $1/T$ and another column containing $\ln(P)$ (see column E). The way this is done is as follows:
• in cell D1, enter the formula =1/A1 and in cell E1, enter the formula =ln(B1).
• select cells D1 and E1, drag your cursor to the bottom right hand corner of the selection (you will see a “.”), click and hold and drag
down to row 12. Immediately, you will see the two columns.
Now we can apply regression here because in reality most data has noise. This noise occurs for a variety of reasons such as instrument
calibration issues or simply schoolboy/schoolgirl errors due to misreading instruments.

And regression aims to minimize the error between the data points and the function which we say describes the relationship. Again, we will
not go through the mathematical derivation here because it requires multivariable calculus (Term 2 Maths).

But we can still exploit built-in Excel functions. We recall that for a straight line, we need to know the slope and the 𝑦-intercept (i.e. the
value along the vertical axis where our line of interest cuts this vertical axis).

In cell E21, we want the slope of the straight line and enter the formula =slope(e1:e12,d1:d12). In cell E22, we want the 𝑦-intercept of the
straight line and we enter the formula =intercept(e1:e12,d1:d12).

For the nerds out there (like Bernie), we can use another Excel function called RSQ, which gives the square of the Pearson correlation coefficient.

The correlation coefficient itself ranges from −1 to 1, so its square ranges from 0 to 1. In cell E23, we enter the formula =rsq(e1:e12,d1:d12).
A correlation coefficient with an absolute value of exactly 1 (and hence an RSQ of exactly 1) implies that a linear equation describes the
relationship between $X$ and $Y$ perfectly, with all data points lying on a line.
The correlation sign is determined by the regression slope: a value of +1 implies that all data points lie on a line for which 𝑌 increases
as 𝑋 increases, and vice versa for −1. A value of 0 implies that there is no linear dependency between the variables.

The result in cell E21 gives us the slope of the line, which equals $-A$, while the result in cell E22 gives us the intercept $B$. The slope is
−9710.8, so $A = 9710.8$ and $B = 38.63$. The result in cell E23 tells us that the data fits the straight line pretty well (and that is a relief).
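
The slope, intercept and RSQ values can be cross-checked with a short Python sketch that applies the standard least-squares formulas to the methane hydrate table. The variable names are only illustrative; the data are the twelve rows given earlier.

```python
import math

# Methane hydrate data from the table: temperature (K) and pressure (atm).
T = [273.7, 280.9, 285.9, 286.5, 286.7, 290.2, 295.7, 301, 301.6, 302, 315.1, 320.1]
P = [27, 58, 97, 105, 107, 157, 335, 640, 645, 765, 2344, 3918]

x = [1 / t for t in T]         # column D: 1/T
y = [math.log(p) for p in P]   # column E: ln(P)

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
sxx = sum((xi - x_mean) ** 2 for xi in x)
syy = sum((yi - y_mean) ** 2 for yi in y)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

slope = sxy / sxx                     # like =slope(e1:e12,d1:d12)
intercept = y_mean - slope * x_mean   # like =intercept(e1:e12,d1:d12)
r_squared = sxy ** 2 / (sxx * syy)    # like =rsq(e1:e12,d1:d12)

print(round(slope, 1), round(intercept, 2), round(r_squared, 4))
```

Because the fitted line is ln P = slope * (1/T) + intercept, the constant A in the Clausius-Clapeyron form is the negative of the slope.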

We can also use the trendline technique illustrated earlier for Worksheet 2 and we obtain exactly the same result.

The graphic is shown in that worksheet.

The dotted line you see is the equation of the line we obtained with the 𝐴 and 𝐵 values from the previous paragraph.

3. Newton’s Method and Secant Method
Suppose we want to find the solutions to the equation $f(x) = 0$.

Aside from linear and quadratic equations, most equations don’t have simple formulas for their roots.

Many calculators have numerical rootfinders that enable us to find approximate roots of equations, though they need to be used with care.

How do those numerical rootfinders work? They use a variety of methods, but most of them make some use of Newton’s method, also
called the Newton-Raphson method.

We will explain how this method works. In general, the way we obtain the solution via Newton’s Method is through an iterative process.

An iterative process means that we will have an initial value to kickstart it (we call it $x_1$, where the subscript 1 is the index associated with
the first value in the iteration) and then obtain a series of values ($x_2, x_3, \cdots$) which will (hopefully) get closer to the value we are looking for.

So how does Newton’s method work? Below is a step-by-step guide.
i. Start with an initial value of $x$, called $x_1$. The assumption is that $x_1$ is not the answer!
ii. Obtain the linearization of the function at $x = x_1$. That is, $L(x) = f(x_1) + f'(x_1) \cdot (x - x_1)$.
iii. Obtain the value of $x$ when $L(x) = 0$ (this is where the tangent line at $x_1$ cuts the $x$-axis).

$$L(x) = f(x_1) + f'(x_1) \cdot (x - x_1) = 0 \;\rightarrow\; x = -\frac{f(x_1)}{f'(x_1)} + x_1, \quad f'(x_1) \neq 0$$

iv. Specify a tolerance, say $tol = 10^{-6}$, and check if $|f(x)| \leq tol$.
v. If $|f(x)| > tol$, then repeat steps ii, iii and iv with this new value of $x$ in place of $x_1$. Keep track of the values of $x$ used, i.e. $x_1, x_2, \cdots$

The method can be visualized via the two graphs on the right. Mathematically, the above steps will
look like this.

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}, \quad f'(x_k) \neq 0, \quad k = 1, 2, 3, \cdots$$

The iterative process is to proceed from $k = 1, 2, 3, \cdots$ until $|f(x_k)|$ is within a specified tolerance.
There are no hard and fast rules as to what this tolerance should be. Practically, it all depends on
the level of precision required. In each step of the iterative process, we require that $f'(x_k) \neq 0$.

It is an important method that you would need to know how to implement in Machine Learning and many other contexts where finding the
roots to an equation is needed.

Referring to Worksheet 4, we want to find correct to 6 significant figures, the root of the equation cos(𝑥) = 𝑥. Because we are not dealing
with a data set, we can do a little aesthetics to make the worksheet look more presentable.

In this example, we will show how this can be implemented in Excel. Sometimes, the initial value 𝑥1 can be guessed by looking at the
intersection of the two functions.

The required function is $f(x) = \cos(x) - x$, though $x - \cos(x)$ is acceptable too. The iterative
formula for Newton’s Method is

$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} = x_k - \frac{\cos(x_k) - x_k}{-\sin(x_k) - 1}$$

Looking at the intersection of the two curves (see figure on the right), we see that choosing $x_1 = 1$ is reasonable because it is sufficiently
close to the intersection (our goal). The need for an initial guess that is sufficiently close to the root, along with the requirement that
$f'(x_k) \neq 0$, are the key limitations of Newton’s method.
Nevertheless, it is an important technique that serves as the basis of numerous optimization methods.
We will need to do the following for Newton’s method:
• label 𝑥, 𝑓 and 𝑓′ in cells A1, B1 and C1.
• in cell A2, we enter the initial guess. This is why 1 is there.
• in cell B2, we enter the value of 𝑓 (in this example it is cos(𝑥) − 𝑥) at the initial guess. That is, enter the formula =cos(a2)-a2
• in cell C2, we enter the value of $f'$ (here $f'(x) = -\sin(x) - 1$) evaluated at the initial guess. That is, enter the formula =-sin(a2)-1
• in cell A3, we enter the formula for Newton’s Method: =A2-(B2)/(C2)
• select cells B2 and C2, drag your cursor to the bottom right hand corner of the selection (you will see a “.”), click and hold and drag down
to row 3.
• select cells A3, B3 and C3, drag your cursor to the bottom right hand corner (you will see a “.”), click and hold and drag down until you
see that the bottom-most cell in column B is close to the tolerance or less.

And we observe that convergence to the solution ($f = 0$) is achieved after 3 steps (with a tolerance of $10^{-6}$).
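
The spreadsheet columns above can be mirrored by a minimal Python sketch of Newton's method for f(x) = cos(x) − x. The function name newton and the max_iter safeguard are illustrative choices, not part of the worksheet.

```python
import math

def newton(f, df, x, tol=1e-6, max_iter=50):
    """Newton's method; returns the list of iterates x1, x2, ..."""
    iterates = [x]
    for _ in range(max_iter):
        if abs(f(x)) <= tol:
            break                      # |f(x)| is within the tolerance
        slope = df(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x) is zero; Newton's method cannot continue")
        x = x - f(x) / slope           # same update as =A2-(B2)/(C2)
        iterates.append(x)
    return iterates

iterates = newton(lambda x: math.cos(x) - x,
                  lambda x: -math.sin(x) - 1,
                  1.0)
for k, xk in enumerate(iterates, start=1):
    print(f"x{k} = {xk:.7f}, f = {math.cos(xk) - xk:.2e}")
```

Starting from the initial guess of 1, the list holds four values (the initial guess plus three Newton steps), which matches the three steps observed in the worksheet.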

The secant method is a variant of Newton's method in which the tangent line to $f$ at $x_k$ is replaced by the secant line through the two most
recent points, $(x_{k-1}, f(x_{k-1}))$ and $(x_k, f(x_k))$. In other words, the derivative in Newton’s method is replaced by the approximation
$f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}$, which gives the iterative formula

$$x_{k+1} = x_k - f(x_k) \cdot \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})}, \quad f(x_k) \neq f(x_{k-1}), \quad k = 2, 3, \cdots$$

To use the secant method, we require two initial guesses: say $x_1 = 1$ and $x_2 = 0$.

We will need to do the following for the secant method:

• label $x$ and $f$ in cells G1 and H1.
• in cell G2, we enter the first initial guess. This is why 1 is there.
• in cell G3, we enter the second initial guess. If we use 0, then you will see 0 there.
• in cell H2, we enter the value of $f$ (in this example it is $\cos(x) - x$) at the first initial guess. That is, enter the formula =cos(g2)-g2
• select cell H2, drag your cursor to the bottom right hand corner (you will see a “.”), click and hold and drag down to row 3.
• in cell G4, we enter the formula for the secant method. That is, we enter =g3-h3*(g3-g2)/(h3-h2)
• select cells G4 and H4, drag your cursor to the bottom right hand corner (you will see a “.”), click and hold and drag down until you see
that the bottom-most cell in column H is close to the tolerance or less.

And we observe that convergence to the solution ($f = 0$) is achieved after 3 steps (with a tolerance of $10^{-6}$).
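
The same update used in cell G4 can be sketched in Python. The function name secant and the max_iter safeguard are illustrative assumptions, and the two initial guesses are the ones used above.

```python
import math

def secant(f, x_prev, x_curr, tol=1e-6, max_iter=50):
    """Secant method; returns the list of iterates, starting with the two guesses."""
    iterates = [x_prev, x_curr]
    for _ in range(max_iter):
        f_prev, f_curr = f(x_prev), f(x_curr)
        if abs(f_curr) <= tol:
            break                      # |f(x)| is within the tolerance
        if f_curr == f_prev:
            raise ZeroDivisionError("secant line is horizontal; cannot continue")
        # Same update as =g3-h3*(g3-g2)/(h3-h2)
        x_next = x_curr - f_curr * (x_curr - x_prev) / (f_curr - f_prev)
        x_prev, x_curr = x_curr, x_next
        iterates.append(x_curr)
    return iterates

iterates = secant(lambda x: math.cos(x) - x, 1.0, 0.0)
for k, xk in enumerate(iterates, start=1):
    print(f"x{k} = {xk:.7f}, f = {math.cos(xk) - xk:.2e}")
```

No derivative is needed here, which is the main practical attraction of the secant method over Newton's method.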
4. Trapezoidal Rule
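We saw in Chapter 5 of the notes how to obtain the area under a curve. The approach presented there is to divide the region of interest into
rectangles and then sum the areas of the rectangles.

The trapezoidal rule is one where we divide the area under the curve into a series of trapezoids instead of rectangles.

In the figure on the right, we take a crude approximation to illustrate the technique.

Using one trapezoid, the area under the curve $y = f(x)$ between $x = a$ and $x = b$ is approximated by

$$\int_a^b f(x)\,dx \approx \frac{b - a}{2}\left[f(a) + f(b)\right]$$

If we have more trapezoids (say $N$ trapezoids of equal width $\Delta x = \frac{b - a}{N}$, with $x_i = a + i\,\Delta x$), then the area under the curve is approximated by

$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{2}\left[f(x_0) + f(x_N)\right] + \Delta x \sum_{i=1}^{N-1} f(x_i)$$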

Let us find the area under the curve $y = e^{-x^2}$, $1 \leq x \leq 2$ using the Trapezoidal rule. We need to do the following:
• choose the number of sub-intervals between $x = 1$ and $x = 2$. Say $N = 10$. Then $\Delta x = \frac{2 - 1}{N} = \frac{2 - 1}{10} = 0.1$
• in cells A1 and B1, we add the labels x and f to denote the values of $x$ and the corresponding values of the function.
• in cell A2, we enter the value of 1 since this is the left-most value of $x$ in the domain [1,2].
• in cell A3, we add $\Delta x$ to the previous cell value. That is, we enter the formula =a2+0.1
• in cell B2, we enter the value of the function at $x$, for example with the formula =exp(-(a2^2)). The inner parentheses matter because
Excel applies a leading minus sign before the ^ operator.
• select cell B2, drag your cursor to the bottom right hand corner of the selection (you will see a “.”), click and hold and drag down to row
3.
• select cells A3 and B3, drag your cursor to the bottom right hand corner of the selection (you will see a “.”), click and hold and drag down
to row 12. Row 12 because we have 10 trapezoids.
• in cell G6, we use the summation formula for the trapezoidal rule and enter =0.1/2*(b2+b12)+0.1*SUM(b3:b11)

And we see that we obtain the value of 0.13581. When $N = 100$, we have $\Delta x = 0.01$ and values from row 2 to row 102, and we
obtain the value of 0.135263. Try to see what happens when we have $N = 1000$.
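
To try larger values of N without extending the spreadsheet, the same sum can be sketched in Python. The function name trapezoid is only illustrative; the formula is the one entered in cell G6, with the end points weighted by half and the interior points by one.

```python
import math

def trapezoid(f, a, b, n):
    """Trapezoidal rule on [a, b] with n equal sub-intervals."""
    dx = (b - a) / n
    interior = sum(f(a + i * dx) for i in range(1, n))   # like SUM(b3:b11) for n = 10
    return dx / 2 * (f(a) + f(b)) + dx * interior

def f(x):
    return math.exp(-x ** 2)   # in Python, -x ** 2 means -(x ** 2)

for n in (10, 100, 1000):
    print(n, round(trapezoid(f, 1, 2, n), 6))
```

Unlike Excel, Python applies the exponent before the leading minus sign, so -x ** 2 needs no extra parentheses here.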

5. Euler’s Method
In the last section of Chapter 6, we look at solving a first order differential equation.

$$\frac{dy}{dt} = f(y, t)$$

You will soon see that we can use a technique called separation of variables to solve such equations. As the term implies, we separate the
terms in $y$ and $t$. Consider a relatively simple problem, Newton’s law of cooling, whose differential equation is given by

$$\frac{dT}{dt} = k(E - T)$$

where $T$ is the temperature as a function of time and $E$ is the ambient temperature. $T(0) = T_0$ is the initial condition of the problem. By
initial condition, we mean that we are given the temperature at the beginning of the experiment. Let us illustrate how to obtain the
solution by the method of separation of variables.

$$\frac{dT}{dt} = k(E - T) = -k(T - E) \;\rightarrow\; \frac{dT}{T - E} = -k\,dt \;\rightarrow\; \ln|T - E| = -kt + C \;\rightarrow\; T - E = Ae^{-kt} \;\rightarrow\; T = E + Ae^{-kt}$$

$$T(0) = T_0 = E + A \;\rightarrow\; A = T_0 - E$$

$$\rightarrow\; T(t) = E + (T_0 - E)e^{-kt}$$
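Now let us assume that we want to solve this same problem using Euler’s method.

$$\frac{dT}{dt} = k(E - T), \quad T(0) = T_0$$

We divide the time interval from $t = 0$ to $t = a$ into $N$ sub-intervals, where $t = 0$ is the initial time and $\Delta t = \frac{a - 0}{N} = \frac{a}{N}$.

So the values of time will then be

$$t_0 = 0, \quad t_1 = t_0 + \frac{a}{N} = \frac{a}{N}, \quad t_2 = t_1 + \frac{a}{N} = 2\frac{a}{N}, \quad t_3 = t_2 + \frac{a}{N} = 3\frac{a}{N}, \quad \cdots, \quad t_N = N\frac{a}{N} = a$$

The corresponding temperatures are then $T_0, T_1, T_2, \cdots, T_N$. So how do we calculate $T_1$ and the other values?

In essence, Euler’s method starts with an initial condition, uses a finite-difference approximation for the derivative and solves for $T$ at the
next time step. We continue this iterative process till we reach our desired end-time. In the formulas below, the subscript $k$ is the step
index, while the coefficient $k$ is the constant from the differential equation.

$$\frac{dT}{dt} \approx \frac{T_{k+1} - T_k}{\Delta t} = k(E - T_k), \quad k = 0, 1, 2, \cdots, N - 1$$

$$\rightarrow\; T_{k+1} = T_k + \Delta t \cdot k(E - T_k)$$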
Referring to Worksheet 6, suppose we have $t_0 = 0$, $T_0 = 0$, $k = 0.2$, $E = 10$, and we want to determine the temperature at $t = 10$
using $N = 10$. Then

$$\Delta t = \frac{10 - 0}{10} = 1 \;\rightarrow\; T_{k+1} = T_k + \Delta t \cdot k(E - T_k) = T_k + 0.2(1)(10 - T_k)$$

$$\rightarrow\; T_{k+1} = T_k + 0.2(10 - T_k)$$

We can implement this in Excel as follows:
• label time t and temperature T in cells A1 and B1 respectively.
• enter the initial time and initial temperature in cells A2 and B2 respectively. Here, we enter 0 in both these cells.
• in cell A3, we enter the next time value with the formula =a2+1
• in cell B3, we enter the next temperature value with the formula =b2+0.2*(10-b2)
• select the cells A3 and B3 and drag down to the row where the cell in column A reaches 10 (that is when $t = 10$). This will be in row 12. The
corresponding temperature will show 8.926258. This means that with $\Delta t = 1$, $T(10) \approx 8.926258$. This value of $\Delta t$ is admittedly too
crude. The smaller the value, the better.

If we have $N = 20$, then $T(10) \approx 8.784233$. For $N = 50$, $T(10) \approx 8.701142$. You can experiment and see what happens when we
use different values of $T(0)$ and $N$.
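
To experiment with different values of N and T(0) more quickly, the Euler update can be sketched in Python and compared with the solution obtained by separation of variables. The function names and default arguments are illustrative; the parameter values are the ones used in the worksheet example.

```python
import math

def euler_cooling(t_end, n, temp0=0.0, k=0.2, ambient=10.0):
    """Euler's method for dT/dt = k(E - T) from t = 0 to t = t_end in n steps."""
    dt = t_end / n
    temp = temp0
    for _ in range(n):
        temp = temp + dt * k * (ambient - temp)   # same update as =b2+0.2*(10-b2) when dt = 1
    return temp

def exact_cooling(t, temp0=0.0, k=0.2, ambient=10.0):
    """Exact solution T(t) = E + (T0 - E) * exp(-k t)."""
    return ambient + (temp0 - ambient) * math.exp(-k * t)

for n in (10, 20, 50, 1000):
    print(n, round(euler_cooling(10, n), 6))
print("exact", round(exact_cooling(10), 6))
```

As N grows, the Euler values decrease toward the exact value of 10(1 − e^(−2)), which illustrates why a smaller time step gives a better answer.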