AUTHOR
Dr David Nordsletten
FOREWORD
The finite element method constitutes a key computational tool for engineers
to better understand and analyze complex systems. The method, introduced
in a paper by Courant in 1943, was the focus of intense research in classical
engineering disciplines throughout the 1950s-1970s to improve the design and
manufacture of everything from bridges to airplanes to cars (for more on the
history of finite elements, see J. Tinsley Oden's review). Since then, the method has
continued to be the focus of research as well as an important tool for engineers
and engineering researchers.
There are a number of excellent texts which detail both the application and theory
of finite elements. Books by Hughes, Bathe, Gresho and Zienkiewicz provide
reviews of finite element methods for engineering, while texts by Quarteroni and
Valli, Ciarlet, Brezzi and Fortin, and Girault and Raviart provide comprehensive
reviews of the mathematical theory underpinning the method. The intention
of these lecture notes is not to duplicate these works, but instead to provide an
introductory understanding of both theory and application, enabling students to
use and refer to these texts. As a result, we present these introductory concepts at a level
which avoids some of the complexity that may detract from the intuitive principles
of the finite element method.
Descriptions of interpolation schemes and discrete representation of functions
in these notes have benefitted from content in the FEM/BEM Notes
written by researchers at the University of Auckland over the last 25 years
(available online).
In this section you will be introduced to some of the basic concepts in functional analysis.
While teaching this material rigorously would constitute a whole course worth of material, here
we instead aim to introduce those aspects of the theory which are important for our later
understanding of FEM. Further, as an abstract branch of mathematics, it can often be
daunting for students due to its conceptual difficulty as well as its notation. To minimize
confusion we will put our emphasis on simple cases which enable us to understand the
important concepts (in some cases avoiding more rigorous / technical explanations).
1.1.1 Sets
A set in mathematics is a collection of distinct items, defined with curly brackets {..}.
A set can contain any items: integers, names, a mixture of items, matrices, and even
other sets. In general, a set S is defined as

S := { list of items }

where S is the name of the set and the curly brackets enclose its list of items.
So, for example, if we wished to consider the set of names which start with J and contain
three letters, we could define the list as shown in equation 1.1. Alternatively, if we wanted
to define a set which contains all elements of Fibonacci’s sequence, we could do so as shown
in equation 1.2.
Undoubtedly you will be familiar with some basic sets, for instance the set of integers or
real numbers – denoted N and R respectively. Other examples can be seen in example 1.1.
Example 1.1 The following are examples of various sets of numbers commonly seen in
engineering, physics and mathematics.
These primitive sets form the basis for many more complex sets, for instance R^{n×m}, which
is the set of n × m real matrices. Sets provide a useful way of formally defining a collection
of items and examining their properties.
1.1.2 Subsets
Given a set S which has defined items or components, it is often useful to consider subsets
of S (indeed, in the finite element method we always use subsets). In general, we define
a subset to be any set which takes items from S based on some criterion which the item
must satisfy to be in the subset, i.e.

Ŝ := { x ∈ S | some set criterion on x }.

In this formalism, the set Ŝ is defined with respect to another set S and satisfies the
formal definition given in 1.1. Here the set Ŝ takes all items x which are elements of the
set S (denoted x ∈ S) that satisfy some set criterion. For example,
N := { x ∈ R | |x − floor(x)| = 0 }   (1.3)

defines the set of integers as a subset of the set of real numbers R, while the set of primes

P := { a ∈ N | (a/b) ∈ N iff b ∈ {1, a}, ∀b ∈ N }

is likewise a subset of N. Similarly, the set

S := { x ∈ R | sin x = 0 }   (1.4)

defines the set S again as a subset of the real numbers. However, in this case we do not look
for items x ∈ S within a specific range, but instead look for items which satisfy the
criterion that sin x = 0. In this case, S := {0, ±π, ±2π, . . . , ±nπ, . . .} contains all integer
multiples of π.
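Set-builder definitions like these map directly onto set comprehensions in code. A minimal sketch in Python (the finite stand-in set and the evenness criterion are illustrative choices, not taken from the notes):

```python
# Subset-by-criterion: S_hat := { x in S | criterion(x) }, mirroring the
# set-builder notation above on a finite stand-in for a slice of the integers.
S = range(-10, 11)                       # illustrative finite "parent" set
S_hat = {x for x in S if x % 2 == 0}     # criterion: x is even
assert S_hat == set(range(-10, 11, 2))   # every even integer in [-10, 10]
assert S_hat <= set(S)                   # S_hat is indeed a subset of S
```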
As a result, the definition given above guarantees that Ŝ is, in fact, a subset of S. There
are a number of ways in which to declare that Ŝ is (or is not) a subset of S, for example:
Set operators enable a collection of sets to be used together to build a new set (see
figure 1.1). Here we will focus on some basic operations which are often seen in FEM
literature. The first we consider is the union (denoted using ∪), which may be written as
follows. Suppose A and B are two sets; then their union is given as

A ∪ B := { x : x ∈ A or x ∈ B }   (Union)

that is, the union of A and B contains any x such that x is an element of A or an element of B.
Here the union combines all elements in A and all elements in B, making it true that
A ⊆ A ∪ B and B ⊆ A ∪ B. Alternatively, we may consider the intersection (denoted
using ∩) of sets A and B, i.e.

A ∩ B := { x : x ∈ A and x ∈ B }   (Intersection)

that is, the intersection of A and B contains any x such that x is an element of A and an element of B,
where the intersection takes only elements in both A and B, thus A ∩ B ⊆ A and
A ∩ B ⊆ B. The union is often used in finite elements for approximation, while the
intersection operator is used to restrict sets to satisfy specific conditions.
Another important operation is the complement (denoted using \), which can be written
as

A \ B := { x : x ∈ A and x ∉ B }   (Complement)

that is, the complement of A given B contains any x such that x is an element of A and is
not an element of B. In other words, the complement of A given B takes all elements of A
which are not also elements of B (i.e. A \ B ⊆ A and (A \ B) ∩ B = ∅). These set operators,
shown illustratively in figure 1.1, represent three ways in which we may combine sets.
Another important set operator is the direct product (denoted using ×). A basic example
of the direct product is R × R – or R² – which denotes the set of real two-dimensional
vectors. In general,

A × B := { (x, y) : x ∈ A and y ∈ B }   (Direct Product)

that is, the direct product of A and B contains any pair (x, y) such that x is an element of A
and y is an element of B. The direct product enables the combination of sets into a new set
with items which are pairings of items from the original sets. This formalism enables the
creation of vector spaces (such as Rⁿ) as well as the combination of any group of sets.
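These four operators have direct counterparts in most programming languages; a small sketch using Python's built-in set type and itertools (the sets A and B are illustrative choices):

```python
from itertools import product

A = {0, 1, 2, 3}
B = {2, 3, 4}

assert A | B == {0, 1, 2, 3, 4}     # union A ∪ B
assert A & B == {2, 3}              # intersection A ∩ B
assert A - B == {0, 1}              # complement A \ B
assert (A - B) & B == set()         # (A \ B) ∩ B is empty

# Direct product A × B: all ordered pairs (x, y) with x ∈ A, y ∈ B.
AxB = set(product(A, B))
assert len(AxB) == len(A) * len(B)
assert (0, 4) in AxB and (4, 0) not in AxB
```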
Figure 1.1. Illustration of set union, intersection and complements for sets V and W .
Example 1.2 The following shows equivalent ways to define the set of integers between
0 and 10.
V := {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}   (the set of numbers 0, 1, 2, . . . , 10)

V := { x ∈ N | 0 ≤ x ≤ 10 }   (any integer x between 0 and 10)

V := { x ∈ N | x ∈ [0, 10] }   (any integer x in the range 0 to 10)

V := N ∩ [0, 10]   (the intersection of the set of integers and the range [0, 10])
In the finite element method we make use of what are referred to as finite-dimensional
sets, i.e. sets with a countable dimension. This concept differs from the size of a set,
which refers to the number of items the set contains. For example, the size of the set
S = { x ∈ R | 0 ≤ x ≤ 1 } is infinite, as the number of items in S cannot be counted. In
contrast, considering S = N ∩ [0, 1], the size of S is countable and equal to the number
of integers which fall within the range [0, 1] (i.e. 2).
While the size of a set reflects the number of items it contains, the set’s dimension
corresponds to the size of the smallest subset which – through a weighted sum – may be
used to represent the set. For example, suppose we consider the following set of vectors,
S := { (1, 0, 0)ᵀ, (0, 1, 0)ᵀ, (0, 0, 1)ᵀ }.   (1.6)
Clearly, we can count the items in S, which has size three. Examining the elements of S,
we cannot express any of its components as a weighted sum of the others. For example, there
are no a, b ∈ R for which
(1, 0, 0)ᵀ = a (0, 1, 0)ᵀ + b (0, 0, 1)ᵀ.   (1.7)
This means that the dimension of S is also three. In this case, we say that the components
of S are linearly independent. In contrast, if we consider the set R³ (that is, the set of
three-dimensional vectors), we may express any vector v ∈ R³ as
v = α₁ (1, 0, 0)ᵀ + α₂ (0, 1, 0)ᵀ + α₃ (0, 0, 1)ᵀ,   (1.8)
or stated equivalently,
v = Σₖ₌₁³ αₖ uₖ,   uₖ ∈ S,   (1.9)
where we select each item of S once. This implies that, while the number of items in
R³ is infinite (it is not countable), the dimension of R³ is three. In this context, we call
the components of S a basis for R³, and say that all other components of R³ are linearly
dependent on the components of S. Note that, while we selected S with unit vectors,
we can, in principle, construct a basis from any set of linearly independent vectors. For
example,
S′ = { (1, 1, 0)ᵀ, (0, 1, 0)ᵀ, (0, 1, 1)ᵀ }   (1.10)
is also a set of linearly independent vectors and also forms a basis for R³. We can see this
by noting that if {α₁, α₂, α₃} are the weights of a vector in the basis S, the corresponding
weights {β₁, β₂, β₃} for S′ may be related as

β₁ = α₁,   β₂ = α₂ − α₁ − α₃,   β₃ = α₃.   (1.11)
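Relation (1.11) can be checked numerically; a small sketch (the particular α values are arbitrary illustrative choices):

```python
# Verify that weights b1 = a1, b2 = a2 - a1 - a3, b3 = a3 reproduce the same
# vector when applied to the alternative basis S' = {(1,1,0), (0,1,0), (0,1,1)}.
a1, a2, a3 = 2.0, 5.0, -1.0                       # arbitrary weights in basis S
v = (a1, a2, a3)                                  # the vector they represent
b1, b2, b3 = a1, a2 - a1 - a3, a3                 # claimed weights in basis S'
u1, u2, u3 = (1, 1, 0), (0, 1, 0), (0, 1, 1)
w = tuple(b1 * x + b2 * y + b3 * z for x, y, z in zip(u1, u2, u3))
assert w == v                                     # same vector in both bases
```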
Importantly, the same concept may be extended from numbers to functions. Suppose we
think now of S as the set of all linear functions on the real number line. Then we can
define S as

S = { f : f(x) = mx + b, x ∈ R, (m, b) ∈ R² }.   (1.12)
In this example, the size of the set of functions S is infinite, stemming from the infinite
combinations of different slopes and intercepts, m and b. From the definition it is also
clear that we can define a basis for S with the linearly independent components,

B = {1, x}.   (1.13)

These components are linearly independent as there is no constant a for which 1 = ax for
all x ∈ R. We can identify the weights of the basis components (or basis functions) as b
and m (on the first and second components, respectively). As before, this is not the only
basis, and we may construct an alternative basis
B′ = {1 − x, x},   (1.14)
where the weights for the first and second components are now b and m + b, respectively.
The concept of sets with finite dimension (but with an infinite number of items) is a
critical component of the finite element method and enables us to identify an infinite
number of functions by selecting a finite set of weights (as we will see).
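The two bases B = {1, x} and B′ = {1 − x, x} from above can be compared directly; a quick sketch (the slope and intercept values are arbitrary illustrative choices):

```python
# f(x) = m*x + b expressed in B = {1, x} with weights (b, m), and in
# B' = {1 - x, x} with weights (b, m + b); both give identical values.
m, b = 3.0, -2.0
f = lambda x: b * 1 + m * x                  # weights (b, m) on {1, x}
g = lambda x: b * (1 - x) + (m + b) * x      # weights (b, m + b) on {1 - x, x}
for x in (-1.0, 0.0, 0.5, 2.0):
    assert abs(f(x) - g(x)) < 1e-12
```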
In this course we will be dealing with linear spaces – defined in definition 1.2. Linear spaces
satisfy basic conditions we intuitively expect to hold for addition and multiplication by a
scalar. Conditions (1) and (2) ensure that addition is both commutative and associative,
so the order of addition operations does not alter the result. Condition (3) ensures that
there is a zero element and is closely tied to (4), which ensures that if x is a member of our
space, then so is −x (giving the space some symmetry about zero). Condition (5) ensures
the compatibility of multiplication by scalars, while (6) and (7) ensure that multiplication
is distributive over scalar multiplication and addition of elements in the set. Finally, (8)
guarantees that the multiplicative identity holds.
Definition 1.2 (Linear Space) A linear space is a set V equipped with addition and
scalar multiplication, i.e.
+ : V × V → V   (addition operator)
· : R × V → V   (multiplication operator)
While these rules are numerous and quite generally defined, their application is relatively
straightforward to observe. For example, let us consider the space of 3 × 3 matrices,

V := { M ∈ R^{3×3} }.   (1.15)

This is a linear space, as it satisfies the conditions of definition 1.2. Similarly, the space
of symmetric matrices,

V := { M ∈ R^{3×3} | M = Mᵀ }   (1.16)

satisfies all rules of addition and multiplication, contains the zero matrix 0, and if M ∈ V
then −M ∈ V, as it is also symmetric. There are also many examples of spaces which
violate the principles of a linear space. Consider, for instance, the set of invertible
matrices,

V := { M ∈ R^{3×3} | M⁻¹ ∈ R^{3×3} }.   (1.17)

This space does not contain the zero matrix 0. It is also not closed under addition: while
−M ∈ V whenever M ∈ V (since det(−M) = (−1)³ det M = −det M ≠ 0), the sum
M + (−M) = 0 is not in V.
Finally, and most importantly, the finite element method uses linear function spaces (see
definition 1.3). Just like we can construct the set of triples representing the Euclidean
space R³ (i.e. three-dimensional space), we can also construct groups of functions which
share certain properties. For example, let's consider the set of scalar real functions which
are finite on the interval [0, 1], i.e.

V := { f : [0, 1] → R | max_{x∈[0,1]} |f(x)| < ∞ }   (1.19)

that is, the set of functions f which take the interval [0, 1] and return a real number, such
that the maximum absolute value of f on [0, 1] is finite.
It's clear that functions of V can be added and subtracted, multiplication is distributive,
f ∈ V implies −f ∈ V, etc. While this space contains non-linear functions (for example,
f(x) = eˣ), the function space itself is linear (by definition). This is an important property,
which underpins much of the required functional analysis theory.
Definition 1.3 (Linear Function Space) A linear function space is a linear space of
functions.
1.2.1 Norms
In FEM, as with any numerical approximation method, we must always consider the
adequacy of our numerical solutions. Deriving any value from an approximate solution
is only as good as the approximation itself – i.e. how close the approximation is to the
true value. More broadly, we could ask: how close are items in a linear space or a linear
function space?
In the Euclidean space R³ the concept of gauging the closeness of two points is arrived
at by considering the distance between the two points. We know from basic geometry
that the distance between two points v, w ∈ R³ is naturally defined using the distance
function,

d(v, w) = √( (v₁ − w₁)² + (v₂ − w₂)² + (v₃ − w₃)² ).   (1.20)
Here we can see that the distance function d takes two vectors in R³ and translates their
difference into a real number representing the length spanned between the points, i.e.

d : R³ × R³ → R.

The function d is not the only valid measure of distance; we may, for example, define

d₁(v, w) = |v₁ − w₁| + |v₂ − w₂| + |v₃ − w₃|,   (1.21)

d₄(v, w) = ( (v₁ − w₁)⁴ + (v₂ − w₂)⁴ + (v₃ − w₃)⁴ )^{1/4}.   (1.22)
Clearly, d₁ and d₄ measure distance and are only zero if the two points are the same.
More generally, we can define the concept of a norm – which in R³ is simply a measure
of distance – as a function relating how close any two items in a linear space are (see
definition 1.4).
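The three distance functions above are easy to compare numerically; a quick sketch with two illustrative points:

```python
import math

v, w = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)                   # illustrative points in R^3
d  = math.sqrt(sum((a - b) ** 2 for a, b in zip(v, w)))   # eq. (1.20)
d1 = sum(abs(a - b) for a, b in zip(v, w))                # eq. (1.21)
d4 = sum((a - b) ** 4 for a, b in zip(v, w)) ** 0.25      # eq. (1.22)

assert d == 5.0 and d1 == 7.0          # the classic 3-4-5 difference vector
assert abs(d4 - 337 ** 0.25) < 1e-12   # (3^4 + 4^4)^(1/4)
```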
Here, property (1) yields a unique zero element, so that the measure is only zero if the
item being measured has zero length. Property (2) ensures that multiplicative scaling of
the item f scales its measure proportionally. Finally, property (3) is the triangle inequality,
which is often used in analysis. In R³, we can define the norm |·|₂ as

|v|₂ := ( Σₖ₌₁³ vₖ² )^{1/2}.   (1.25)

Properties (1) and (2) clearly hold by the definition of d, and property (3) follows from
the classical triangle inequality in R³. We can define many similar measures on the general
n-dimensional Euclidean space.
Then we can define the L^∞[a, b] norm (where we use the notational convention of
pairing the norm – in this case L^∞ – with the function's defined range [a, b]),

‖f‖_{0,∞} = sup_{x∈[a,b]} |f(x)|,   (L^∞[a, b] norm)

where sup denotes the supremum – or smallest upper bound – of a set of numbers (in this
case |f(x)|). All properties in definition 1.4 hold for the L^∞ norm, and thus it may be
used to measure the distance between functions (in this case, the largest difference at any
point). We may also consider other norm measures, for example,
‖f‖₀ = ( ∫ₐᵇ (f(x))² dx )^{1/2},   (L²[a, b] norm)

or more generally,

‖f‖_{0,p} = ( ∫ₐᵇ |f(x)|^p dx )^{1/p}.   (L^p[a, b] norm)
It is also possible to consider measures which examine both a function and its rate(s) of
change, for example, what we call the H¹[a, b] norm,

‖f‖₁ = ( ∫ₐᵇ (f(x))² + (f′(x))² dx )^{1/2}.   (H¹[a, b] norm)
We can also consider norms for functions of higher dimensions, vector functions, tensor
functions, etc. (though we will restrict our attention to the 1D case). All these measures
provide mechanisms for gauging the size of functions or how two functions might be
related. However, they all do so in a slightly different manner. To illustrate this, consider
the function f(x) = 1/x^{1/4} on the interval [0, 1]. The measure of f can be given by:

‖f‖_{0,1} = ∫₀¹ (1/x^{1/4}) dx = 4/3

‖f‖₀² = ∫₀¹ (1/x^{1/4})² dx = 2

‖f‖_{0,4}⁴ = ∫₀¹ (1/x^{1/4})⁴ dx = ∞

⋮

‖f‖_{0,∞} = sup_{x∈[0,1]} (1/x^{1/4}) = ∞

‖f‖₁² = ∫₀¹ (1/x^{1/4})² + (1/(4x^{5/4}))² dx = ∞
As we can see, the measure of f is either finite or unbounded depending on the norm
used. This is inherently due to the fact that norms measure different characteristics
of functions, and we must work with them appropriately. For example, if we wish to
understand the distance between the functions f(x) = x^{−1/2} and g(x) = x^{−1/4}, then using
‖·‖_{0,1} versus ‖·‖₀ yields

‖f − g‖_{0,1} = ∫₀¹ ( 1/x^{1/2} − 1/x^{1/4} ) dx = 2/3,

‖f − g‖₀² = ∫₀¹ ( 1/x^{1/2} − 1/x^{1/4} )² dx = ∞,
which may or may not be useful (depending on the problem). However, with some
knowledge of what they are being used for, these norms provide valuable tools for
understanding the size – or measure – of a function.
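The first two integrals in the list above can be verified numerically; a midpoint-rule sketch using the substitution x = t⁴ to remove the endpoint singularity at x = 0 (the quadrature rule and resolution are illustrative choices):

```python
# For f(x) = x**(-1/4) on [0, 1], substituting x = t**4 (dx = 4 t**3 dt) gives
# smooth integrands: x^{-1/4} dx -> 4 t^2 dt  and  x^{-1/2} dx -> 4 t dt.
def midpoint(g, n=200_000):
    h = 1.0 / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h

l1 = midpoint(lambda t: 4 * t ** 2)   # ‖f‖_{0,1} = 4/3
l2_sq = midpoint(lambda t: 4 * t)     # ‖f‖₀² = 2
assert abs(l1 - 4 / 3) < 1e-6
assert abs(l2_sq - 2.0) < 1e-6
```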
In the previous sections we defined the concept of a linear function space as well as a
means for constructing measures of the distance between functions, similar to how we
measure distance between points. The pairing of these two results is what we refer to
as a normed linear function space. These spaces are particularly useful for analysis, as they
contain measurable items. This enables the calculation of a function's measure as well
as how close it may be to other functions. Later in this chapter we will consider a range of
normed linear function spaces which will be used throughout the rest of this introduction
to FEM.
Definition 1.5 (Normed Linear Function Space) Given a linear function space V
for which one can construct a norm ‖·‖_V defined on V, the pairing (V, ‖·‖_V) is called a
normed linear function space.
With the concepts of linear function spaces and normed linear function spaces we can
start to think about sequences of functions. Most of us are probably used to thinking
about sequences of numbers, i.e.

(xₙ)ₙ≥₁ = (x₁, x₂, x₃, . . .),   xₙ ∈ R,   (1.27)
where the xₙ denote different numbers (and n is an integer in N). We can think of
examples like the Fibonacci sequence, as well as many other sequences. We can observe
many different traits of a sequence, but for our purposes we are most concerned with
whether a sequence is convergent, i.e. does the sequence have a limit L ∈ R such that

|xₙ − L| → 0   as n → ∞,   (1.29)

that is, the difference between xₙ and L goes to 0 as n tends to ∞.
In this case we consider our sequence to be composed of real numbers and its limit to be
some real number. For example, we know the sequence

(n⁻ⁿ)ₙ≥₁ = ( 1, 1/4, 1/27, 1/256, 1/3125, . . . )   (1.30)

is convergent (to zero), while the sequence

((−1)ⁿ)ₙ≥₀ = (1, −1, 1, −1, 1, −1, . . .)   (1.31)
which alternates between positive and negative 1 is not convergent. This sequence is,
however, bounded in the interval [−1, 1]. Another example is the sequence generated by
the recurrence xₙ₊₁ = (xₙ² − 0.01)/(2xₙ),

( (xₙ² − 0.01)/(2xₙ) )ₙ≥₁, x₀=1 ≈ (0.495, 0.237, 0.098, −0.002, 2.090, . . .)   (1.32)

which is initially convergent, but diverges if the sequence gets close to zero. Looking over
the first 200 and 1000 terms of the sequence, the absolute maximums of the sequence are
∼9.5 and ∼41, respectively. Setting x₀ = 0.1, we get the sequence (0.1, 0, −∞, . . .),
showing the sequence is unbounded.
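The behavior of sequence (1.32) is simple to reproduce; a short numerical sketch of the recurrence xₙ₊₁ = (xₙ² − 0.01)/(2xₙ):

```python
# Generate the first few terms starting from x0 = 1 and compare against the
# values quoted above (0.495, 0.237, 0.098, -0.002, 2.090).
def recurrence(x0, n):
    xs, x = [], x0
    for _ in range(n):
        x = (x * x - 0.01) / (2 * x)
        xs.append(x)
    return xs

xs = recurrence(1.0, 5)
expected = [0.495, 0.237, 0.098, -0.002, 2.090]
assert all(abs(a - b) < 5e-3 for a, b in zip(xs, expected))
```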
How can we extend these concepts to functions and sequences of functions? The
answer is: using norms. From the previous section, we recall that a norm is zero
only for the zero element of the space. Hence, if we consider a sequence of functions
(fₙ)ₙ≥₁ = (f₁, f₂, f₃, . . .) where all functions fₙ are in the normed linear function space
(V, ‖·‖_V), then we can study the convergence / divergence of our sequence of functions
by examining the behavior of the real number sequence

( ‖fₙ − g‖_V )ₙ≥₁   (1.33)

to see if the sequence of functions (fₙ)ₙ≥₁ is convergent to a function g (see definition 1.6).
It is clear that if the sequence of numbers shown in equation 1.33 is convergent to zero,
then the function sequence is convergent to the function g, i.e.

‖fₙ − g‖_V → 0   as n → ∞.
To see this in action, let's consider the sequence of functions fₙ : [0, 1] → R given by

( fₙ(x) = x^{−1/2+1/n} )ₙ≥₁ = ( f₁(x) = x^{1/2}, f₂(x) = 1, f₃(x) = x^{−1/6}, . . . ).   (1.34)

Clearly we can see that fₙ → g(x) = x^{−1/2}. Considering the sequence (‖fₙ − g‖_{0,1})ₙ≥₁,
we can see the sequence (in the L¹[0, 1]-norm) is convergent to the function g as

( ‖fₙ − g‖_{0,1} )ₙ≥₁ = ( 4/3, 1, 4/5, 2/3, . . . ) → 0.   (1.35)
In contrast, we can see that (fₙ)ₙ≥₁ does not converge in the L²[0, 1] norm, as the measure
of g is unbounded!
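Convergence (1.35) can be checked numerically; a sketch using the substitution x = t², which makes the integrand of ‖fₙ − g‖_{0,1} smooth on [0, 1] (the grid resolution is an illustrative choice):

```python
# ‖f_n - g‖_{0,1} = ∫₀¹ (x^{-1/2} - x^{-1/2 + 1/n}) dx; with x = t² (dx = 2t dt)
# the integrand becomes 2 - 2 t^{2/n}.  The closed form is 4 / (n + 2).
def l1_dist(n, m=100_000):
    h = 1.0 / m
    return sum(2.0 - 2.0 * ((i + 0.5) * h) ** (2.0 / n) for i in range(m)) * h

for n in (1, 2, 3, 4, 50):
    assert abs(l1_dist(n) - 4.0 / (n + 2)) < 1e-4   # 4/3, 1, 4/5, 2/3, ... -> 0
```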
A sequence (xₙ) is called a Cauchy sequence if

∀ε > 0, ∃N such that n, m > N ⇒ ‖xₙ − xₘ‖ ≤ ε,

that is, for all ε greater than zero there exists a number N such that any n and m greater
than N implies that the distance between xₙ and xₘ is less than ε.
A key concept arising from Cauchy sequences is the converse statement: is a Cauchy
sequence necessarily convergent? This statement is true for any linear (function) space
which is complete (see definition 1.8).
Definition 1.8 (Complete Space) A normed linear space V is complete if every Cauchy
sequence (fₙ) (with fₙ ∈ V) is also convergent to a g ∈ V.
It can also be shown that the converse is true for any real number sequence in R, but not
for every linear function space (unless it is complete).
Definition 1.9 (Banach Space) A Banach space is a complete normed linear (function)
space.
Banach spaces play a critical role in the analysis of solutions to ODE / PDE systems as
well as finite element approximations. The principal value of Banach spaces in FEM is
the restriction to sets which consist of measurable functions. Moreover, they have the
property that a Cauchy sequence of functions is necessarily convergent, implying that a
limit exists. This is an extremely powerful result, as we then know that, for a sequence of
finite element approximations which get progressively closer to each other (i.e. a Cauchy
sequence), our approximations approach a limit that satisfies our ODE / PDE.
Figure 1.2. Stefan Banach - a Polish mathematician responsible for defining Banach spaces. Frigyes Riesz - a Hungarian
mathematician responsible for defining L^p spaces and the Riesz representation theorem.
Figure 1.3. Sergei L. Sobolev - a Russian mathematician responsible for defining Sobolev spaces. David Hilbert - a German
mathematician responsible for defining Hilbert spaces.
In contrast, the space of continuous functions is complete with the norm ‖f‖_{0,∞} =
sup_{x∈[a,b]} |f(x)|, as it can be shown this necessitates that the limit remain continuous
(and therefore the Cauchy sequence is convergent).
Simple examples of Banach spaces include the set of real numbers R or three-dimensional
Euclidean space R³. However, for our analysis with finite elements, we will use a series of
specialized function spaces.
1.4 L^p spaces
A series of function spaces which are common in FEM analysis are the L^p spaces, which
contain those functions which are bounded in the L^p norms discussed in section 1.2.1 (see
definition 1.10). Defined by Frigyes Riesz (see figure 1.2), these different spaces require
different powers of a function to be integrable over the domain of the function (Ω in the
definition).
Definition 1.10 (L^p(Ω) Space) The space L^p(Ω) is the Banach space of functions
bounded in the L^p(Ω) norm, i.e.
Functions in these spaces will be encountered frequently, in particular those in the space
L²(Ω), which denotes the square integrable real-valued functions on Ω. This is frequently
used as the norm of choice for characterizing how functions – or sequences of FEM
approximations – converge. It is also commonly used as the space where terms which
appear in an ODE / PDE must be bounded.
1.5 C^k spaces
The C^k spaces are those functions for which all derivatives up to order k are
continuous. The space given in definition 1.11 forms a complete normed linear function
space populated by continuous functions. This space can be extended, as shown in
definition 1.12, to demand more regularity (i.e. further conditions on smoothness) for
functions in the space. In many cases, these spaces contain all analytic solutions for the
ODE / PDE problems we consider.
Definition 1.11 (C⁰(Ω) Space) The space C⁰(Ω) is the Banach space of continuous
functions on the domain Ω, i.e. for any f ∈ C⁰(Ω) and ε > 0 there is a δ > 0 such that,

|f(x + h) − f(x)| < ε   whenever |h| < δ.
Further, we frequently use continuous spaces when deriving rules which govern
convergence behavior for approximations (as we will see in chapter 3).
Definition 1.12 (C^k(Ω) Space) The space C^k(Ω) is the Banach space of functions on
the domain Ω where

dⁿf/dxⁿ ∈ C⁰(Ω),   n ≤ k,

equipped with the norm

‖f‖_{k,∞} := Σₙ₌₀ᵏ sup_{x∈Ω} | dⁿf(x)/dxⁿ |.
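As a concrete check of the C^k norm, consider f = sin on Ω = [0, π] with k = 1, where sup|f| = sup|f′| = 1 so that ‖f‖_{1,∞} = 2; a grid-based sketch (the sampling density is an illustrative choice):

```python
import math

# Approximate ‖f‖_{1,∞} = sup|sin| + sup|cos| over [0, π] on a uniform grid.
xs = [i * math.pi / 100_000 for i in range(100_001)]
norm = max(abs(math.sin(x)) for x in xs) + max(abs(math.cos(x)) for x in xs)
assert abs(norm - 2.0) < 1e-9
```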
An inner product on V is a mapping

(·, ·) : V × V → R

which satisfies basic conditions of (1) symmetry and linearity, i.e. (2) distributive
multiplication and (3) distributive addition. In addition, the inner product induces a
norm – as shown in item (4) – on our space V (see definition 1.13). However, unlike the
norm or distance operators, the inner product between two elements need not be positive,
and (f, f) is zero only when f is the zero element of the space. Indeed, we often refer to
functions in V as orthogonal if their inner product is zero.
Combining L² with inner product operators which require square integrability of all
derivatives up to k-th order defines the H^k Hilbert spaces (see definition 1.14).
Hilbert spaces are Banach spaces with a norm given by the inner product (as shown in
item (4) of definition 1.13). We can see, by definition, that H^k(Ω) ⊆ L²(Ω), with
equality only holding for H⁰(Ω).
Hilbert spaces form a critical basis for nearly all the analysis which we review in these notes.
For problems of diffusion, viscous fluid flow, linear mechanics, etc., we often seek solutions
which exist inside these spaces. It is also most often the case that finite element
approximations – like those which will be developed in this course – are in the first
Hilbert space.
Definition 1.14 (H^k(Ω) Space) The space H^k(Ω) is the k-th Hilbert space defined as

H^k(Ω) = { f : Ω → R | dⁿf/dxⁿ ∈ L²(Ω), n ≤ k }

with inner product

(u, v)_{H,k} = Σₙ₌₀ᵏ ( dⁿu/dxⁿ, dⁿv/dxⁿ ),   where (u, v) := ∫_Ω u · v dx,

and norm

‖u‖ₖ = √( (u, u)_{H,k} ).
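As a concrete instance of definition 1.14, take Ω = [0, 1], k = 1 and u = v = x, for which (u, u)_{H,1} = ∫ x² dx + ∫ 1 dx = 1/3 + 1 = 4/3; a midpoint-rule sketch (the rule and resolution are illustrative choices):

```python
# H^1 inner product (u, v)_{H,1} = ∫ u v dx + ∫ u' v' dx on [0, 1], for u = v = x.
def midpoint(g, n=100_000):
    h = 1.0 / n
    return sum(g((i + 0.5) * h) for i in range(n)) * h

ip = midpoint(lambda x: x * x) + midpoint(lambda x: 1.0)   # 1/3 + 1
norm_sq = ip                                               # ‖u‖₁² = (u, u)_{H,1}
assert abs(norm_sq - 4.0 / 3.0) < 1e-9
```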
Definition 1.15 (W^{n,p}(Ω) Space) The space W^{n,p}(Ω) is the Sobolev space defined as

W^{n,p}(Ω) = { f : Ω → R | dᵐf/dxᵐ ∈ L^p(Ω), m ≤ n }

with norm

‖f‖_{n,p} = ( Σₘ₌₀ⁿ ‖ dᵐf/dxᵐ ‖_{0,p}^p )^{1/p}.
In the previous section, we introduced some basic concepts from functional analysis
and defined the spaces of functions which we encounter most often in finite elements.
The spaces encountered – for example the L^p or H^k spaces – all have infinite dimension
and infinite size. In the finite element method, these infinite-dimensional spaces are
approximated by simplified spaces which have finite dimension (and infinite size). This
enables the method to approximate a large range of functions, but limits the dimension to
sizes that can be solved on a computer.
In this section, we introduce discrete approximation spaces. The backbone of these spaces
is the construction of a simple space which can represent basic behavior of functions. This
is typically done using polynomial approximations which may be of any order, i.e. linear,
quadratic, cubic, etc. These simple polynomial approximations are then defined over a
finite set of elements, building a discrete (i.e. segmented) approximation space.
2.1.1 Pk
The space Pₖ is the space of k-th order polynomials. Often the region of space over which
the polynomials in Pₖ are defined is implied (not written explicitly) but is set to be the
master element e_M. The selection of the master element is based on considerations which
we will cover in more detail in later sections; however, in 1D the master element is simply
the unit interval e_M = [0, 1]. We can define the space of k-th order polynomials Pₖ as
the set of weighted sums of the monomials {1, ξ, ξ², . . . , ξᵏ}.
By definition, we can see that the space of polynomials is constructed by summing linearly
independent functions, making the dimension of Pₖ equal to k + 1 (i.e. dim(Pₖ) = k + 1).
any element in a set. From equation 2.2, we can define the functions
which by construction span the set Pk . The functions co , c1 , and c3 are referred
to as cardinal basis functions and collectively referred to as the cardinal basis (see
definition 2.1). The cardinal basis represents a straightforward basis for polynomial
spaces. It also benefits by being hierarchical, which stems from the fact that the cardinal
basis for Pk contains the cardinal basis Pk 1 .
Definition 2.1 (Cardinal Basis) The cardinal basis of Pₖ is the set of (k + 1) functions
cₙ(ξ) = ξⁿ, n = 0, . . . , k.
The polynomial space Pₖ does not have a unique basis, but an infinite number of
alternative basis sets. All of these can be described by a weighted sum of the form

φₙ(ξ) = Σⱼ₌₀ᵏ aₙ,ⱼ₊₁ ξʲ,   n = 1, . . . , k + 1,   (2.3)

or, in matrix form, [φ₁(ξ), . . . , φₖ₊₁(ξ)]ᵀ = A [1, ξ, . . . , ξᵏ]ᵀ,
where {φ₁, . . . , φₖ₊₁} denotes the new set of basis functions and A = (aᵢ,ⱼ) a coefficient matrix.
While the cardinal basis is the most basic representation of our polynomial space, it
is often more convenient to use alternative basis sets – such as modal, Chebyshev and
others. For the discrete approximation spaces constructed for finite element problems we
will apply the nodal Lagrange basis.
A specific basis commonly used in finite element schemes is the nodal Lagrange basis (see
definition 2.2). This basis is specifically constructed so that each weight, wᵢ, represents
the value of our interpolated function at specific points – referred to as node points – in
e_M. The number of points is equal to the number of basis functions, so for Pₖ the number
of node points and nodal Lagrange basis functions is (k + 1).
Node points are constructed so that they are evenly distributed over the interval e_M =
[0, 1]. Thus, if ξₘ denotes the m-th node point, its position is given by

ξₘ = (m − 1)/k

for the k-th order nodal Lagrange basis. We can see that the node points for the first few
polynomial spaces are:

P₁ : {ξ₁, ξ₂} = {0, 1}
P₂ : {ξ₁, ξ₂, ξ₃} = {0, 1/2, 1}
P₃ : {ξ₁, ξ₂, ξ₃, ξ₄} = {0, 1/3, 2/3, 1}
We can see that the node points do not necessarily remain the same with increasing order
(making the nodal Lagrange basis non-hierarchical). The property pairing weights with
these node points requires that the nodal Lagrange basis functions satisfy the property
shown in definition 2.2. That is, each nodal Lagrange basis function φ_n must be one at a
single node point and zero at all others.
This can be demonstrated by considering the polynomial space P2. In this case, any
function f ∈ P2 can be written as the weighted sum f(ξ) = w₁φ₁(ξ) + w₂φ₂(ξ) + w₃φ₃(ξ).
Definition 2.2 (Nodal Lagrange Basis) The nodal Lagrange basis for the polynomial
space Pk, denoted {φ_n}_{n=1}^{k+1}, satisfies

  φ_n(ξ_m) = 1 if n = m,  φ_n(ξ_m) = 0 otherwise,

for the node points ξ_m = (m − 1)/k. The basis can be represented as a weighted sum
of the cardinal basis, i.e.

  [φ₁(ξ), …, φ_{k+1}(ξ)]ᵀ = N_k [1, ξ, …, ξᵏ]ᵀ

where N_k denotes the coefficient matrix computed as the inverse transpose of the
Vandermonde matrix (evaluated at all node points),

  N_k = V_k⁻ᵀ
How can we construct such a basis and do we even know if such a basis exists? In fact,
we can use this property of the basis to define the terms. We also know that the nodal
Lagrange basis can be expressed in terms of the cardinal basis. This means we can write
the nth nodal Lagrange basis function as
  φ_n(ξ) = Σ_{j=0}^{k} N_{n,j+1} ξʲ    (2.4)
where N_{n,1}, N_{n,2}, …, N_{n,k+1} are the unknown weighting terms on the cardinal basis. We
know there are (k + 1) terms and thus to determine these coefficients we need (k + 1)
constraints. These can be generated by using the (k + 1) node points and the fact that
  φ_n(ξ₁) = Σ_{j=0}^{k} N_{n,j+1} ξ₁ʲ = 0
  ⋮
  φ_n(ξ_n) = Σ_{j=0}^{k} N_{n,j+1} ξ_nʲ = 1
  ⋮
  φ_n(ξ_{k+1}) = Σ_{j=0}^{k} N_{n,j+1} ξ_{k+1}ʲ = 0
This can be written efficiently by constructing the Vandermonde matrix, where each row
is constructed as the cardinal basis {1, ξ, ξ², …, ξᵏ} for Pk evaluated at a specific node
point. For example, the first three Vandermonde matrices are:
" # " #
1 ⇠0 1 0
V1= =
1 ⇠1 1 1
2 3 2 3
1 ⇠0 ⇠02 1 0 0
6 7 6 7
V 2 = 4 1 ⇠1 ⇠12 5 = 4 1 1/2 1/4 5
1 ⇠2 ⇠22 1 1 1
2 3 2 3
1 ⇠0 ⇠02 ⇠03 1 0 0 0
6 7 6 7
6 1 ⇠1 ⇠12 ⇠13 7 6 1 1/3 1/9 1/27 7
6
V3=6 7 6
=6 7
2 3 7 7
4 1 ⇠2 ⇠2 ⇠2 5 4 1 2/3 4/9 8/27 5
1 ⇠3 ⇠32 ⇠33 1 1 1 1
Noting that this process can be repeated for each nodal Lagrange basis, we may construct a
matrix of nodal weights N k (for polynomial space Pk ) where the nth row gives the cardinal
weights for φ_n. Thus, the nodal Lagrange weights must satisfy the matrix system,

  V_k N_kᵀ = I    (2.5)

where I is the identity matrix. As a result, the nodal Lagrange weights N_k = V_k⁻ᵀ are
the inverse transpose of the Vandermonde matrix (as shown in definition 2.2). The nodal
Lagrange basis for P1 and P2 are shown in figure 2.2.
Figure 2.2. The nodal Lagrange basis functions for P1 (φ₁(ξ) = 1 − ξ, φ₂(ξ) = ξ) and for P2: (a) φ₁(ξ) = 2(ξ − 1)(ξ − 0.5), (b) φ₂(ξ) = 4ξ(1 − ξ), (c) φ₃(ξ) = 2ξ(ξ − 0.5).
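The construction in equation 2.5 can be sketched numerically. The following Python snippet (an illustrative sketch, not part of the original notes; function names are our own) builds the Vandermonde matrix at the equally spaced node points and recovers the nodal Lagrange coefficients as its inverse transpose, then checks the defining property φ_n(ξ_m) = δ_nm.

```python
import numpy as np

def lagrange_coefficients(k):
    """Coefficient matrix N_k of the degree-k nodal Lagrange basis (eq. 2.5)."""
    nodes = np.linspace(0.0, 1.0, k + 1)          # xi_m = (m - 1)/k
    V = np.vander(nodes, k + 1, increasing=True)  # rows: [1, xi, xi^2, ...]
    return np.linalg.inv(V).T                     # N_k = V_k^{-T}

def eval_basis(N, xi):
    """Evaluate all nodal Lagrange basis functions at xi (equation 2.4)."""
    k = N.shape[0] - 1
    return N @ xi ** np.arange(k + 1)

# Defining property: phi_n(xi_m) = 1 if n = m, 0 otherwise.
N2 = lagrange_coefficients(2)
for xi in (0.0, 0.5, 1.0):
    print(np.round(eval_basis(N2, xi), 12))
```

Note that the first row of N₂ reproduces the coefficients (1, −3, 2) of φ₁(ξ) = 2ξ² − 3ξ + 1 from example 2.3.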
The existence of a nodal Lagrange basis is related to the invertibility of the corresponding
Vandermonde matrix and can be shown for many different polynomial orders.
Typically, however, the nodal Lagrange basis suffers from ill-conditioning at higher orders (typically
k > 5) and other basis functions must be used in these instances.
Equation 2.5 provides a convenient way of computing the weights for any polynomial
space Pk and can be used efficiently in numerical computation. For instance, equation 2.4
provides a straightforward formula for evaluating any basis function. Further, we can
easily take its derivative(s), i.e.

  d^m φ_n(ξ)/dξ^m = Σ_{i=0}^{k} [ ∏_{j=0}^{m−1} (i − j) ] N_{n,i+1} ξ^{(i−m)₊}    (2.6)

where ∏ is the product operator (see example 2.2) and (x)₊ = max(x, 0) is added to
avoid numerically evaluating rational polynomials. This construction and use of the nodal
Lagrange basis will be central to the remainder of this introductory text.
Example 2.2 For the case where m = 3, the product operator gives ∏_{j=0}^{2} (i − j) = i(i − 1)(i − 2).
Example 2.3 For the nodal Lagrange basis of P2, the first basis function is defined as φ₁(ξ) =
2ξ² − 3ξ + 1. From this, we see (N_{1,1}, N_{1,2}, N_{1,3}) = (1, −3, 2). Considering the case
m = 1,

  dφ₁(ξ)/dξ = Σ_{i=0}^{2} [ ∏_{j=0}^{0} (i − j) ] N_{1,i+1} ξ^{(i−1)₊} = 0·(1)·ξ⁰ + 1·(−3)·ξ⁰ + 2·(2)·ξ¹ = 4ξ − 3.
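The derivative formula can likewise be evaluated directly in code. The sketch below (function names are our own) implements equation 2.6 with the (x)₊ = max(x, 0) convention and reproduces the result of example 2.3.

```python
import numpy as np
from math import prod

def basis_derivative(N, n, xi, m=1):
    """m-th derivative of basis n (0-indexed) at xi, per equation 2.6."""
    k = N.shape[0] - 1
    total = 0.0
    for i in range(k + 1):
        coeff = prod(i - j for j in range(m))  # vanishes whenever i < m
        total += coeff * N[n, i] * xi ** max(i - m, 0)
    return total

# For P2, phi_1(xi) = 2xi^2 - 3xi + 1, so phi_1'(xi) = 4xi - 3.
nodes = np.linspace(0.0, 1.0, 3)
N = np.linalg.inv(np.vander(nodes, 3, increasing=True)).T
print(basis_derivative(N, 0, 0.0))  # approximately -3
print(basis_derivative(N, 0, 1.0))  # approximately 1
```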
2.2 Discretization
In the previous section we introduced the polynomial space Pk and some of its different basis
sets. In this section, we focus on constructing piecewise polynomial spaces. This is
achieved by subdividing a domain of interest Ω = [a, b] ⊂ R into a finite set of elements
– referred to as a mesh. Over each element in the mesh we then represent functions
using the polynomial space Pk.
We start by defining our mesh of ⌦ and some basic terms and notation. We then detail
how we can map the polynomial space Pk (defined on the master element eM = [0, 1])
onto each element in our mesh to construct a piecewise polynomial approximation – the
precursor to our discrete approximation space.
Definition 2.3 (Elements) In the context of the finite element method, elements refer
to some subset e ⊆ Ω which denotes a part of the d-dimensional domain Ω ⊂ Rᵈ.
Each element is taken to cover some portion of the domain Ω, but we also require that
the elements are non-overlapping (so the intersection e_i ∩ e_j is empty unless e_i and e_j are
neighbors, in which case they share a common point). In general we can split the domain
into any number of elements, Ne.
Example 2.4 Suppose Ω = [0, 1] is split into four equally sized elements; then

  e₁ = [0, 1/4],  e₂ = [1/4, 1/2],  e₃ = [1/2, 3/4],  e₄ = [3/4, 1].
To have a convenient way of referring to elements, we group them into a set T_h(Ω) =
{e_k}_{k=1}^{Ne} which we call the mesh of Ω (see definition 2.4).
The subscript h denotes the characteristic mesh size, which is related to the largest interval
in the 1D mesh. This may be formally written as h = max_{e ∈ T_h(Ω)} |e|.
Definition 2.4 (Mesh) In the context of the finite element method, a mesh refers to a set
composed of non-overlapping elements which are used to approximate the d-dimensional
domain Ω ⊂ Rᵈ. Denoting this set T_h(Ω) – with h denoting the representative mesh (or
element) size – we construct T_h(Ω) so that,

  Ω = ⋃_{e ∈ T_h(Ω)} e
Example 2.5 The mesh for example 2.4 may be written as Th (⌦) = {e1 , e2 , e3 , e4 }.
The polynomial space Pk represents the basic template for the space of functions on every
element in T_h(Ω). From section 2.1, we defined our set of functions on the master element
e_M = [0, 1] with the coordinate ξ. Each basis function was then defined clearly on this
interval; for example, the linear nodal Lagrange basis functions are simply

  φ₁(ξ) = 1 − ξ,  φ₂(ξ) = ξ,

where ξ ∈ [0, 1] is the local element coordinate on the master element e_M = [0, 1]. To
use this construct for each of our elements, we must somehow define the basis functions
in terms of the coordinates of Ω (for which we will use x). If we can define a mapping
function p_m : e_M → e_m which continuously maps e_M to e_m ∈ T_h(Ω), then we may define
the basis in terms of the global coordinate by (see figure 2.4),

  φ̂_n(x) = φ_n(p_m⁻¹(x)),  or equivalently  φ̂_n(p_m(ξ)) = φ_n(ξ).
In the first case, we see that the inverse mapping is applied to the point x, giving the
ξ point which satisfies x = p_m(ξ). This ξ point may then be naturally used to evaluate
the basis φ_n defined on e_M. In the second case (which is equivalent for our mapping p_m),
we also see that we may select a ξ and then find the corresponding physical coordinate x
at which the basis takes this value. Both are the same, as we assume our mappings are
bijective and satisfy

  p_m⁻¹(p_m(ξ)) = ξ  and  p_m(p_m⁻¹(x)) = x.
Figure 2.4. Illustrating how x and u are related through the normalized element coordinate ξ. The values of x = p(ξ) and u(ξ) are obtained from a linear interpolation of the nodal variables and then plotted as u(x) = u(p⁻¹(x)). The points at ξ = 0.2 are emphasized.
Example 2.6 From example 2.5, the linear mapping function from local ξ coordinates
to global x coordinates, x = p_m(ξ), is given as p_m(ξ) = h(ξ + m − 1) with h = 1/4; for instance, p₂(ξ) = hξ + 1/4 with inverse p₂⁻¹(x) = (x − 1/4)/h.
This construction of basis functions maps the nodal Lagrange basis so that φ̂_n(x) may be
written as a function of the spatial coordinate x within a specific element e_m ∈ T_h(Ω).
Outside the element e_m, each basis function is defined to be zero, i.e.

  φ̂_n(x) = φ_n(p_m⁻¹(x)) if x ∈ e_m,  φ̂_n(x) = 0 if x ∉ e_m.

This property is referred to as compact support, as each basis function is non-zero only on a
specific element.
With this mapping defined, we can evaluate our interpolation on any element. Suppose
we take an arbitrary point x ∈ e_m in the mth element of our mesh T_h(Ω); then any piecewise
polynomial function f ∈ Pk on e_m may be evaluated by mapping the point x to its
corresponding ξ point in e_M = [0, 1] and evaluating a weighted sum. This can be
expressed mathematically as,

  f(x) = Σ_{n=1}^{k+1} w_n φ_n(p_m⁻¹(x))    (2.9)
where w_n are the (k + 1) weights on element e_m. Utilizing the local support of our basis
functions, the weighted sum may also be expressed globally as,

  f(x) = Σ_{n=1}^{N} w_n φ̂_n(x)    (2.10)
where N is the total number of global basis functions. This may also be expressed in
vector form as,

  f(x) = w · φ̂(x),  w = [w₁, …, w_N]ᵀ,  φ̂(x) = [φ̂₁(x), …, φ̂_N(x)]ᵀ    (2.11)
As we can see, evaluating a piecewise polynomial constructed in this way requires us to:
(1) find the element e_m which contains the point x, (2) find the corresponding local
coordinate ξ = p_m⁻¹(x), and (3) evaluate all basis functions φ_n at the corresponding ξ,
weight and sum (see example 2.7).
Example 2.7 To see how we compute equation 2.10, let us continue with example 2.6.
Suppose we want to evaluate f(x = 1/3) where f ∈ P1 on each element. From our mesh
definition, we see that x = 1/3 belongs to element e₂. Using the definition of p₂⁻¹(x) in
example 2.6, we can see that

  p₂(p₂⁻¹(x)) = h p₂⁻¹(x) + 1/4 = h (x − 1/4)/h + 1/4 = x.

Mapping gives ξ = p₂⁻¹(1/3) = (1/3 − 1/4)/(1/4) = 1/3, and hence

  f(x = 1/3) = Σ_{n=1}^{2} w_n φ_n(p₂⁻¹(x)) = Σ_{n=1}^{2} w_n φ_n(1/3) = w₁ φ₁(1/3) + w₂ φ₂(1/3).
This process is far more straightforward if instead we choose to evaluate our function
given an element and ξ point pairing. In this case we need only evaluate all basis
functions φ_n at the corresponding ξ, weight and sum. For this reason, we try to avoid
computations using the global coordinate x unless necessary.
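The three evaluation steps above can be sketched in code. The following is a minimal illustration for a uniform mesh of four linear elements on [0, 1] (mirroring examples 2.4–2.7; the mesh, weights and function names are our own): locate the element, map to ξ, and form the weighted sum of equation 2.9.

```python
import numpy as np

def eval_piecewise_linear(x, weights, Ne=4):
    """Evaluate a piecewise linear f at x on a uniform mesh of [0, 1]."""
    h = 1.0 / Ne
    m = min(int(x / h), Ne - 1)      # (1) element e_m containing x
    xi = (x - m * h) / h             # (2) local coordinate xi = p_m^{-1}(x)
    w1, w2 = weights[m]              # (3) weighted sum: phi_1 = 1 - xi, phi_2 = xi
    return w1 * (1.0 - xi) + w2 * xi

# Interpolating f(x) = x exactly on four elements:
w = [(m / 4.0, (m + 1) / 4.0) for m in range(4)]
print(eval_piecewise_linear(1.0 / 3.0, w))  # ~1/3, as in example 2.7
```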
Suppose we are given temperature measurements collected over a 1D domain (see figure 2.5a). We then
wish to construct the piecewise polynomial u(x) shown in figure 2.5b.
Figure 2.5. (a) Field measurements plotted against the space parameter x. (b) The x domain is divided into three
subdomains, elements, and linear polynomials are independently fitted to the data in each subdomain.
Clearly in this case our mesh size is h = 1/3, as each element is of equal length on the unit
interval. On each element, we will project the P1 space using the two local nodal Lagrange
basis functions,

  φ₁(ξ) = 1 − ξ,  φ₂(ξ) = ξ.
Mapping the unit interval into each element, we define the element mappings p_m(ξ) = h(ξ + m − 1), m = 1, 2, 3.
Figure 2.6. Discretization of a 1D domain. Functions defined on a master element e_M are mapped via parametric element mappings p_k : e_M → e_k. Basis functions over each master element are subsequently weighted based on global node indices (for continuous approximations).

On each element, our approximation is a weighted sum of the basis functions and the
weights. As we have three elements with independent interpolation, the total number of
weights (and basis functions) is N = 6 (see figure 2.6). In practice we arrange these into
a weight vector,
  u = [u₁, u₂, u₃, u₄, u₅, u₆]ᵀ.
The N global basis functions are shown in figure 2.7 and represent the projection of φ₁
and φ₂ into each element. In practice, evaluation of the piecewise approximation requires
performing the weighted sum within specific elements.
Figure 2.7. Illustration of the 6 global basis functions for the piece-wise linear discontinuous approximation.
For convenience, we often construct a connectivity array – or local to global index map –
which allows us to easily extract the appropriate weights of u. That is, we want to know:
for the mth element and the j th local basis function, what is the appropriate weight? This
connectivity array, T , may be defined as,
  T = [ 1 2
        3 4
        5 6 ]
Moreover, we can construct the vector of global basis functions (where we treat each as
having compact support),

  φ̂(x) = [φ̂₁(x), …, φ̂₆(x)]ᵀ,  where φ̂_α(x) = φ_j(p_m⁻¹(x)),  α = T(m, j),
so that u_h(x) = u · φ̂(x).
The answer comes from the property of our nodal Lagrange interpolant. Considering the
border between element 1 and element 2, if we want u_h to be continuous, we would like
the value of u_h(x = 1/3) to be precisely the same in element 1 as it is in element 2.
However, we also know that u_h(x = 1/3) corresponds to ξ = 1 in element 1 and ξ = 0
in element 2. Examining the weighted sum in each element, we see in element 1 that
φ₁(ξ = 1) = 0 and φ₂(ξ = 1) = 1, and in element 2 that φ₁(ξ = 0) = 1 and φ₂(ξ = 0) = 0.
Hence, our condition on continuity is that,

  u_h(x = 1/3)|_{e₁} − u_h(x = 1/3)|_{e₂} = u₂ φ₂(ξ = 1) − u₃ φ₁(ξ = 0) = u₂ − u₃ = 0.
Figure 2.8. Discretization of a 1D domain. Functions defined on a master element eM are mapped via parametric element
mappings pk : eM ! ek . Basis functions over each master element are subsequently weighted based on global node indices
(for continuous approximations).
As a result, to force continuity we must constrain the weights to be shared across element
boundaries (see figure 2.8). In this case, we can see that our weight vector has N = 4
components,
  u = [u₁, u₂, u₃, u₄]ᵀ.
The connectivity array correspondingly becomes

  T = [ 1 2
        2 3
        3 4 ]
Figure 2.9. Illustration of the 4 global basis functions for the piece-wise linear continuous approximation.

In this case, our N = 4 global basis functions are shown in figure 2.9. We can see
that the picture looks similar to the global basis functions in the discontinuous case,
the key difference being that we now merge basis functions over interior boundaries.
Constructing the vector of global basis functions is once more similar to the discontinuous
case, giving,
  φ̂(x) = [φ̂₁(x), …, φ̂₄(x)]ᵀ,  where φ̂_α(x) = φ_j(p_m⁻¹(x)),  α = T(m, j),  x ∈ e_m,

so that u_h(x) = u · φ̂(x).
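A minimal sketch of the continuous case (the weights here are hypothetical values of our own; the connectivity T and three-element mesh mirror the example above): shared global indices make u_h single-valued at element boundaries.

```python
import numpy as np

# Continuous piecewise-linear approximation on three elements of [0, 1].
T = np.array([[1, 2], [2, 3], [3, 4]]) - 1     # (m, j) -> global index, 0-based

def eval_uh(x, u, h=1.0 / 3.0):
    """u_h(x) = u . phi_hat(x); only the containing element contributes."""
    m = min(int(x / h), 2)                     # containing element
    xi = (x - m * h) / h                       # local coordinate
    return u[T[m]] @ np.array([1.0 - xi, xi])  # phi_1, phi_2 on e_M

u = np.array([0.0, 1.0, 0.5, 2.0])             # shared nodal weights
# The shared weight u_2 makes u_h continuous across x = 1/3:
print(eval_uh(1/3 - 1e-12, u), eval_uh(1/3, u))
```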
With this construction, our discrete approximation space may be written as

  V_h = { v_h ∈ L²(Ω) : v_h|_e ∈ Pk(e), ∀e ∈ T_h(Ω) }

that is, our function v_h is any function in L²(Ω) such that v_h restricted to each element e
is in the space Pk mapped to e, for all elements e in our mesh. For the continuous
approximation we instead take

  V_h = { v_h ∈ C⁰(Ω) : v_h|_e ∈ Pk(e), ∀e ∈ T_h(Ω) }

where all that has changed is that we now declare our polynomials must come from the space
of continuous functions (note that in this case V_h ⊂ H¹(Ω)).
These concepts and definitions are central to the finite element method, which looks to
simplify more general function spaces into a simpler set which may be represented by a
finite sum. Indeed, as we have a global basis, we know our discrete approximation space
is finite in dimension, though it still contains an infinite number of items (or functions). We will exploit this
property when looking to find solutions in ODE / PDE problems.
The finite element method takes advantage of well established results from Approximation
Theory - a branch of mathematics concerned with how well a set of functions may be
represented by a smaller, simpler set. This field boasts many important results which
explain how well interpolations can match known functions or functions which exist in a
given function space. For our purposes, we will focus primarily on the results which show
how we can devise an interpolant, P , to a given function, f , which has a controllable level
of error. Put simply, we construct an interpolation function P for which we can ensure
we achieve a certain error.
In the context of finite elements, the simpler set of functions we use is usually some
collection of polynomials, and the function being approximated is the solution to some
ordinary or partial differential equation. It is clear that if we find a solution which satisfies
some ODE or PDE, it does not mean this function has any relation to the function P we
concoct in this section. However, as we will see, the very existence of a function P in our
discrete function space is, in many cases, enough to say something about the interpolated
FEM solution.
To illustrate this, we will first consider a specific case of approximating a given function on
a 1D domain. Subsequently, we will present the more general results which are applicable
for different interpolation orders and higher dimensions.
To do this, we need to be clear how we will measure error, i.e. what metric we will use. Once
a metric is selected, we may then construct a linear interpolant which gives us a way to
manage our errors, leading to a generic result for interpolation in one dimension.
Figure 3.1. Illustration of a function f(x), its piecewise linear interpolant P(x) and the subsequent error function denoted e(x).
In this chapter, we will use the L² and H¹ norms as introduced in section 1.6. The reason
for this is that, in many FEM problems, these norms naturally arise in the definition of
the problem. Recalling these two norms, we know that,

  (L²-norm)  ‖f‖₀ := ( ∫ₐᵇ [f(x)]² dx )^{1/2},

  (H¹-norm)  ‖f‖₁ := ( ∫ₐᵇ [f(x)]² + [f′(x)]² dx )^{1/2}.    (3.1)
In these norms, we may then look at the measure of the error e(x) = f(x) − P(x) by
examining ‖e‖₀ and ‖e‖₁. Note, if the error e(x) = f(x) − P(x) = C between the function
and our interpolant were constant, then ‖e‖₀ = ‖e‖₁ = C √(b − a). So we see that these
measures give us an idea about the cumulative error over the entire domain.
Each norm tells you different things about our error. Consider the case where we have
f(x) = 1 and take P(x) = 1 + ε sin(nx) (with ε ∈ R and n ∈ N being some real number
and integer, respectively). Looking at the error e on the interval [a, b] = [0, π], we see that,
  ‖e‖₀ = ( ∫₀^π [1 − 1 − ε sin(nx)]² dx )^{1/2}
       = ( ∫₀^π ε² sin²(nx) dx )^{1/2} = ( ε² π/2 )^{1/2} = ε √(π/2)

  ‖e‖₁ = ( ∫₀^π [1 − 1 − ε sin(nx)]² + [−εn cos(nx)]² dx )^{1/2}
       = ( ∫₀^π ε² sin²(nx) + (εn)² cos²(nx) dx )^{1/2} = ( ε² π/2 + (εn)² π/2 )^{1/2} = ε √(1 + n²) √(π/2)
As we see, the error in L² is proportional to the parameter ε (i.e. e ∝ ε) and the domain
size π. If we make ε smaller and smaller, the amplitude of the sine wave shrinks, bringing
the functions closer to one another. Note that the error in L² has no dependence on the
frequency n of P!
In contrast, the H¹ error depends both on the amplitude parameter ε as well as the
frequency n. Here we can see that if we send the frequency n → 0, then the error depends
entirely on the amplitude parameter ε. However, if we make the frequency larger and
larger, then we need to shrink the amplitude to maintain a similar error.
This difference between the measures results from the fact that they measure two different
things. The L² norm measures how close the two functions are to one another (in an
integral sense). The H¹ norm measures how close the two functions and their derivatives are
to one another (again, in an integral sense). Both measures give valuable input about
the closeness of our interpolant and the original function. Here we will consider both
measures.
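The two norm computations above can be checked numerically. A small sketch (using simple trapezoidal quadrature; the parameters ε = 0.1 and n = 3 are illustrative choices of our own):

```python
import numpy as np

# f(x) = 1, P(x) = 1 + eps*sin(n*x) on [0, pi]; compare against the
# closed-form norms derived in the text.
eps, n = 0.1, 3
x = np.linspace(0.0, np.pi, 200001)
dx = x[1] - x[0]

def trap(vals):
    """Trapezoidal rule on the uniform grid x."""
    return np.sum((vals[:-1] + vals[1:]) / 2) * dx

e = -eps * np.sin(n * x)            # e(x) = f(x) - P(x)
de = -eps * n * np.cos(n * x)       # e'(x)

L2 = np.sqrt(trap(e**2))
H1 = np.sqrt(trap(e**2 + de**2))

print(L2, eps * np.sqrt(np.pi / 2))                       # both ~0.1253
print(H1, eps * np.sqrt(1 + n**2) * np.sqrt(np.pi / 2))   # both ~0.3963
```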
With a clear set of norms in mind for measuring error, we now consider developing a
linear interpolant to approximate our function f . A natural way to approximate f is to
evaluate f at a series of points and then interpolate linearly between them (see figure 3.1).
Let’s do this.
First we define a set of points at which to evaluate our function. Given the interval [a, b], we
may define a set of points {x₁, x₂, …, x_N} which we distribute uniformly over the interval.
In this case,

  x_i = ((i − 1)/(N − 1)) (b − a) + a,  i = 1, 2, …, N.
Example 3.1 If [a, b] = [0, π] and N = 4, then our set of points is {0, π/3, 2π/3, π}.
With a clear set of points defined, it is now possible to evaluate f(x_i) for all our points,
and connect linearly between them. This can be done with the function,

  P(x) = ((x_{n+1} − x)/h_n) f(x_n) + ((x − x_n)/h_n) f(x_{n+1}),  x ∈ [x_n, x_{n+1}]    (3.2)

where

  h = x_{n+1} − x_n = (b − a)/(N − 1).    (3.3)
Hence, the more points we add (i.e. the bigger we make N ), the smaller we make our
discretization parameter h. Indeed, we often refer to [xn , xn+1 ] as our nth element and
h as our mesh size (or element size).
The interpolant matches our function at the chosen points, and will linearly vary between
these points.
Notice that, even in this simple case, we must make an assumption about our function.
Namely, we assume that it is well-defined point-wise throughout our domain, i.e. f ∈
L^∞[a, b]. This is necessary to ensure that our approximation does not shoot off to infinity
or become undefined due to an unlucky choice of points.
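Equation 3.2 translates directly into code. A sketch (function names are our own) that samples f at N uniform points and evaluates the piecewise linear interpolant:

```python
import numpy as np

def linear_interpolant(f, a, b, N):
    """Piecewise linear interpolant P of equation 3.2 on N uniform points."""
    xs = np.linspace(a, b, N)
    fs = f(xs)
    h = (b - a) / (N - 1)

    def P(x):
        nloc = min(int((x - a) / h), N - 2)    # interval [x_n, x_{n+1}]
        return (((xs[nloc + 1] - x) / h) * fs[nloc]
                + ((x - xs[nloc]) / h) * fs[nloc + 1])
    return P

P = linear_interpolant(np.sin, 0.0, np.pi, 5)
print(P(0.5), np.sin(0.5))  # interpolant ~0.450 vs exact ~0.479
```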
With our selection of norms defined and our linear interpolant constructed, we may now
examine how much error there is, i.e. how large is e(x) = f(x) − P(x)? The amount of
error can be written as follows.
Theorem 3.1 (Linear Approximation) Let f ∈ C²[a, b] and P be the piecewise linear
interpolant defined in section 3.1.2. Then the error between f and P satisfies the
estimates,

  ‖f − P‖₀ ≤ C h² ‖f″‖₀,
  ‖f − P‖₁ ≤ C h ‖f″‖₀.
Theorem 3.1 tells us that the error in the L² norm is proportional to the mesh size
squared multiplied by the measure of our function f's second derivatives. This means, if
we double our number of points and approximately halve our mesh size h, we should at
least expect our error to decrease by a factor of (1/2)². Further, if we approximate
a function which is scaled by a factor of 2, its second derivatives are also scaled, and the upper
bound on our error should also increase. A similar story is seen in the H¹ norm; however,
we no longer have proportionality to h² but instead to h (for reasons we will see later).
A caveat to this result is that Theorem 3.1 gives an upper bound on our error. What this
means is that our interpolant may be much better at approximating f than the theorem
suggests (see figure 3.2), and may converge with h at a faster rate (or, before reaching the
asymptotic regime, a slower one). Theorem 3.1 only guarantees that asymptotically, as h → 0, the error must at least
scale according to the bounds (a worst case scenario). Examining figure 3.2, theorem 3.1
ensures that the error must remain in the plane below the red line.
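The predicted O(h²) rate can be observed empirically. A sketch of a simple convergence study for f(x) = sin(x) on [0, π] (parameters and function names are our own illustrative choices):

```python
import numpy as np

def l2_interp_error(N, samples=200001):
    """L2 error of the piecewise linear interpolant of sin on [0, pi]."""
    xs = np.linspace(0.0, np.pi, N)
    x = np.linspace(0.0, np.pi, samples)
    P = np.interp(x, xs, np.sin(xs))        # piecewise linear interpolant
    e2 = (np.sin(x) - P) ** 2
    dx = x[1] - x[0]
    return np.sqrt(np.sum((e2[:-1] + e2[1:]) / 2) * dx)  # trapezoidal rule

for N in (5, 9, 17, 33):
    h = np.pi / (N - 1)
    print(N, l2_interp_error(N) / h**2)     # ratio levels off -> O(h^2)
```

Halving h should roughly quarter the error once the asymptotic regime is reached, consistent with curve A in figure 3.2.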
Understanding this result requires the application of basic calculus and analysis. Before
we walk through the argument to show the approximation estimate, let's briefly review
the strategy. In this scenario, we use the Fundamental Theorem of Calculus (FTC) to
establish a bound between the error and its derivative (using the fact that our interpolation
function is exact at all node points). Via a similar approach (which requires the Mean Value
Theorem to establish a point where the derivative is zero), we derive a bound for the norm of the
derivative of our error based on the second derivatives of our function f.
Figure 3.2. Illustrative representation of the bounds shown in theorem 3.1. The red line represents the theoretical upper
bound from the estimate. Both curves A and B show optimal rates of convergence, with A starting with sub-optimal convergence
until h is sufficiently small that refinement reaches the asymptotic limit. In the case of B, the estimate predicts the rate
of convergence perfectly, showing optimal convergence over all refinements (with a higher initial error). Curve C illustrates
a sub-optimal case in which the error breaks through the upper bound with refinement. This can only happen if the
assumptions of theorem 3.1 are violated.
Let's begin by considering any point x ∈ I_n = [x_n, x_{n+1}] on the nth interval. We know
that our error function satisfies e(x_n) = e(x_{n+1}) = 0 on this interval, since the interpolant matches f at the node points.
Looking at the FTC, we can re-write the value of our error as shown in step 1 of
equation 3.6 (where we note that e(x_n) = 0). Next we use the fact that the integral of a
function is no larger than the integral of its absolute value, and certainly no larger than
the integral over the entire interval (step 2 of 3.6). Applying the Cauchy-Schwarz inequality
to the functions 1 and |e′(u)| (step 3) and noting that the integral of 1² is simply the mesh
size h, we arrive at step 4. Finally, we note that what remains is a norm on the derivative
of e on the interval I_n.
  e(x) = ∫_{x_n}^{x} e′(u) du
       ≤ ∫_{x_n}^{x} |e′(u)| du ≤ ∫_{x_n}^{x_{n+1}} |e′(u)| du
       ≤ ( ∫_{I_n} 1² du )^{1/2} ( ∫_{I_n} |e′(u)|² du )^{1/2}
       = h_n^{1/2} ( ∫_{I_n} |e′(u)|² du )^{1/2}    (3.6)
Using this result, we can say that for any x ∈ [a, b], we can identify the interval I_n it is
found in and bound the error as,

  e²(x) ≤ ( h^{1/2} ‖e′‖_{0,I_n} )² = h ‖e′‖²_{0,I_n}    (3.7)

Integrating this bound over each interval and summing,

  ‖e‖₀² = Σ_{k=1}^{N−1} ∫_{I_k} e²(u) du ≤ Σ_{k=1}^{N−1} ∫_{I_k} h ‖e′‖²_{0,I_k} du ≤ h² ‖e′‖₀²    (3.8)
This is not quite the result we are looking for. To complete the derivation, we follow a
very similar procedure, instead looking at the derivative of our error, e′. The key trick we
employed in the first step was to use the FTC to integrate from a point with zero
error to any other point on the interval. When considering the derivative of
our error, it is no longer the case that our end points are exact (i.e. e′(x_n) ≠ 0 necessarily).
Thankfully, by the Mean Value Theorem, there is a point c ∈ (x_n, x_{n+1}) with

  e′(c) = (e(x_{n+1}) − e(x_n))/h = 0,    (3.9)

where we note that e(x_k) = 0 at all node points by the design of our interpolant. This
result enables us to now integrate from c to any other point, and follow the same procedure
as shown in equation 3.6 (i.e. by replacing e with e′ and integrating from [c, x]). As a
result, we may say that,
  ‖e′‖₀ ≤ h ‖f″‖₀.    (3.10)
where we have noted that P″(x) = 0. Hence, combining equations 3.8 and 3.10, we see
that,

  ‖e‖₀ = ‖f − P‖₀ ≤ h² ‖f″‖₀    (3.11)
To see the result on the H¹ norm, we use the fact that ‖e‖₁² = ‖e‖₀² + ‖e′‖₀² together with
the bounds in equations 3.10 and 3.11, i.e.

  ‖e‖₁² ≤ h⁴ ‖f″‖₀² + h² ‖f″‖₀² = h² (h² + 1) ‖f″‖₀².

Notice that we now have a factor of h² + 1, which can be no larger than its value at the largest possible h, i.e.
h² + 1 ≤ (b − a)² + 1. Taking the square root of both sides,

  ‖e‖₁ ≤ C h ‖f″‖₀,  with C = ( (b − a)² + 1 )^{1/2}.
In this section, we examine how this story changes if we use a quadratic interpolation
function, Q(x). For this, we first review how to construct such an interpolation and then
examine how this changes the convergence of our error.
As with the linear interpolation developed in section 3.1.2, in this section we will construct
a quadratic interpolation which matches our function at a discrete set of N_q points. Let's
start by constructing a series of points as we did in the linear case. Given the interval
[a, b], we again define a set of N points {x₁, x₂, …, x_N} which we distribute uniformly over
the interval. In this case,

  x_i = ((i − 1)/(N − 1)) (b − a) + a,  i = 1, 2, …, N.
We know from section 2.1.3 that the dimension of the quadratic polynomial space in 1D is 3. Thus
we add to each element (constructed by the interval [x_n, x_{n+1}]) an additional point – at
the center of the interval – so that, over each interval, we build a quadratic interpolation.
In this case, we create an additional N − 1 points where,

  x_{i+1/2} = (x_{i+1} + x_i)/2,  i = 1, 2, …, N − 1.
Thus our total number of points is N_q = 2N − 1, and the points can be written as the set
{x₁, x_{1+1/2}, x₂, x_{2+1/2}, …, x_{N−1/2}, x_N}. With this set in mind, we may now construct an
interpolation which is equivalent to f at all points and is quadratic over the interval, i.e.

  Q(x) = f(x_n)(1 − ξ)(1 − 2ξ) + f(x_{n+1/2}) 4ξ(1 − ξ) + f(x_{n+1}) ξ(2ξ − 1),    (3.14)

  ξ(x) = (x − x_n)/h,  x ∈ [x_n, x_{n+1}]
where here, once again, h is the distance between our points, i.e.

  h = x_{n+1} − x_n = (b − a)/(N − 1).    (3.15)
As we can see, the interpolant defined in equation 3.14 satisfies

  Q(x_n) = f(x_n),  Q(x_{n+1/2}) = f(x_{n+1/2}),  Q(x_{n+1}) = f(x_{n+1});

the interpolation matches our function at the chosen points, and varies quadratically
between these points.
Here we constructed our interpolation Q using the same number of elements as in the
linear case. This means that Nq is nearly double the size of N . We may also select a set
of points which is closer to N , but in so doing, we nearly double the element size.
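Equation 3.14 can be sketched as follows (function names are our own); the midpoint values f(x_{n+1/2}) are sampled on the fly:

```python
import numpy as np

def quadratic_interpolant(f, a, b, N):
    """Piecewise quadratic interpolant Q of equation 3.14."""
    xs = np.linspace(a, b, N)
    h = (b - a) / (N - 1)

    def Q(x):
        nloc = min(int((x - a) / h), N - 2)
        xn = xs[nloc]
        xi = (x - xn) / h                      # local coordinate
        return (f(xn) * (1 - xi) * (1 - 2 * xi)
                + f(xn + h / 2) * 4 * xi * (1 - xi)
                + f(xn + h) * xi * (2 * xi - 1))
    return Q

Q = quadratic_interpolant(np.sin, 0.0, np.pi, 5)
print(abs(Q(0.5) - np.sin(0.5)))  # ~2e-3, much smaller than the linear case
```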
With our quadratic interpolant constructed, we may now examine the magnitude of our
error e(x) = f(x) − Q(x). In this case, the error bounds can be written as follows.

Theorem 3.2 (Quadratic Approximation) Let f ∈ C³[a, b] and Q be the piecewise
quadratic interpolant defined above. Then the error between f and Q satisfies the estimates,

  ‖f − Q‖₀ ≤ C h³ ‖f‴‖₀,
  ‖f − Q‖₁ ≤ C h² ‖f‴‖₀.
Theorem 3.2 tells us that the error in the L² norm is now proportional to the mesh
size cubed multiplied by the measure of f‴. This means, if we halve our mesh size h, we
should at least expect our error to decrease by a factor of (1/2)³ = 1/8. Further, if we are
approximating a function which is twice as large (size being measured in the L² norm),
and its third derivatives double, then the upper bound on the error should also increase. A similar story is
seen in the H¹ norm; however, we no longer have proportionality to h³ but instead h².
The main difference between our linear and quadratic interpolations is the derivatives on
f and the power of h. Why is this the case? The argument for Theorem 3.2 is very similar
to that for Theorem 3.1. As the bounds in equations 3.8 and 3.10 were derived under
conditions consistent with what we have in the quadratic case, we know that¹,

  ‖e‖₀ ≤ h ‖e′‖₀,  ‖e′‖₀ ≤ h ‖e″‖₀.    (3.17)

¹ It is no longer true, in the quadratic case, that Q″ = 0; thus we must keep the bound in terms of e″.

The question then becomes whether we can say something similar about e″ (note that
e‴ = f‴). If we can again find a point c ∈ I_n in each interval for which e″(c) = 0,
then the same logic employed to show boundedness of e and e′
may be employed. Recall that, by the mean value theorem, there is a
c₁ ∈ (x_n, x_{n+1/2}) where (again, due to the error e(x_k) = 0 for all node points x_k),

  e′(c₁) = (e(x_{n+1/2}) − e(x_n))/(x_{n+1/2} − x_n) = 0.

Similarly, there is a c₂ ∈ (x_{n+1/2}, x_{n+1}) where

  e′(c₂) = (e(x_{n+1}) − e(x_{n+1/2}))/(x_{n+1} − x_{n+1/2}) = 0.

Applying the mean value theorem once more, now to e′, there is a c ∈ (c₁, c₂) for which

  e″(c) = (e′(c₂) − e′(c₁))/(c₂ − c₁) = 0.
Thanks to the additional point included at the midpoint of each element, we ensure that
there is a point for which e″(c) = 0 and, as a result, may write that for any x ∈ [x_n, x_{n+1}],

  e″(x) = ∫_c^x f‴(u) du ≤ h^{1/2} ‖f‴‖_{L²(I_n)},    (3.18)

which, squared and integrated over the mesh as in equation 3.8, gives

  ‖e″‖₀ ≤ h ‖f‴‖₀.    (3.19)

Equation 3.17 together with 3.19 allows us to derive the result of theorem 3.2.
Theorem 3.3 (p-Order Approximation) Let f ∈ C^{p+1}[a, b] and P be the piecewise
p-order interpolant constructed using the value of f at a series of equally spaced points.
Then the error between f and P satisfies the estimates,

  ‖f − P‖₀ ≤ C h^{p+1} ‖f^{(p+1)}‖₀,
  ‖f − P‖₁ ≤ C h^{p} ‖f^{(p+1)}‖₀.
The result of Theorem 3.3 shows us that there are two primary ways in which we can
control the error in our interpolation. The first way is to refine our element size h, dividing
the interval into smaller pieces. If we do so, we will achieve a decrease in our error (at the
very least in the asymptotic limit, see figure 3.2) proportional to h^{p+1} in the L² norm.
Example 3.2 Consider the function f(x) = sin(nx) (where n ∈ N is some integer).
Supposing p + 1 is even, then f^{(p+1)}(x) = ±n^{p+1} sin(nx). Looking at the norm over the
interval [a, b] = [0, π],

  ‖f^{(p+1)}‖₀ = ( ∫₀^π [f^{(p+1)}(x)]² dx )^{1/2} = ( ∫₀^π n^{2p+2} sin²(nx) dx )^{1/2}
             = n^{p+1} ( π/2 − sin(2πn)/(4n) )^{1/2} = n^{p+1} √(π/2)
hence, p refinement will only guarantee convergence (by this bound) if h < 1/n, i.e.
when h is sufficiently small.
Figure 3.3. Examination of p refinement for f(x) = sin(4πx), illustrating poor p-order convergence due to high h. (a)
shows solutions for different orders and (b) the respective convergence, where the red line denotes optimal theoretical
convergence. (c) illustrates the same problem after reducing the mesh size h, which then exhibits optimal p-order
convergence.
Interpolation in higher dimensions will be the subject of section 8, but for now we present the more general result
(see Quarteroni and Valli for more details).

Theorem 3.4 Let f be sufficiently smooth and P_k its degree-k interpolant on a mesh of size h. Then,

  ‖f − P_k‖_m ≤ C h^{l+1−m} ‖f^{(l+1)}‖₀    (3.20)

for 1 ≤ k ≤ l and m = 0, 1.
Theorem 3.4 is much more general and applies to the results presented in the previous
sections. Note that the results it gives are also consistent with those derived previously.
More generally, it suggests that approximations we generate in 1D can also be extended
into problems in multiple dimensions!
A key step toward understanding the finite element method is the development of what
is referred to as a weak form equation – an integral equation which is often derived
from an ODE or PDE of interest. In general, the solution of the weak form is only the solution to the
original problem when suitable regularity – or smoothness – exists in the original solution.
However, for many practical problems this holds, making it a powerful tool for solving
ODE / PDE systems.
In this section, we motivate the weak form by considering a minimization problem. More
specifically, we look to find the best approximation to a function f : [a, b] ! R. In
considering the solution to this minimization problem, we naturally arrive at a weak form
equation which must be solved. Working through the problem, we then see how this weak
form can be translated into a linear algebraic system which may be easily solved. These
specific steps form the foundation for the more general finite element framework discussed
in the subsequent chapters.
(see figure 4.1)? Here, we will consider the approximation with the minimal error as
measured by the L2 norm.
Figure 4.1. Comparison of the best approximations using the L² and H¹ minimization functionals F(v_h) = ‖v_h − f‖₀²
and F(v_h) = ‖v_h − f‖₁², respectively. (a) Best approximation with h = 1/4 and p = 1 (in the L² norm). (b) Best
approximation with h = 1/4 and p = 1 (in the H¹ norm).
If we let V^h be our discrete function space denoting the set of all allowable
approximations we wish to consider, then we want to find an approximation u_h ∈ V^h
where,

$$\|u_h - f\|_0^2 \le \min_{v_h \in V^h} \|v_h - f\|_0^2. \qquad (4.1)$$

In words, we seek a function u_h ∈ V^h which has an L² norm error as good or better than
any other function in V^h (it is the best). For convenience, we may define the objective
functional F : V^h → R (represented graphically in figure 4.2),

$$F(v_h) = \frac{1}{2}\|v_h - f\|_0^2, \qquad (4.2)$$

and re-write equation 4.1 as,

$$F(u_h) \le \min_{v_h \in V^h} F(v_h). \qquad (4.3)$$
This re-write shown in equation 4.3 is done purely for convenience in later sections. We
now have a well defined minimization problem, looking for the approximation uh which
best matches our function f. Obviously, when f ∈ V^h is actually an element of the set,
then the minimum is zero and u_h = f. However, for a general function f, this will not be
the case and instead we look for an approximation which is best (in the L2 norm). The
question becomes, how do we find this function? This is the principal endeavor of the
remainder of this chapter.
Figure 4.2. Visualization of the L2 functional over the approximation space V h . Minimum point is denoted by F (uh ).
Example 4.1 Suppose we consider f(x) = sin(π/x) and want to find its critical points.
Then the critical points are those points {x_1, . . . , x_m} at which f′(x_k) = 0, k = 1, . . . , m.
In this case,

$$f'(x) = -\frac{\pi}{x^2}\cos\left(\frac{\pi}{x}\right)$$
In example 4.1, we see that the critical points of f(x) = sin(π/x) are infinite in number and
denote both minima and maxima. This is because the set of critical points contains all
extrema. Moreover, we can imagine the case where a function has both local minima and
maxima (for example f(x) = e^{−x²} sin(π/x)) or an inflection point (for example f(x) = x³
at x = 0). We must then take care to ensure that our critical point is in fact a minimum.
For our minimization problem in equation 4.3, we know that our minimum is in the set of
critical points of the functional F. However, unlike example 4.1, our functional operates
on functions as opposed to R. To find the minimum, we must recall from calculus the
concept of a directional derivative (see definition 4.1). Just as the derivative looks at the
rate of change of a function in the direction of its coordinate, the directional derivative
looks at the rate of change of a functional in the direction of a specific function, i.e.
$$\begin{aligned}
DF(u_h)[v_h] &= \lim_{\epsilon\to 0}\frac{1}{2\epsilon}\left(\int_a^b \big(u_h(x)+\epsilon v_h(x)-f(x)\big)^2\,dx - \int_a^b \big(u_h(x)-f(x)\big)^2\,dx\right)\\
&= \lim_{\epsilon\to 0}\frac{1}{2\epsilon}\Bigg(\int_a^b u_h^2(x) - 2\epsilon v_h(x)f(x) + 2\epsilon v_h(x)u_h(x) - 2u_h(x)f(x) + \epsilon^2 v_h^2(x)\;+\\
&\qquad\qquad\qquad + f^2(x)\,dx - \int_a^b u_h^2(x) - 2u_h(x)f(x) + f^2(x)\,dx\Bigg)\\
&= \lim_{\epsilon\to 0}\frac{1}{2\epsilon}\int_a^b -2\epsilon v_h(x)f(x) + 2\epsilon v_h(x)u_h(x) + \epsilon^2 v_h^2(x)\,dx\\
&= \lim_{\epsilon\to 0}\int_a^b \Big(u_h(x) - f(x) + \frac{\epsilon}{2} v_h(x)\Big)\cdot v_h(x)\,dx\\
&= \int_a^b \big(u_h(x) - f(x)\big)\cdot v_h(x)\,dx
\end{aligned}$$
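This derivation can be checked numerically by comparing a finite-difference approximation of DF(u_h)[v_h] against the derived integral ∫(u_h − f)·v_h dx (an illustrative sketch; the particular choices of f, u_h and v_h below are mine, not from the notes):

```python
import numpy as np

# Work on [a, b] = [0, 1] with a fine grid and midpoint quadrature.
N = 100000
dx = 1.0 / N
x = (np.arange(N) + 0.5) * dx

f = np.sin(np.pi * x)        # the target function
u = x * (1 - x)              # an arbitrary "approximation" u_h
v = np.cos(3 * x)            # an arbitrary direction v_h

def F(w):
    """Objective functional F(w) = 1/2 * ||w - f||_0^2."""
    return 0.5 * np.sum((w - f) ** 2) * dx

# Finite-difference directional derivative vs. the derived formula.
eps = 1e-6
dF_fd = (F(u + eps * v) - F(u)) / eps
dF_formula = np.sum((u - f) * v) * dx

print(dF_fd, dF_formula)
```

The two values agree to within O(ε), consistent with the ε/2 remainder term in the second-to-last line of the derivation.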
The critical points in example 4.1 illustrate the general identification of critical points
in one dimension. However, in two or three dimensions, we instead consider when the
gradient of a function is zero (i.e. the partial derivative in each coordinate direction is
zero). Similarly, the critical functions of F are those functions for which the directional
derivative is zero in the direction of every function vh 2 V h (see definition 4.2).
$$DF(u)[v] = 0, \quad \forall v \in V.$$
Identifying the critical functions of F provides the set of possible minima of F – giving
functions which yield a local minimum, maximum, or inflection. For the purposes of our
minimization, we want to find a global minimum, providing the lowest L² norm error.
Equation 4.4 represents a weak form equation. In this case, it is the weak form equation
corresponding to the strong form: find u_h ∈ V^h such that,

$$u_h(x) = f(x), \quad x \in [a, b]. \qquad (4.5)$$

The strong form represents some equation which should hold point-wise over the entirety
of our space, while the weak form requires the equation hold in a weak sense – that is, it
is zero when multiplied by a test function and integrated over the domain. In some sense,
the strong form is what we are after: we want the best approximation to f! However, it
is clear that equation 4.5 does not even have a solution when f ∉ V^h.
What does our strong form equation being zero in a weak sense actually imply? Using
the analogy with vectors, we could think of V^h as a space of vectors. Denoting the error
between our approximation and original function as e(x) = u_h(x) − f(x), and substituting
into equation 4.4, we can observe that the best approximation u_h is that which makes the
best approximation to the error zero. To see this, we note that, from equation 4.4,

$$\int_a^b e(x)\cdot v_h(x)\,dx = 0, \quad \forall v_h \in V^h. \qquad (4.6)$$
Then, if we want the best approximation to our error, we can examine the analogous
minimization problem. We know the solution is then a critical function of the functional
F_e(e_h) = ½‖e_h − e‖₀² and thus satisfies,

$$\int_a^b \big(e_h(x) - e(x)\big)\cdot v_h(x)\,dx = 0, \quad \forall v_h \in V^h.$$

But we know from equation 4.6 that we may reduce the equation to,

$$\int_a^b e_h(x)\cdot v_h(x)\,dx = 0, \quad \forall v_h \in V^h. \qquad (4.7)$$
However, as equation 4.7 must hold for every v_h ∈ V^h, we can certainly require the
equation hold when v_h = e_h (since e_h ∈ V^h). Plugging this into equation 4.7, we see
that our function must satisfy ‖e_h‖₀² = 0. As we know, by the properties of norms (see
section 1.2.1), this necessarily requires that our best approximation e_h to the error e be
in fact e_h(x) = 0.
This means that our approximation u_h to f in equation 4.4 ensures our error is minimized
in a way where the remaining error is unobservable in V^h (equation 4.6) and its best
approximation is zero.
Existence of a solution to equation 4.4 is given by the Riesz representation theorem. This
result relies on V^h being an appropriate Hilbert space and the function f being sufficiently
smooth (so that the integral ∫_Ω f(x)·v_h(x) dx < ∞). In this case, it states that there
exists a representation of f in V^h (i.e. we can find a u_h).
Showing that our weak problem has only one unique solution is now also possible. Here
we will prove this by contradiction. Suppose there are two different solutions – u_h(x) and
w_h(x) – that exist and satisfy equation 4.4. Since u_h(x) and w_h(x) are different, we expect
that ‖u_h − w_h‖₀ > 0, as if the norm is zero, then it implies u_h(x) = w_h(x). Examining
each weak form we see,

$$\int_\Omega \big(u_h(x) - f(x)\big)\cdot v_h(x)\,dx = 0, \quad \forall v_h \in V^h$$
$$\int_\Omega \big(w_h(x) - f(x)\big)\cdot v_h(x)\,dx = 0, \quad \forall v_h \in V^h.$$

Subtracting the second equation from the first and choosing v_h = u_h − w_h (which is
itself a member of V^h), we find

$$\int_\Omega \big(u_h(x) - w_h(x)\big)\cdot\big(u_h(x) - w_h(x)\big)\,dx = \|u_h - w_h\|_0^2 = 0, \qquad (4.8)$$

which contradicts our initial premise that ‖u_h − w_h‖₀ > 0. Thus, there is only a single
solution which satisfies our minimization problem.
$$\int_a^b \big(u_h(x) - f(x)\big)\cdot v_h(x)\,dx = 0, \quad \forall v_h \in V^h \qquad (4.9)$$
We know (from the previous section) that our weak form problem has a unique solution;
however, how equation 4.9 may be solved may not be intuitive. Equation 4.9 does not fit
the mold of a typical calculus problem: it requires that an infinite number of equations
hold, owing to the fact that there are an infinite number of constituents in V^h.
However, by using the finite dimensionality of V^h, we can slowly transform our weak
equation into a system which is more familiar – namely, a linear algebra system.
Let’s begin by defining the function space V^h as shown in equation 4.10. This states
that our space is constructed by taking the weighted sum of N linearly independent
functions {φ̂₁(x), φ̂₂(x), . . . , φ̂_N(x)}, which could be piecewise linear polynomials as shown
in figure 4.1 (for example).

$$V^h = \left\{ u_h \in L^2[0,1] \;\Big|\; u_h(x) = \sum_{k=1}^N c_k \hat\phi_k(x), \text{ for some } \{c_1,\dots,c_N\} \in \mathbb{R} \right\} \qquad (4.10)$$
With this as the definition of our set of functions, we can see that any function in our
space is written as the sum,

$$v_h(x) = \sum_{k=1}^N c_k \hat\phi_k(x) \qquad (4.11)$$
or equivalently as the dot product between vectors which represent our coefficients and
basis functions. Substituting this expansion for v_h into the left hand side of equation 4.9,

$$\begin{aligned}
\int_a^b \big(u_h(x)-f(x)\big)\cdot v_h(x)\,dx &= \int_a^b \big(u_h(x)-f(x)\big)\cdot \sum_{k=1}^N c_k\hat\phi_k(x)\,dx\\
&= \int_a^b \big(u_h(x)-f(x)\big)\cdot\big(c_1\hat\phi_1(x) + \cdots + c_N\hat\phi_N(x)\big)\,dx\\
&= \sum_{k=1}^N c_k \int_a^b \big(u_h(x)-f(x)\big)\cdot\hat\phi_k(x)\,dx\\
&= \sum_{k=1}^N c_k\, R(u_h;\hat\phi_k)
\end{aligned}$$

where the residual is defined as

$$R(u_h; v_h) = \int_a^b \big(u_h(x)-f(x)\big)\cdot v_h(x)\,dx. \qquad (4.13)$$
Writing v_h = c · φ̂(x) in vector form, the same computation gives

$$\int_a^b \big(u_h(x)-f(x)\big)\,v_h(x)\,dx = \int_a^b \big(u_h(x)-f(x)\big)\,\big(\mathbf{c}\cdot\hat{\boldsymbol\phi}(x)\big)\,dx = \mathbf{c}\cdot\int_a^b \big(u_h(x)-f(x)\big)\cdot\hat{\boldsymbol\phi}(x)\,dx = \mathbf{c}\cdot \mathbf{R}(u_h)$$
From this result, it’s clear that any of the infinite number of constituents of V^h may
be expressed by some vector c ∈ R^N. This means that our weak form statement in
equation 4.9 is equivalent to the vector condition that c · R(u_h) = 0 for every c ∈ R^N.
That is, the weak statement is the same as requiring the residual vector function, dotted
with any N-vector of real numbers, be zero. Note that R(u_h) does not depend on v_h and
is simply evaluated based on the solution and the basis functions of V^h. As such, we can
say that R(u_h) is just a vector. We then seek a vector which, when dotted with any other
vector, gives zero. We know that the only vector for which this is true is in fact the zero
vector! So our weak form is actually equivalent to the vector equation,
$$\mathbf{R}(u_h) = \mathbf{0}, \quad\text{or equivalently}\quad
\begin{bmatrix} \int_a^b \big(u_h(x)-f(x)\big)\cdot\hat\phi_1(x)\,dx \\ \vdots \\ \int_a^b \big(u_h(x)-f(x)\big)\cdot\hat\phi_N(x)\,dx \end{bmatrix} = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (4.16)$$
This gives N integral equations which must be true (much more manageable than
infinitely many integral equations). To solve the linear system of equations for u_h, we must now
use the fact that our solution may also be expressed as a weighted sum of basis functions,
i.e.

$$u_h(x) = \sum_{k=1}^N u_k \hat\phi_k(x) = \mathbf{u}\cdot\hat{\boldsymbol\phi} \qquad (4.17)$$
For convenience, let us re-write our residual vector function into two parts – S and F –
which depend on u_h and f, respectively. In this case, R(u_h) = S(u_h) − F, with

$$S(u_h) = \begin{bmatrix} \int_a^b u_h(x)\cdot\hat\phi_1(x)\,dx \\ \vdots \\ \int_a^b u_h(x)\cdot\hat\phi_N(x)\,dx \end{bmatrix}, \qquad
\mathbf{F} = \begin{bmatrix} \int_a^b f(x)\cdot\hat\phi_1(x)\,dx \\ \vdots \\ \int_a^b f(x)\cdot\hat\phi_N(x)\,dx \end{bmatrix}. \qquad (4.19)$$

Since S is linear in its argument, substituting the expansion of equation 4.17 into
R(u_h) = 0 yields

$$S(\mathbf{u}\cdot\hat{\boldsymbol\phi}) = u_1 S(\hat\phi_1) + u_2 S(\hat\phi_2) + \cdots + u_N S(\hat\phi_N) = \mathbf{F}. \qquad (4.21)$$
Our solution u_h = u · φ̂ to the original weak form problem written in equation 4.9 must
therefore have coefficient weights u which satisfy the linear algebraic system

$$A\mathbf{u} = \mathbf{F}. \qquad (4.22)$$

Hence solving the weak form problem is equivalent to computing the matrix A and vector
F from equation 4.19,

$$A = \begin{bmatrix} A_{11} & \cdots & A_{1N} \\ \vdots & \ddots & \vdots \\ A_{N1} & \cdots & A_{NN} \end{bmatrix} = \big[S(\hat\phi_1)\,\cdots\,S(\hat\phi_N)\big] = \begin{bmatrix} \int_a^b \hat\phi_1(x)\cdot\hat\phi_1(x)\,dx & \cdots & \int_a^b \hat\phi_1(x)\cdot\hat\phi_N(x)\,dx \\ \vdots & \ddots & \vdots \\ \int_a^b \hat\phi_N(x)\cdot\hat\phi_1(x)\,dx & \cdots & \int_a^b \hat\phi_N(x)\cdot\hat\phi_N(x)\,dx \end{bmatrix},$$

$$A_{ij} = \int_a^b \hat\phi_i(x)\cdot\hat\phi_j(x)\,dx, \qquad
\mathbf{F} = \begin{bmatrix} \int_a^b f(x)\cdot\hat\phi_1(x)\,dx \\ \vdots \\ \int_a^b f(x)\cdot\hat\phi_N(x)\,dx \end{bmatrix}, \qquad
F_i = \int_a^b f(x)\cdot\hat\phi_i(x)\,dx, \qquad (4.23)$$

and solving the linear algebra system (i.e. finding the inverse of A),

$$\mathbf{u} = A^{-1}\mathbf{F}. \qquad (4.24)$$
This process and result is at the heart of the finite element method, which generally takes
a weak form equation, approximates the function spaces with finite dimensional variants
(i.e. making V h ), and breaks the system down into an algebraic system of equations.
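To make this abstract recipe concrete before the piecewise linear example that follows, here is a minimal sketch using a small global basis (the monomials {1, x, x²} on [0, 1] are my choice for illustration, not a basis used in the notes): we assemble A_ij = ∫φ_i φ_j dx and F_i = ∫f φ_i dx by quadrature and solve u = A⁻¹F.

```python
import numpy as np

# Basis: monomials phi_i(x) = x**i for i = 0, 1, 2 on [0, 1].
# Target: f(x) = sin(pi x).
N = 200000
dx = 1.0 / N
x = (np.arange(N) + 0.5) * dx          # midpoint quadrature points
f = np.sin(np.pi * x)
phi = np.array([x**0, x**1, x**2])     # shape (3, N)

# A_ij = int phi_i phi_j dx,  F_i = int f phi_i dx
A = (phi @ phi.T) * dx
F = (phi @ f) * dx
u = np.linalg.solve(A, F)              # coefficients of the best L2 fit

# For monomials, A is exactly the Hilbert matrix 1/(i + j + 1).
print(A)
print(u)
```

Since f(x) = sin(πx) is symmetric about x = 1/2, the resulting quadratic fit satisfies u₁ ≈ −u₂, a cheap consistency check on the assembly.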
We now work through this process for f(x) = sin πx on Ω = [0, 1], using the piecewise
linear approximation. Our first step is to break Ω into a finite set of elements over which
we will construct the approximation space V^h. We then move on to the element-wise
computation of both our matrix A and RHS vector F. The linear algebra system may then
be solved to find the best piecewise linear approximation to f(x) = sin πx.
Constructing a mesh of Ω

We subdivide Ω = [0, 1] into four equal elements,

$$e_1 = \left[0, \tfrac14\right], \quad e_2 = \left[\tfrac14, \tfrac12\right], \quad e_3 = \left[\tfrac12, \tfrac34\right], \quad e_4 = \left[\tfrac34, 1\right].$$

Clearly we can see that Ω = e₁ ∪ e₂ ∪ e₃ ∪ e₄. The group of elements we refer to as a mesh
of Ω and denote by T_h(Ω) (see definition 2.4),

$$T_h(\Omega) = \{e_1, e_2, e_3, e_4\}.$$

The subscript h refers to the characteristic size of the mesh, which we can formally define
in 1D as,

$$h = \max_{e_k \in T_h(\Omega)} |x_{k+1} - x_k|,$$

that is, h represents the maximum distance spanned over any element in the mesh (in our
case h = 1/4).
With our mesh defined, we may move to constructing the approximation space V^h. On
every element in T_h(Ω) we can represent any linear function using the basis functions
introduced in section 2.3,

$$\psi_1(\xi) = 1 - \xi, \qquad \psi_2(\xi) = \xi$$

where ξ ∈ [0, 1] is the local element coordinate on the master element e_M = [0, 1].
As each element has the same size, we can easily define the mapping
p_k : e_M → e_k between the master element and any of the elements e_k ∈ T_h(Ω), i.e.

$$p_k(\xi) = h\xi + x_k, \qquad x_k = h(k-1).$$

With this mapping defined, we can evaluate our interpolation on any element. Suppose
we take a random point x ∈ e_k in the k-th element of our mesh T_h(Ω); then our piecewise
linear function may be evaluated by mapping the point x to its corresponding ξ point in
e_M = [0, 1], and evaluating a weighted sum. This can be expressed mathematically as,

$$u_h(x) = c_1\,\psi_1\big(p_k^{-1}(x)\big) + c_2\,\psi_2\big(p_k^{-1}(x)\big), \qquad x \in e_k,$$

where c₁, c₂ ∈ R are our unknown constants and p_k^{-1} is the inverse of p_k such that,

$$p_k\big(p_k^{-1}(x)\big) = x, \qquad x \in e_k. \qquad (4.27)$$
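The mapping and its inverse are simple affine functions, and evaluating a piecewise linear function reduces to finding the containing element and weighting the two local bases. A small sketch with the four-element mesh above (function and variable names are mine):

```python
# Uniform mesh of [0, 1] with 4 elements, h = 1/4.
h = 0.25

def p(k, xi):
    """Map local coordinate xi in [0, 1] to element e_k (k = 1..4)."""
    return h * xi + h * (k - 1)

def p_inv(k, x):
    """Inverse map: global x in e_k back to the master element."""
    return (x - h * (k - 1)) / h

def eval_linear(c1, c2, k, x):
    """Evaluate c1*psi_1 + c2*psi_2 on element e_k at global point x."""
    xi = p_inv(k, x)
    return c1 * (1 - xi) + c2 * xi

# p_k(p_k^{-1}(x)) = x (equation 4.27); should recover 0.6 up to rounding.
print(p(3, p_inv(3, 0.6)))
```

At the midpoint of an element the two local bases contribute equally, so eval_linear returns the average of the two weights there.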
In this case, we wish to approximate f using a continuous piecewise linear function. This
means that the interior points in our mesh should maintain continuity. That is, if we look
at the point x = 1/4, we know this corresponds to the weight multiplying ψ₂(1) in element
e₁ and the weight multiplying ψ₁(0) in element e₂ – continuity requires these two weights
to be equal, so a single global coefficient is shared by neighboring elements. Numbering
the mesh nodes and coefficients globally,

$$x_k = h(k-1), \qquad u_k = u_h(x_k).$$
If we now think about evaluating a point on the 3rd element of our mesh, this requires
doing a weighted sum with our weights being the third and fourth elements of the u vector.
While this is straightforward in one dimension, it becomes less intuitive in higher spatial
dimensions. For this reason it is convenient to construct an array which maps an element
number and its local basis index to the corresponding coefficient (the connectivity array)
as shown in equation 4.28. Here, if we want to know the global index for the coefficient
scaling the 1st basis function of the 2nd element, we see that it is T (2, 1) = 2.
$$T = \begin{bmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \\ 4 & 5 \end{bmatrix} \qquad (4.28)$$
Using the local to global index map, we can define our piecewise linear and continuous
function on Ω = [0, 1] as,

$$u_h(x) = \sum_{n=1}^{2} u_m\,\hat\phi_m(x), \qquad m = T(k, n), \quad x \in e_k, \quad e_k \in T_h(\Omega) \qquad (4.29)$$

where

$$\hat\phi_m(x) = \psi_n\big(p_k^{-1}(x)\big), \qquad m = T(k, n)$$

denotes our global basis function over the k-th element (as we saw in the previous section).
Our space of polynomials may then be constructed by considering any weight vector
v ∈ R⁵, i.e.

$$V^h = \left\{ v_h : v_h(x) = \sum_{n=1}^{2} v_m\,\hat\phi_m(x), \; m = T(k, n), \; x \in e_k \in T_h(\Omega), \text{ for any } \mathbf{v} \in \mathbb{R}^5 \right\}$$
Example 4.2 Suppose we consider the global basis function φ̂₃ which scales the weight
u₃. We note that φ̂₃ has compact support on elements e₂ and e₃ (as those are the only
elements whose rows contain 3 in the T array in equation 4.28). As a result,

$$F_3 = \sum_{k=1}^4 \int_{e_k} \hat\phi_3(x)\cdot f(x)\,dx = \int_{e_2} \hat\phi_3(x)\cdot f(x)\,dx + \int_{e_3} \hat\phi_3(x)\cdot f(x)\,dx.$$
This is possible due to the integral form of our equations, where both the entries within
the matrix A_ij and vector F_i can be written as a sum of element-level integrals, i.e.

$$A_{ij} = \int_0^1 \hat\phi_i(x)\cdot\hat\phi_j(x)\,dx = \sum_{k=1}^4 \int_{e_k} \hat\phi_i(x)\cdot\hat\phi_j(x)\,dx$$
$$F_i = \int_0^1 \hat\phi_i(x)\cdot f(x)\,dx = \sum_{k=1}^4 \int_{e_k} \hat\phi_i(x)\cdot f(x)\,dx$$
Each global basis function φ̂_i has what is called compact support – that is, it is only
non-zero on a select subset of elements. In our case, the global basis functions at either
end of our 1D domain are supported on only one element, while all other global bases are
supported on two. That means we need only consider one element integral for i ∈ {1, 5}
and two for i ∈ {2, 3, 4} (see example 4.2).
We also know that on each element, the global basis is paired to a local basis mapped
from the master element. Supposing that φ̂_i is the 1st local basis on element k, then we
can do a substitution of variables to dramatically simplify our calculations, i.e.

$$\int_{e_k} \hat\phi_i(x)\cdot f(x)\,dx = \int_{x_k}^{x_{k+1}} \psi_1\big(p_k^{-1}(x)\big)\cdot f(x)\,dx$$

Substituting x = p_k(ξ), dx = (dp_k/dξ) dξ, with x_k = p_k(0) and x_{k+1} = p_k(1),

$$= \int_0^1 \psi_1\big(p_k^{-1}(p_k(\xi))\big)\cdot f\big(p_k(\xi)\big)\,\frac{dp_k}{d\xi}\,d\xi$$

and noting that p_k^{-1}(p_k(ξ)) = ξ and dp_k/dξ = h,

$$= \int_0^1 \psi_1(\xi)\,f\big(p_k(\xi)\big)\,h\,d\xi \qquad (4.30)$$
Example 4.3 Using the case introduced in example 4.2, we can see that global basis
φ̂₃ is the mapped version of local basis ψ₂ on element e₂ and local basis ψ₁ on element e₃.
Hence, by equation 4.30 (with h = 1/4),

$$F_3 = \frac14\int_0^1 \psi_2(\xi)\cdot f\big(p_2(\xi)\big)\,d\xi + \frac14\int_0^1 \psi_1(\xi)\cdot f\big(p_3(\xi)\big)\,d\xi.$$
This simplicity means that we can construct F by following the steps listed in algorithm 4.1.
That is: (1) we zero the entries of F, (2) we loop over every element e ∈ T_h(Ω) in the
mesh, (3) on each element, we compute the components for each local basis and form an
element vector F^e, (4) we add the entries of the element vector F^e to the corresponding
terms of F.

Algorithm 4.1 (F-Assembly, i.e. F = A_e F^e)
  Set F = 0
  For each e ∈ T_h(Ω)
    Compute F^e
    Add F^e → F
  End
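The steps of algorithm 4.1 can be sketched in a few lines (my own implementation, not from the notes; T is the connectivity array from equation 4.28 stored zero-based, and the quadrature rule is a simple composite midpoint rule):

```python
import numpy as np

n_el, h = 4, 0.25
T = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])    # zero-based connectivity
psi = [lambda xi: 1 - xi, lambda xi: xi]          # local linear bases

def f(x):
    return np.sin(np.pi * x)

# Composite midpoint quadrature on the master element [0, 1].
nq = 2000
xi_q = (np.arange(nq) + 0.5) / nq
w_q = np.full(nq, 1.0 / nq)

F = np.zeros(5)
for k in range(n_el):
    x_q = h * xi_q + h * k                          # p_k(xi), zero-based k
    for n in range(2):
        Fe_n = np.sum(w_q * psi[n](xi_q) * f(x_q)) * h   # entry of eq. 4.31
        F[T[k, n]] += Fe_n                          # scatter-add into global F

print(F)
```

Since the global bases sum to one, the entries of F must sum to ∫₀¹ sin(πx) dx = 2/π, which makes a convenient check of the scatter-add.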
The first two steps are straightforward. What remains to be clarified is how to construct
the element vector and add it to the appropriate places in F. In our case, the element
vector is

$$F^{e_k} = \int_0^1 \begin{bmatrix} \psi_1(\xi)\,f(p_k(\xi)) \\ \psi_2(\xi)\,f(p_k(\xi)) \end{bmatrix} h\,d\xi. \qquad (4.31)$$

Note that the size of F^{e_k} will always be related to the number of basis functions present
on the element (in our case 2). We may then add F^{e_k} to the global vector using the local
to global array T, i.e.

$$\mathbf{F} = \begin{bmatrix} F^{e_1}_1 \\ F^{e_1}_2 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ F^{e_2}_1 \\ F^{e_2}_2 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ F^{e_3}_1 \\ F^{e_3}_2 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \\ F^{e_4}_1 \\ F^{e_4}_2 \end{bmatrix} = \mathsf{A}_e F^e,$$

illustrating how the appropriate element vectors are added – step-by-step through the
loop over elements – to the global vector F, with each two-component element vector
occupying the global rows T(k, 1) and T(k, 2). In our example,
$$F^{e_k} = \int_0^1 \begin{bmatrix} \psi_1(\xi)\sin(\pi p_k(\xi)) \\ \psi_2(\xi)\sin(\pi p_k(\xi)) \end{bmatrix} h\,d\xi = \int_0^1 \begin{bmatrix} \psi_1(\xi)\sin(\pi[h\xi + x_k]) \\ \psi_2(\xi)\sin(\pi[h\xi + x_k]) \end{bmatrix} h\,d\xi$$
$$= \frac{1}{\pi}\begin{bmatrix} \cos(\pi x_k) \\ -\cos(\pi x_{k+1}) \end{bmatrix} + \frac{1}{h\pi^2}\begin{bmatrix} \sin(\pi x_k) - \sin(\pi x_{k+1}) \\ \sin(\pi x_{k+1}) - \sin(\pi x_k) \end{bmatrix} \qquad (4.33)$$
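The closed form in equation 4.33 can be spot-checked against numerical quadrature on a single element (a quick sketch; picking element e₂ is an arbitrary choice of mine):

```python
import numpy as np

h = 0.25
k = 2                          # element e_2 = [0.25, 0.5], so x_k = 0.25
xk, xk1 = h * (k - 1), h * k

# Closed form from equation 4.33.
F_exact = (np.array([np.cos(np.pi * xk), -np.cos(np.pi * xk1)]) / np.pi
           + np.array([np.sin(np.pi * xk) - np.sin(np.pi * xk1),
                       np.sin(np.pi * xk1) - np.sin(np.pi * xk)]) / (h * np.pi**2))

# Midpoint quadrature of the defining integral over the master element.
nq = 20000
xi = (np.arange(nq) + 0.5) / nq
integrand_1 = (1 - xi) * np.sin(np.pi * (h * xi + xk)) * h
integrand_2 = xi * np.sin(np.pi * (h * xi + xk)) * h
F_quad = np.array([integrand_1.mean(), integrand_2.mean()])

print(F_exact, F_quad)
```

Both entries agree to quadrature accuracy, and the first entry matches the analytic value √2/(2π) + 4(√2/2 − 1)/π² listed for e₂ in example 4.4.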
Example 4.4 In our example, the four element vectors (see equation 4.33) can be written,

$$F^{e_1} = \frac{1}{\pi}\begin{bmatrix} 1 \\ -\sqrt2/2 \end{bmatrix} + \frac{4}{\pi^2}\begin{bmatrix} -\sqrt2/2 \\ \sqrt2/2 \end{bmatrix}, \qquad
F^{e_2} = \frac{1}{\pi}\begin{bmatrix} \sqrt2/2 \\ 0 \end{bmatrix} + \frac{4}{\pi^2}\begin{bmatrix} \sqrt2/2 - 1 \\ 1 - \sqrt2/2 \end{bmatrix},$$
$$F^{e_3} = \frac{1}{\pi}\begin{bmatrix} 0 \\ \sqrt2/2 \end{bmatrix} + \frac{4}{\pi^2}\begin{bmatrix} 1 - \sqrt2/2 \\ \sqrt2/2 - 1 \end{bmatrix}, \qquad
F^{e_4} = \frac{1}{\pi}\begin{bmatrix} -\sqrt2/2 \\ 1 \end{bmatrix} + \frac{4}{\pi^2}\begin{bmatrix} \sqrt2/2 \\ -\sqrt2/2 \end{bmatrix}.$$
With a clear picture in mind of how to construct F at the element level and compose the
full vector, we must construct the matrix A. Similar to the computation of F, due to
compact support each matrix entry is comprised of – at most – contributions from two
elements. We can also re-write the element contribution for each by transforming it to
the master element, i.e. if we consider i = T(k, 1) and j = T(k, 2) then,

$$\int_{e_k} \hat\phi_i(x)\cdot\hat\phi_j(x)\,dx = \int_{x_k}^{x_{k+1}} \psi_1\big(p_k^{-1}(x)\big)\cdot\psi_2\big(p_k^{-1}(x)\big)\,dx$$

Substituting x = p_k(ξ), dx = (dp_k/dξ) dξ, with x_k = p_k(0) and x_{k+1} = p_k(1),

$$= \int_0^1 \psi_1\big(p_k^{-1}(p_k(\xi))\big)\cdot\psi_2\big(p_k^{-1}(p_k(\xi))\big)\,\frac{dp_k}{d\xi}\,d\xi$$

and noting p_k^{-1}(p_k(ξ)) = ξ and dp_k/dξ = h,

$$= \int_0^1 \psi_1(\xi)\cdot\psi_2(\xi)\,h\,d\xi.$$

Collecting all four local pairings, the element matrix is

$$A^{e_k} = \int_0^1 \begin{bmatrix} \psi_1(\xi)\cdot\psi_1(\xi) & \psi_1(\xi)\cdot\psi_2(\xi) \\ \psi_2(\xi)\cdot\psi_1(\xi) & \psi_2(\xi)\cdot\psi_2(\xi) \end{bmatrix} h\,d\xi. \qquad (4.34)$$
The assembly algorithm for A (shown in algorithm 4.2) also follows the approach taken
in the construction of the vector F. The only difference is how the element entries of A^e
are added to the global matrix. The entries in A are again related via the local to global
mapping provided by the T array, giving the formula,

$$A_{T(k,n)\,T(k,m)} \mathrel{+}= A^{e_k}_{nm}, \qquad n, m = 1, 2.$$

Algorithm 4.2 (A-Assembly, i.e. A = A_e A^e)
  Set A = 0
  For each e ∈ T_h(Ω)
    Compute A^e
    Add A^e → A
  End
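Algorithm 4.2 mirrors the vector assembly; below is my sketch (zero-based indexing), using the exact element matrix (h/6)[[2, 1], [1, 2]] from equation 4.34 and scattering it with the T array used twice:

```python
import numpy as np

n_el, h = 4, 0.25
T = np.array([[0, 1], [1, 2], [2, 3], [3, 4]])   # zero-based connectivity

# Element matrix from equation 4.34 (exact for linear bases).
Ae = (h / 6.0) * np.array([[2.0, 1.0],
                           [1.0, 2.0]])

A = np.zeros((5, 5))
for k in range(n_el):
    for n in range(2):
        for m in range(2):
            A[T[k, n], T[k, m]] += Ae[n, m]   # A_{T(k,n) T(k,m)} += Ae_{nm}

print(A * 6 / h)   # integer pattern: rows [2,1,...], [1,4,1,...], ...
```

A useful check: since the global bases sum to one, all entries of A sum to ∫₀¹ 1 dx = 1, and the assembled matrix is symmetric and tridiagonal.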
The above assembly process can also be written as the matrix summation

$$A = \begin{bmatrix} A^{e_1}_{11} & A^{e_1}_{12} & 0 & 0 & 0 \\ A^{e_1}_{21} & A^{e_1}_{22} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
+ \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & A^{e_2}_{11} & A^{e_2}_{12} & 0 & 0 \\ 0 & A^{e_2}_{21} & A^{e_2}_{22} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
+ \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & A^{e_3}_{11} & A^{e_3}_{12} & 0 \\ 0 & 0 & A^{e_3}_{21} & A^{e_3}_{22} & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
+ \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & A^{e_4}_{11} & A^{e_4}_{12} \\ 0 & 0 & 0 & A^{e_4}_{21} & A^{e_4}_{22} \end{bmatrix} = \mathsf{A}_e A^e \qquad (4.36)$$
Example 4.5 In our example, the four element matrices (see equation 4.34) are
equivalent, i.e.

$$A^{e_1} = A^{e_2} = A^{e_3} = A^{e_4} = A^e$$

In this case,

$$A^e = \int_0^1 \begin{bmatrix} \psi_1(\xi)\cdot\psi_1(\xi) & \psi_1(\xi)\cdot\psi_2(\xi) \\ \psi_2(\xi)\cdot\psi_1(\xi) & \psi_2(\xi)\cdot\psi_2(\xi) \end{bmatrix} h\,d\xi
= \int_0^1 \begin{bmatrix} (1-\xi)^2 & \xi(1-\xi) \\ \xi(1-\xi) & \xi^2 \end{bmatrix} h\,d\xi
= \frac{h}{6}\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

Assembling the element matrices yields

$$A = \frac{h}{6}\begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 0 \\ 0 & 0 & 1 & 4 & 1 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}$$
With both the matrix and RHS vector computed, the resulting vector of coefficients u is
given by equation 4.24, where we must compute the inverse of our matrix (in brackets).

$$\begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{bmatrix} = \frac{24}{\pi^2}\begin{bmatrix} 2 & 1 & 0 & 0 & 0 \\ 1 & 4 & 1 & 0 & 0 \\ 0 & 1 & 4 & 1 & 0 \\ 0 & 0 & 1 & 4 & 1 \\ 0 & 0 & 0 & 1 & 2 \end{bmatrix}^{-1}\begin{bmatrix} \pi - 2\sqrt2 \\ 4(\sqrt2 - 1) \\ 4(2 - \sqrt2) \\ 4(\sqrt2 - 1) \\ \pi - 2\sqrt2 \end{bmatrix} \qquad (4.37)$$
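Equation 4.37 can be solved numerically in a couple of lines (a sketch using numpy; the right hand side entries are the analytic F values from example 4.4):

```python
import numpy as np

s2 = np.sqrt(2.0)
M = np.array([[2, 1, 0, 0, 0],
              [1, 4, 1, 0, 0],
              [0, 1, 4, 1, 0],
              [0, 0, 1, 4, 1],
              [0, 0, 0, 1, 2]], dtype=float)
F = np.array([np.pi - 2 * s2,
              4 * (s2 - 1),
              4 * (2 - s2),
              4 * (s2 - 1),
              np.pi - 2 * s2]) / np.pi**2

# u = (24/pi^2) M^{-1} [...], written as a linear solve of (M/24) u = F.
u = np.linalg.solve(M / 24.0, F)

print(u)   # symmetric about the midpoint; the middle coefficient overshoots 1
```

The solution is symmetric (u₁ = u₅, u₂ = u₄), and the midpoint coefficient slightly exceeds 1, consistent with the observation below that the L² projection does not match f at the nodes.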
The resulting approximate solution is shown in figure 4.1. It can be seen that, unlike
our nodal approximation built in chapter 3, our solution does not match the
function f at the element end points (or nodes), but satisfies the minimization problem
– that is, the error between u_h and f is below that of any other function in V^h (in the
L² norm), including our node-based approximation derived in chapter 3.
Using the approximation bounds derived in theorem 3.1, we can see that our minimum
has a rate of convergence ∝ h² and has error which satisfies the corresponding bound.
To improve the approximation accuracy (i.e. reduce the minimum), we can increase the
number of elements, reducing h. Alternatively, we may consider the minimum over a higher
order approximation space, such as the quadratic space discussed in section 3.2.
In this section, we expand this idea to more generic equations and, in so doing, outline
the Galerkin finite element method. This numerical technique is the foundation for many
engineering analysis software tools and simulators, allowing complex ODE / PDE systems
to be solved approximately on computers.
We will begin by laying out the overall Galerkin finite element approach, detailing the
principal steps to the method. Our discussion will then turn to issues around solvability
(existence and uniqueness of solutions) outlining some key requirements for weak form
solutions to linear problems. With these concepts in hand, we then consider the FEM
application to generic conservation laws – looking in detail at an example using the 1D
advection-diffusion equation.
$$R(u(x), x) + \frac{d}{dx}Q(u(x), x) = f(x), \quad \text{for any } x \in \Omega \qquad (5.1)$$

where R and Q are differential operators (linearly dependent on u) and f is any right hand
side function. This equation we refer to as our strong form equation (see definition 5.1)
which – for our current purposes – will represent an ODE system (see example 5.1).
Definition 5.1 (Strong form) A Strong form equation is any ordinary or partial
differential equation or system of equations which may be written as: find u ∈ S which
satisfies,

$$R(u(x), x) + \frac{d}{dx}Q(u(x), x) = f(x), \quad x \in \Omega$$

where R, Q are the ODE / PDE operators, f a function over Ω, u the state variable,
S the function space in which the solution (or potential solution) exists, and Ω is the
domain over which the problem is sought.
As we learned in chapter 1, it is useful to characterize the space in which functions live
in order to dive deeper into their potential properties. In the context of dealing with
an ODE / PDE, we consider equation 5.1 and look to find solutions u which make the
equation true. For this, we restrict consideration to functions for which we can say that
both sides of the equation remain bounded (i.e. are not infinite). If we want f to be
point-wise bounded on Ω, we can see that f ∈ L^∞(Ω) must hold. Further we will assume
that,

$$\max_{x\in\Omega}|R(u(x), x)| + \max_{x\in\Omega}|Q'(u(x), x)| \le C\|u\|_{p,\infty} \qquad (5.2)$$

i.e. both operators can be bounded by a Sobolev norm of order p – which is needed to
ensure point-wise boundedness. As a result, if our function u is in a space
S = W^{p,∞}(Ω) – i.e. is bounded in this norm – then equation 5.1 remains finite on Ω (which
is our minimum requirement). Example 5.1 shows an illustrative case.
Now suppose that we cannot solve equation 5.1 directly through analytic techniques and
instead wish to consider approximating the solution u by u_w – the solution to the weak
form (see definition 5.2). We begin by multiplying equation 5.1 by any suitable test
function v ∈ W (we will worry about the space later) and integrating over Ω, i.e.

$$\int_a^b \big(R(u_w(x), x) + Q'(u_w(x), x)\big)\cdot v(x)\,dx = \int_a^b f(x)\cdot v(x)\,dx, \quad \forall v \in W \qquad (5.3)$$

Letting e(x) = R(u_w(x), x) + Q'(u_w(x), x) − f(x) be the error in our solution (i.e. the
point-wise error between the weak form and strong form), we can see that equation 5.3
ensures that the best approximation to e in W is zero.
Then, in the format of equation 5.1, the operators R, Q and f are written accordingly.
Clearly f ∈ L^∞[0, 1], as e^x sin(e^x) is everywhere bounded on the interval [0, 1].
Examining R, we can see that it is bounded in terms of u, and for Q,

$$\max_{x\in\Omega}|Q'(u(x), x)| \le \max_{x\in\Omega}|u''(x)| \le \|u\|_{2,\infty}.$$

Thus, we observe that the solution u ∈ W^{2,∞}[0, 1]. That is, u is a twice differentiable
function taking the interval [0, 1] to the set of real numbers.
Often integration by parts is applied to move the differential on Q to the test function v.
This reduces the requirements on the continuity and also enables application of boundary
conditions (for initial value problems, discussed in more detail later). Applying integration
by parts to equation 5.3 and assuming for some known A, B we can write
Q(u_w(x), x)v(x)|^b_{x=a} = Bv(b) + Av(a), we write our final weak form as: find u_w ∈ W
which satisfies,

$$\int_a^b R(u_w(x), x)\cdot v(x) - Q(u_w(x), x)\cdot v'(x)\,dx = Bv(b) + Av(a) + \int_a^b f(x)\cdot v(x)\,dx, \quad \forall v \in W. \qquad (5.5)$$
Definition 5.2 (Weak form) Given a Strong form, as shown in definition 5.1, a Weak
form equation is an integral equation written as: find u ∈ W which satisfies,

$$\int_\Omega R(u(x), x)\cdot v(x) - Q(u(x), x)\cdot v'(x)\,dx = Q(u(x), x)v(x)\big|_{x=a}^{b} + \int_\Omega f(x)\cdot v(x)\,dx, \quad \forall v \in W,$$

where u is the weak form solution and W is the function space in which the weak form
solution exists, which also serves as the space of test functions. We then say that the
strong form holds weakly in W.
There are a few important differences between the strong form 5.1 and weak form 5.5.
First, the strong form is point-wise and usually requires a strong space S in which solutions
may be found (see definition 5.1 and example 5.1). The weak form in equation 5.5
allows for solutions with a lower degree of differentiability and hence is in a weaker space.
Examining the terms, we can see that the source f(x) provides a well-defined term in
equation 5.5 when f ∈ L²(Ω) (if v is at least in L²(Ω)). Further, the bounds on our
operators are typically weaker: we require the smoothness to be of order p − 1 and the
boundedness in the L² norm. We also know that – if a strong form solution exists – it
satisfies our weak form (this happens by construction). However, a weak form solution
only satisfies the strong form when it is sufficiently smooth (i.e. it satisfies the strong
form conditions). In 1D, the continuity depends principally on the smoothness of f (but
the story is more complex in multiple dimensions). An example is shown in example 5.2.
Example 5.2 Considering the weak form of the problem in example 5.1, applying the
definitions of R, Q, f to equation 5.5, we see the weak form is: find u_w ∈ W which satisfies,

$$\int_0^1 u_w(x)\cdot v(x) - u_w'(x)\cdot v'(x)\,dx = Bv(1) + Av(0) + \int_0^1 \sin(e^x)\cdot v(x)\,dx, \quad \forall v \in W$$

Here we observe that u_w and u_w' should be bounded in L²[0, 1], making W = H¹[0, 1].
We note that uw is the solution to what is often referred to as the continuous weak
form (equation 5.5). In the following section, we introduce a remarkable result which
ensures that such a formulation has – under some reasonable restrictions – a unique
solution. Subsequently, we look at approximating the solution to the weak form by
selecting W^h ⊂ W constructed using piecewise polynomials.
$$l(v) := Bv(b) + Av(a) + \int_\Omega f(x)\cdot v(x)\,dx \qquad (5.8)$$

Then we can re-write our weak form problem as: find u ∈ W such that,

$$a(u, v) = l(v), \quad \forall v \in W. \qquad (5.9)$$

Here we see that a and l are both operators which take, as inputs, elements of the function
space W and return, as outputs, real numbers, i.e.

$$a : W \times W \to \mathbb{R}, \qquad l : W \to \mathbb{R}.$$

Now we restrict ourselves to those forms of a which are what we refer to as continuous
and coercive. A continuous operator on W × W is one in which the value of the operator
is bounded by its input parameters, i.e.

$$a(w, v) \le C\,\|w\|_W\|v\|_W, \qquad (5.10)$$

where ‖·‖_W is the norm on our real Hilbert space W (see example 5.3). A coercive
operator is one which is bounded below, i.e.

$$a(w, w) \ge \alpha\,\|w\|_W^2. \qquad (5.11)$$
Using these conditions, we can recall the Lax-Milgram lemma - a remarkable result which
underpins much of our understanding of finite element theory.
Lemma 5.1 (Lax-Milgram lemma) Suppose a is a bilinear operator on the real Hilbert
space W which satisfies conditions 5.10 and 5.11, and l : W → R is a bounded linear
functional on W; then there exists a unique u_w ∈ W which satisfies equation 5.9.
Example 5.3 To understand the continuity condition in equation 5.10, let’s consider the
analogous situation with vectors / matrices. If we let our operator be

$$a(w, v) = w^T M v$$

for the vectors w, v ∈ R³ and a given symmetric matrix M ∈ R^{3×3}, then we know it is
bilinear and satisfies,

$$a(w, v) \le \lambda_{max}\,|w|\,|v|,$$

where λ_max is the maximum eigenvalue of M. This boundedness means that if we define
a convergent sequence of inputs, for example w + εy, w + ε²y, w + ε³y, . . . (labelled
w₁, w₂, . . .), then the outputs a(w₁, v), a(w₂, v), . . . converge to a(w, v) – this is the sense
in which a is continuous. Similarly, since

$$a(w, w) \ge \lambda_{min}\,|w|^2,$$

where λ_min is the minimum eigenvalue of M, the coercivity condition then ensures that
M is positive definite.
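A quick numerical illustration of example 5.3 (my own sketch; the particular symmetric positive definite M below is generated arbitrarily): the extreme eigenvalues of M bound a(w, v) = wᵀMv above and below for any test vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary symmetric positive definite matrix M.
B = rng.standard_normal((3, 3))
M = B @ B.T + 3 * np.eye(3)

lam = np.linalg.eigvalsh(M)          # ascending eigenvalues
lam_min, lam_max = lam[0], lam[-1]

def a(w, v):
    """The bilinear form a(w, v) = w^T M v."""
    return w @ M @ v

# Check coercivity and continuity against random test vectors.
ok = True
for _ in range(100):
    w = rng.standard_normal(3)
    v = rng.standard_normal(3)
    nw, nv = np.linalg.norm(w), np.linalg.norm(v)
    ok &= a(w, w) >= lam_min * nw**2 - 1e-9          # coercivity
    ok &= abs(a(w, v)) <= lam_max * nw * nv + 1e-9   # continuity

print(ok, lam_min > 0)
```

Positive λ_min is exactly the positive-definiteness that the coercivity condition demands.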
The Lax-Milgram lemma provides a clear construct for ensuring solvability of a weak form
system. In some cases it is straightforward to illustrate continuity and coercivity, which
provides sufficient conditions for the existence of unique solutions to our weak system of
equations. Notice as well that the lemma holds for any real Hilbert space – including a
complete subspace (which is also a Hilbert space)!
In this case, due to the construction of our polynomial space W h it is complete and thus
itself a Hilbert space – which means the solution to our weak form exists if the conditions
of Lax-Milgram lemma hold.
Using the discrete function space W^h ⊂ W, we can instead consider the weak problem:
find u_h ∈ W^h so that,

$$\int_a^b R(u_h, x)\cdot v_h - Q(u_h, x)\cdot v_h'\,dx = Bv_h(b) + Av_h(a) + \int_a^b f\cdot v_h\,dx, \quad \forall v_h \in W^h, \qquad (5.12)$$

where we use an approximation of both the test functions (our set of v's) and trial functions
(to approximate u). We will see in the remainder of this chapter – similar to section 4.4
– that equation 5.12 reduces to a matrix system,

$$A\mathbf{u} = \mathbf{F},$$
which may be solved for the approximation coefficients. While in the minimization case
(chapter 4) we were directly trying to equate our approximation to a known function
(where our operator was a norm), in this case we consider the more general idea
of approximating the solution by approximating the weak form equation (which our
function should satisfy). By subtracting equation 5.12 from equation 5.5 and choosing
v = v_h ∈ W^h, we observe that,

$$\int_a^b R(u_w, x)\cdot v_h - Q(u_w, x)\cdot v_h'\,dx = Bv_h(b) + Av_h(a) + \int_a^b f\cdot v_h\,dx,$$
$$\int_a^b R(u_h, x)\cdot v_h - Q(u_h, x)\cdot v_h'\,dx = Bv_h(b) + Av_h(a) + \int_a^b f\cdot v_h\,dx$$
$$\implies \int_a^b R(u_w - u_h, x)\cdot v_h - Q(u_w - u_h, x)\cdot v_h'\,dx = 0, \qquad (5.13)$$

where the last line follows from the linearity of R and Q. For the minimization problem,
our error was directly e(x) = u_w(x) − u_h(x), and we saw the weak form ensured the best
approximation was that which created an error best approximated by e_h = 0. In this
case, from equation 5.13, the story is much less clear. Thankfully, we can again use theory
to verify that equation 5.12 can, in fact, provide a meaningful way to develop
approximations u_h of u_w.
For ODEs – and PDEs in higher spatial dimensions – there are many di↵erent boundary
conditions which may be applied. In this section, we review three of the most common:
(1) Dirichlet conditions, (2) Neumann-type conditions, and (3) Mixed conditions.
Dirichlet conditions are conditions which are set directly on the variable. In this case, we
know the values that u takes at both end points of our 1D domain Ω = [a, b],

$$u(a) = u_a, \qquad u(b) = u_b,$$

where u_a, u_b ∈ R are values given as boundary conditions in the problem. This
boundary condition is straightforward to apply in the finite element method, as we may
augment our solution u_w with the known boundary conditions. That is, we want to find
a u_w ∈ W_D where,

$$W_D = \{v \in W \mid v(a) = u_a, \; v(b) = u_b\}$$
satisfies our boundary conditions. Notice that W_D is no longer a Hilbert space. We can
see this by noting that, for any w, v ∈ W_D, a complete vector space would require
w − v ∈ W_D, which is not true since (w − v)(a) = 0 ≠ u_a. However, we can easily
reconsider the problem by choosing any u_d ∈ W_D, and then defining u_w = u_d + v*,
where v* ∈ W_0 is in the so-called homogeneous space,
\[
W_0 = \{v \in W \mid v(a) = 0,\; v(b) = 0\}.
\]
Clearly, W_0 remains a Hilbert space and thus we can solve the problem: find v* ∈ W_0
such that,
\[
\int_a^b R(u_d + v^*, x)\cdot v - Q(u_d + v^*, x)\cdot v'(x)\,dx = \int_a^b f\cdot v\,dx, \qquad \forall v \in W_0. \tag{5.15}
\]
Since u_d may be selected freely, we can choose a convenient u_d and move it to the
right-hand side as an additional source term in our equations. We then look for the v*
which, added to u_d, gives our weak solution. In practice this is what we do to analyze
and solve the Dirichlet problem. For ease, however, we often suppress this detail and
simply state the weak form and discrete weak form Dirichlet problems as:
Continuous Dirichlet Weak Form: Find u_w ∈ W_D such that,
\[
\int_a^b R(u_w, x)\cdot v - Q(u_w, x)\cdot v'(x)\,dx = \int_a^b f\cdot v\,dx, \qquad \forall v \in W_0. \tag{5.16}
\]
Discrete Dirichlet Weak Form: Find u_h ∈ W_D^h such that,
\[
\int_a^b R(u_h, x)\cdot v_h - Q(u_h, x)\cdot v_h'(x)\,dx = \int_a^b f\cdot v_h\,dx, \qquad \forall v_h \in W_0^h. \tag{5.17}
\]
In the discrete weak form, we usually set those interpolation functions which are non-zero
at the end points of Ω to satisfy exactly the required condition (by replacing the matrix
row with a condition specifically on the coefficient). This will be discussed further in later
sections.
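The row replacement just described can be sketched with a small linear system. This is a minimal numpy sketch; the matrix entries below are arbitrary placeholders standing in for an assembled a(φ̂_j, φ̂_i), not values from the text:

```python
import numpy as np

# Hypothetical 3x3 assembled system; entries stand in for a(phi_j, phi_i).
A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
F = np.array([1.0, 1.0, 1.0])

ua = 5.0  # Dirichlet value to impose at the first node, u(a) = ua

# Replace the row associated with the test function phi_1 by the
# algebraic constraint u_1 = ua.
A[0, :] = 0.0
A[0, 0] = 1.0
F[0] = ua

u = np.linalg.solve(A, F)
```

After the replacement, the first coefficient of the solution vector equals the prescribed boundary value exactly, while the remaining rows are unchanged.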
Neumann-type conditions are flux conditions which dictate the flux of a quantity at the
boundaries of the domain Ω. That is, the quantity Q(u(x), x) is given at each end point.
In this case, the BVP is written as:
Neumann-type Strong Form: Find u ∈ S such that,
\[
\begin{aligned}
R(u(x), x) + \frac{d}{dx} Q(u(x), x) &= f(x), \quad \text{for any } x \in \Omega\\
Q(u(x), x)|_{x=a} &= Q_a\\
Q(u(x), x)|_{x=b} &= Q_b
\end{aligned} \tag{5.18}
\]
Imposing the Neumann-type conditions on the weak form equation comes naturally into
the formulation, as integration by parts on Q resulted in the term \(Q(u(x), x)v(x)|_{x=a}^{x=b}\)
appearing in the weak form (see definition 5.2). In this case, we can replace Q(u(x), x)
with the values required in the BVP, making the continuous weak form equation:
Continuous Neumann-type Weak Form: Find u_w ∈ W such that,
\[
\int_a^b R(u_w, x)\cdot v - Q(u_w, x)\cdot v'(x)\,dx = Q_b\,v(b) - Q_a\,v(a) + \int_a^b f\cdot v\,dx, \qquad \forall v \in W. \tag{5.19}
\]
Notice, because we do not constrain uw directly as in the Dirichlet case, we do not need
to restrict the function space W . Moreover, the discrete weak form equation also follows
directly, giving:
Discrete Neumann-type Weak Form: Find u_h ∈ W^h such that,
\[
\int_a^b R(u_h, x)\cdot v_h - Q(u_h, x)\cdot v_h'(x)\,dx = Q_b\,v_h(b) - Q_a\,v_h(a) + \int_a^b f\cdot v_h\,dx, \qquad \forall v_h \in W^h. \tag{5.20}
\]
Mixed conditions represent problems which include a Dirichlet condition at one end of the
domain and a Neumann-type condition at the other end. For example, a mixed BVP could
look like:
Mixed Strong Form: Find u ∈ S such that,
\[
\begin{aligned}
R(u(x), x) + \frac{d}{dx} Q(u(x), x) &= f(x), \quad \text{for any } x \in \Omega\\
u(x = a) &= u_a\\
Q(u(x), x)|_{x=b} &= Q_b
\end{aligned} \tag{5.21}
\]
To incorporate the Dirichlet condition, we restrict the solution and test spaces as before,
\[
W_D = \{v \in W \mid v(a) = u_a\}, \qquad W_0 = \{v \in W \mid v(a) = 0\}.
\]
To incorporate the Neumann-type condition, we simply replace Q(u(x), x)|_{x=b} with the
given value Q_b. As a result, the continuous weak form equation becomes:
Continuous Mixed Weak Form: Find u_w ∈ W_D such that,
\[
\int_a^b R(u_w, x)\cdot v - Q(u_w, x)\cdot v'(x)\,dx = Q_b\,v(b) + \int_a^b f\cdot v\,dx, \qquad \forall v \in W_0. \tag{5.22}
\]
Similarly, in the discrete setting, we modify the approximation space W^h so that the
nodal value at node x = a satisfies either the Dirichlet condition or zero, i.e.

Discrete Mixed Weak Form: Find u_h ∈ W_D^h such that,
\[
\int_a^b R(u_h, x)\cdot v_h - Q(u_h, x)\cdot v_h'(x)\,dx = Q_b\,v_h(b) + \int_a^b f\cdot v_h\,dx, \qquad \forall v_h \in W_0^h. \tag{5.23}
\]
Expanding the discrete solution and test function in the global basis,
\[
u_h = \sum_{k=1}^{N} u_k\,\hat{\phi}_k(x) = \mathbf{u}\cdot\hat{\boldsymbol{\phi}}(x), \qquad
v_h = \sum_{k=1}^{N} c_k\,\hat{\phi}_k(x) = \mathbf{c}\cdot\hat{\boldsymbol{\phi}}(x),
\]
where
\[
\mathbf{c} = \begin{bmatrix} c_1 \\ \vdots \\ c_N \end{bmatrix}, \qquad
\mathbf{u} = \begin{bmatrix} u_1 \\ \vdots \\ u_N \end{bmatrix}, \qquad
\hat{\boldsymbol{\phi}}(x) = \begin{bmatrix} \hat{\phi}_1(x) \\ \vdots \\ \hat{\phi}_N(x) \end{bmatrix},
\]
we can observe that the discrete weak form equation is equivalent to the vector equation,
\[
a(u_h, \mathbf{c}\cdot\hat{\boldsymbol{\phi}}) = l(\mathbf{c}\cdot\hat{\boldsymbol{\phi}}), \quad \forall \mathbf{c} \in \mathbb{R}^N, \quad \Longrightarrow \quad \mathbf{S}(u_h) = \mathbf{F} \tag{5.25}
\]
with
\[
\mathbf{S}(u_h) = \begin{bmatrix} a(u_h, \hat{\phi}_1) \\ \vdots \\ a(u_h, \hat{\phi}_N) \end{bmatrix}, \qquad
\mathbf{F} = \begin{bmatrix} l(\hat{\phi}_1) \\ \vdots \\ l(\hat{\phi}_N) \end{bmatrix}. \tag{5.26}
\]
This equivalence stems from the linear dependence of both a and l on the test function v_h.
This linear dependence means,
\[
\begin{aligned}
a(u_h, \mathbf{c}\cdot\hat{\boldsymbol{\phi}}) &= a(u_h, c_1\hat{\phi}_1 + \cdots + c_N\hat{\phi}_N)\\
&= a(u_h, c_1\hat{\phi}_1) + \cdots + a(u_h, c_N\hat{\phi}_N)\\
&= c_1\,a(u_h, \hat{\phi}_1) + \cdots + c_N\,a(u_h, \hat{\phi}_N)\\
&= \mathbf{c}\cdot\mathbf{S}(u_h)
\end{aligned} \tag{5.27}
\]
and
\[
\begin{aligned}
l(\mathbf{c}\cdot\hat{\boldsymbol{\phi}}) &= l(c_1\hat{\phi}_1 + \cdots + c_N\hat{\phi}_N)\\
&= l(c_1\hat{\phi}_1) + \cdots + l(c_N\hat{\phi}_N)\\
&= c_1\,l(\hat{\phi}_1) + \cdots + c_N\,l(\hat{\phi}_N)\\
&= \mathbf{c}\cdot\mathbf{F}
\end{aligned} \tag{5.28}
\]
which shows that the discrete weak form equation is equivalent to the vector equation shown
in equation 5.25. To convert the vector equation into a linear algebraic system, we need
to exploit the linear dependence of a on u_h. This translates to our vector function S,
making it also linearly dependent on u_h. As a result,
\[
\mathbf{S}(u_h) = \mathbf{S}(\mathbf{u}\cdot\hat{\boldsymbol{\phi}}) = A\mathbf{u},
\]
where
\[
A = [\mathbf{S}(\hat{\phi}_1), \ldots, \mathbf{S}(\hat{\phi}_N)] =
\begin{bmatrix}
a(\hat{\phi}_1, \hat{\phi}_1) & \cdots & a(\hat{\phi}_N, \hat{\phi}_1)\\
\vdots & \ddots & \vdots\\
a(\hat{\phi}_1, \hat{\phi}_N) & \cdots & a(\hat{\phi}_N, \hat{\phi}_N)
\end{bmatrix}. \tag{5.29}
\]
Finally, we arrive at a linear algebraic system in a similar way to that derived in the
previous section, i.e.
\[
A\mathbf{u} = \mathbf{F}. \tag{5.30}
\]
Before solving, we must also consider the boundary conditions present on our BVP.
Suppose, for example, we have the Dirichlet condition u_h(0) = u_o at one end of the
domain. In this case, due to our nodal Lagrange basis functions, the first coefficient of our
approximation equals the value of our solution at the first node, i.e. u_1 = u_o. To make
sure this holds, we replace the row in A which corresponds with the test function φ̂_1 by
the constraint u_1 = u_o. This is integrated into our matrix system by updating A and F as,
\[
A = \begin{bmatrix}
1 & 0 & \cdots & 0\\
a(\hat{\phi}_1, \hat{\phi}_2) & a(\hat{\phi}_2, \hat{\phi}_2) & \cdots & a(\hat{\phi}_N, \hat{\phi}_2)\\
\vdots & \vdots & \ddots & \vdots\\
a(\hat{\phi}_1, \hat{\phi}_N) & a(\hat{\phi}_2, \hat{\phi}_N) & \cdots & a(\hat{\phi}_N, \hat{\phi}_N)
\end{bmatrix}, \qquad
\mathbf{F} = \begin{bmatrix}
u_o\\ l(\hat{\phi}_2)\\ \vdots\\ l(\hat{\phi}_N)
\end{bmatrix}. \tag{5.31}
\]
Alternatively, if a Neumann-type condition is imposed on our system at one end of the
domain, say Q(u_h(0), 0) = Q_o, we observe that the added term in our operator l is Q_o v_h(0).
Since x = 0 is an end node of our domain, φ̂_1(0) = 1 and all other basis functions satisfy
φ̂_k(0) = 0 (k > 1). This means that incorporating the Neumann-type condition simply adds
a term into the first row of the vector F.
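By contrast with the Dirichlet case, a Neumann-type condition requires no modification of A at all. A minimal sketch, with placeholder load values not taken from the text:

```python
import numpy as np

# Hypothetical load vector, standing in for the entries l(phi_i).
F = np.zeros(4)

Qo = 2.5  # given flux value Q(u(0), 0) = Qo at the x = 0 end

# Only phi_1 is nonzero at x = 0 (phi_1(0) = 1, phi_k(0) = 0 for k > 1),
# so the boundary term Qo * v_h(0) contributes only to the first entry.
F[0] += Qo * 1.0
```

The flux value simply accumulates into the row of F associated with the basis function that is nonzero at the Neumann boundary.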
With all boundary conditions imposed on the linear algebraic system, the unknown
coefficients u of our solution are found by inverting the matrix A, giving
\[
\mathbf{u} = A^{-1}\mathbf{F}.
\]
5.6.1 Example A
In this example, we consider the strong form equation given in example 5.1. Here we
follow the steps introduced to construct the approximate solution to the problem, using
linear piecewise polynomials on four elements of the domain Ω = [0, ln 2π]. Comparing the
ODE to the form introduced in equation 5.1, we observe that the R, Q and f operators are
R(u, x) = e^{2x} u, Q(u, x) = −u′ and f(x) = e^x sin(e^x). The problem reads:
Strong Form: Find u ∈ W^{2,∞}(Ω) such that,
\[
\begin{aligned}
-u''(x) + e^{2x}\,u(x) &= e^x \sin(e^x), \quad \text{for any } x \in \Omega = [0, \ln 2\pi]\\
u(x = 0) &= \cos(1)\\
u(x = \ln 2\pi) &= 1
\end{aligned} \tag{5.32}
\]
Note here that W^{2,∞}(Ω) is the Sobolev space of functions which are twice differentiable
and bounded point-wise (meaning our ODE equation makes sense at any point in the
domain).
From equation 5.32 we must apply Dirichlet conditions to both ends of Ω. As described
in section 5.4.1, by imposing Dirichlet conditions on both ends of our domain, we will
automatically select test functions v which are zero at both x = 0 and x = ln 2π. The
result is that we may simplify our weak form equation by noting that the boundary term
must be zero:
\[
\int_\Omega u_w'\cdot v' + e^{2x}\,u_w\cdot v\,dx = \int_\Omega e^x \sin(e^x)\cdot v\,dx. \tag{5.35}
\]
To identify the space of functions from which to choose u_w and v, we first note that we
want the equation (both right and left hand sides) to be finite for any choice of u_w and
v. This is ensured if we choose the function space W = H^1(Ω), which guarantees that u_w
and v and their derivatives u_w' and v' are square integrable. Lastly, we restrict our
solution u_w to those functions which satisfy the Dirichlet conditions, i.e.
\[
W_D = \{v \in W \mid v(0) = \cos(1),\; v(\ln 2\pi) = 1\},
\]
and choose the test functions v which are zero at both ends of the domain Ω,
\[
W_0 = \{v \in W \mid v(0) = 0,\; v(\ln 2\pi) = 0\}.
\]

Weak Form equation: Find u_w ∈ W_D such that,
\[
\int_\Omega u_w'\cdot v' + e^{2x}\,u_w\cdot v\,dx = \int_\Omega e^x \sin(e^x)\cdot v\,dx, \qquad \forall v \in W_0. \tag{5.36}
\]
The domain is divided into four equal elements,
\[
e_k = [x_k, x_{k+1}], \qquad x_k = \frac{k-1}{4}\,\ln 2\pi, \tag{5.37}
\]
making our mesh,
\[
\mathcal{T}_h(\Omega) = \{e_1, \ldots, e_4\}. \tag{5.38}
\]
As described in section 2.3.2, our approximation is constructed based on two local nodal
Lagrange basis functions φ_1(ξ) = 1 − ξ and φ_2(ξ) = ξ defined on the master element
e_M = [0, 1]. In order to construct our global basis functions, we first define the mapping
functions
\[
p_k : e_M \to e_k, \qquad p_k(\xi) = h\xi + x_k, \tag{5.39}
\]
which allow us to project the local basis functions onto each element of our mesh.
Without enforcing continuity in the approximation, we would end up with 8 global nodal
Lagrange basis functions (2 for each element). However, enforcing continuity, we must
require the basis functions at interior node points to be continuous (i.e. share coefficients),
dropping our number of global basis functions to 5. In this case, the connectivity array
(which declares, given an element and local node number, the corresponding global node
number) is given by,
\[
T = \begin{bmatrix} 1 & 2\\ 2 & 3\\ 3 & 4\\ 4 & 5 \end{bmatrix} \tag{5.40}
\]
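Assembly driven by a connectivity array like T can be sketched as follows. The element matrix used here is a placeholder, not computed from the actual basis functions of this example:

```python
import numpy as np

# Connectivity array from equation 5.40 (1-based global node numbers).
T = np.array([[1, 2],
              [2, 3],
              [3, 4],
              [4, 5]])

n_global = 5
A = np.zeros((n_global, n_global))

# Placeholder element matrix: a real assembly would compute A^e from
# equation 5.46 for each element.
Ae = np.array([[ 1.0, -1.0],
               [-1.0,  1.0]])

for e in range(T.shape[0]):       # loop over elements
    for i in range(2):            # local test index
        for j in range(2):        # local trial index
            m = T[e, i] - 1       # 0-based global indices
            n = T[e, j] - 1
            A[m, n] += Ae[i, j]   # scatter-add into the global matrix
```

Interior nodes are shared by two elements, so their diagonal entries accumulate two contributions, while the end nodes receive only one.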
Finally, we can say that our discrete solution u_h = u·φ̂ is a weighted sum of our global
basis functions, where
\[
\mathbf{u} = \begin{bmatrix} u_1\\ u_2\\ u_3\\ u_4\\ u_5 \end{bmatrix}, \qquad
\hat{\boldsymbol{\phi}} = \begin{bmatrix} \hat{\phi}_1\\ \hat{\phi}_2\\ \hat{\phi}_3\\ \hat{\phi}_4\\ \hat{\phi}_5 \end{bmatrix},
\]
and similarly for v_h = v·φ̂. Our resulting approximation space W^h may then be written
as
\[
W^h = \left\{ v_h : v_h(x) = \mathbf{v}\cdot\hat{\boldsymbol{\phi}}(x),\; x \in e_k \in \mathcal{T}_h(\Omega), \text{ for any } \mathbf{v} \in \mathbb{R}^5 \right\}.
\]
That is, it is the space of all functions which may be expressed as a weighted sum of our
global basis functions. Finally, we can re-write the continuous weak form equation by
considering the problem on the subset W^h ⊂ W.
Discrete Weak Form equation: Find u_h ∈ W_D^h such that,
\[
\int_\Omega u_h'\cdot v_h' + e^{2x}\,u_h\cdot v_h\,dx = \int_\Omega e^x \sin(e^x)\cdot v_h\,dx, \qquad \forall v_h \in W_0^h. \tag{5.41}
\]
That is, a is the operator containing those terms of the discrete weak form equation that
contain both u_h and v_h, while l contains the remaining terms which depend only on v_h.
In our case,
\[
a(u_h, v_h) = \int_\Omega u_h'\cdot v_h' + e^{2x}\,u_h\cdot v_h\,dx \tag{5.42}
\]
\[
l(v_h) = \int_\Omega e^x \sin(e^x)\cdot v_h\,dx. \tag{5.43}
\]
The entries of the global matrix and right hand side vector are then
\[
A_{ij} = a(\hat{\phi}_j, \hat{\phi}_i) = \int_\Omega \hat{\phi}_j'\cdot\hat{\phi}_i' + e^{2x}\,\hat{\phi}_j\cdot\hat{\phi}_i\,dx \tag{5.44}
\]
\[
F_i = l(\hat{\phi}_i) = \int_\Omega e^x \sin(e^x)\cdot\hat{\phi}_i\,dx. \tag{5.45}
\]
Following the element assembly process introduced in section 4.5.2, we can reduce
computation of the global matrix A to an assembly process over the element-level matrix
A^e and element-level right hand side vector F^e, effectively simplifying our computations.
In this case, the terms of our element-level matrix (a 2 × 2 matrix) and element-level right
hand side vector (a 2 × 1 vector) can be written for any element e ∈ 𝒯_h(Ω) as,
\[
(A^e)_{ij} = a_e(\hat{\phi}_n, \hat{\phi}_m) = \int_e \hat{\phi}_n'\cdot\hat{\phi}_m' + e^{2x}\,\hat{\phi}_n\cdot\hat{\phi}_m\,dx \tag{5.46}
\]
\[
(F^e)_i = l_e(\hat{\phi}_m) = \int_e e^x \sin(e^x)\cdot\hat{\phi}_m\,dx \tag{5.47}
\]
where m = T(e, i) and n = T(e, j) denote the global basis indices corresponding to the
local basis indices (i, j) (where i and j are either 1 or 2). Lastly, each computation can
be re-written on the master element e_M, expressed in terms of the local basis functions
φ_1 and φ_2, and approximated using a Gaussian quadrature scheme (with Gauss points ξ_g
and weights w_g):
\[
(A^e)_{ij} = \int_e \frac{d\hat{\phi}_j(x)}{dx}\cdot\frac{d\hat{\phi}_i(x)}{dx} + e^{2x}\,\hat{\phi}_i(x)\cdot\hat{\phi}_j(x)\,dx
\]
Chain rule: \(\frac{d}{dx}(\cdot) = \frac{d}{d\xi}(\cdot)\frac{d\xi}{dx}\)
\[
= \int_e \frac{d\hat{\phi}_j(x)}{d\xi}\cdot\frac{d\hat{\phi}_i(x)}{d\xi}\left(\frac{d\xi}{dx}\right)^2 + e^{2x}\,\hat{\phi}_i(x)\cdot\hat{\phi}_j(x)\,dx
\]
Substitution: \(x = p_k(\xi)\), \(dx = \frac{dp_k}{d\xi}\,d\xi = h\,d\xi\), and noting \(\frac{d\xi}{dx} = \left(\frac{dp_k}{d\xi}\right)^{-1} = \frac{1}{h}\)
\[
= \int_{e_M} \frac{d\phi_j(\xi)}{d\xi}\cdot\frac{d\phi_i(\xi)}{d\xi}\frac{1}{h} + h\,e^{2p_e(\xi)}\,\phi_i(\xi)\cdot\phi_j(\xi)\,d\xi
\]
\[
\approx \sum_g w_g \left( \frac{d\phi_j(\xi_g)}{d\xi}\cdot\frac{d\phi_i(\xi_g)}{d\xi}\frac{1}{h} + h\,e^{2p_e(\xi_g)}\,\phi_i(\xi_g)\cdot\phi_j(\xi_g) \right).
\]
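The final quadrature formula can be sketched directly in code. This sketch assumes a two-point Gauss–Legendre rule mapped to [0, 1]; the element size h follows equation 5.37:

```python
import numpy as np

L = np.log(2 * np.pi)
h = L / 4  # element size from equation 5.37

phi = [lambda s: 1.0 - s, lambda s: s]   # local basis on e_M = [0, 1]
dphi = [-1.0, 1.0]                       # constant derivatives d(phi)/dxi

# Two-point Gauss-Legendre rule mapped from [-1, 1] to [0, 1].
g = 1.0 / np.sqrt(3.0)
xi_g = [(1.0 - g) / 2.0, (1.0 + g) / 2.0]
w_g = [0.5, 0.5]

def element_matrix(xk):
    """Approximate (A^e)_ij for the element [xk, xk + h] by quadrature."""
    Ae = np.zeros((2, 2))
    for i in range(2):
        for j in range(2):
            for xi, w in zip(xi_g, w_g):
                x = h * xi + xk  # the mapping p_e(xi) = h*xi + x_k
                Ae[i, j] += w * (dphi[j] * dphi[i] / h
                                 + h * np.exp(2 * x) * phi[i](xi) * phi[j](xi))
    return Ae

Ae0 = element_matrix(0.0)  # element matrix for e_1 = [0, h]
```

Because the integrand is symmetric in i and j, the computed element matrix is symmetric, with positive diagonal entries from the derivative and reaction terms.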
we may replace the first and last rows with the algebraic constraints,
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 0\\
a(\hat{\phi}_1, \hat{\phi}_2) & a(\hat{\phi}_2, \hat{\phi}_2) & a(\hat{\phi}_3, \hat{\phi}_2) & a(\hat{\phi}_4, \hat{\phi}_2) & a(\hat{\phi}_5, \hat{\phi}_2)\\
a(\hat{\phi}_1, \hat{\phi}_3) & a(\hat{\phi}_2, \hat{\phi}_3) & a(\hat{\phi}_3, \hat{\phi}_3) & a(\hat{\phi}_4, \hat{\phi}_3) & a(\hat{\phi}_5, \hat{\phi}_3)\\
a(\hat{\phi}_1, \hat{\phi}_4) & a(\hat{\phi}_2, \hat{\phi}_4) & a(\hat{\phi}_3, \hat{\phi}_4) & a(\hat{\phi}_4, \hat{\phi}_4) & a(\hat{\phi}_5, \hat{\phi}_4)\\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3\\ u_4\\ u_5 \end{bmatrix} =
\begin{bmatrix} \cos(1)\\ l(\hat{\phi}_2)\\ l(\hat{\phi}_3)\\ l(\hat{\phi}_4)\\ 1 \end{bmatrix}.
\]
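Putting the pieces together, the full system for this example can be assembled and solved numerically. This is a sketch following the steps of this example (four linear elements, two-point Gauss quadrature, Dirichlet rows replaced), not the author's own implementation:

```python
import numpy as np

L = np.log(2 * np.pi)
n_el = 4
n_np = n_el + 1
h = L / n_el
nodes = np.linspace(0.0, L, n_np)

phi = [lambda s: 1.0 - s, lambda s: s]
dphi = [-1.0, 1.0]
g = 1.0 / np.sqrt(3.0)
xi_g = [(1.0 - g) / 2.0, (1.0 + g) / 2.0]
w_g = [0.5, 0.5]

A = np.zeros((n_np, n_np))
F = np.zeros(n_np)
for e in range(n_el):
    xk = nodes[e]
    for xi, w in zip(xi_g, w_g):
        x = h * xi + xk  # mapping p_e of equation 5.39
        for i in range(2):
            # Load: integral of e^x sin(e^x) phi_i mapped to e_M (dx = h dxi).
            F[e + i] += w * h * np.exp(x) * np.sin(np.exp(x)) * phi[i](xi)
            for j in range(2):
                A[e + i, e + j] += w * (dphi[j] * dphi[i] / h
                                        + h * np.exp(2 * x) * phi[i](xi) * phi[j](xi))

# Impose the Dirichlet conditions u(0) = cos(1), u(ln 2pi) = 1 by row replacement.
A[0, :] = 0.0;  A[0, 0] = 1.0;   F[0] = np.cos(1.0)
A[-1, :] = 0.0; A[-1, -1] = 1.0; F[-1] = 1.0

u = np.linalg.solve(A, F)
```

The solved coefficient vector reproduces the boundary values exactly at the end nodes, with the interior coefficients determined by the discrete weak form.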
Figure 5.1. FEM solution versus analytic solution for problem 5.32. FEM solutions computed on 5 isotropic elements
using a piecewise linear interpolation scheme.
5.6.2 Example B
In this example, we consider the equation describing the extension of an axially loaded
elastic bar. We follow the introduced steps for the FEM procedure and approximate the
problem using linear piecewise polynomials on two elements. By dividing the domain into
only two elements, we are able to derive the finite element solution by hand and compare
with the analytical solution.
Figure 5.2. Axially loaded elastic bar of problem 5.49 and finite element discretization.
The extension u(x) of an axially loaded elastic bar of length L in figure 5.2 is described
by the following equation:
\[
A(x)E(x)\frac{d^2 u}{dx^2} + b(x) = 0, \tag{5.48}
\]
where A and E denote the cross-sectional area and Young's modulus of the bar respectively,
and b(x) represents the body force per unit axial length. The bar is constrained at its
left end (u(0) = 0) and the right end is subjected to a tensile force F (EA u′(L) = F).
Assuming that A, E and b are constant along the bar, the strong form of the problem is
described as follows.
Mixed Strong Form: Find u ∈ W^{2,∞} such that,
\[
\begin{aligned}
AE\,u''(x) + b &= 0, \quad \text{for any } x \in \Omega = [0, L]\\
u(x = 0) &= 0\\
AE\,\frac{du}{dx}\Big|_{x=L} &= F
\end{aligned} \tag{5.49}
\]
We note that in this case we have an ODE which must satisfy a Dirichlet boundary
condition on the left end and a Neumann condition on the right end. Comparing with
the general ODE in equation 5.1, we see that the R, Q and f operators are R(u, x) = 0,
Q(u, x) = AE u′(x) and f(x) = −b.
This ODE can also be solved analytically, leading to the following solution:
\[
u(x) = -\frac{b}{2AE}x^2 + \frac{F + bL}{AE}x, \tag{5.50}
\]
which will be used in later comparisons with the FEM solution.
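Equation 5.50 can be checked numerically with finite differences. The parameter values below are arbitrary illustrations, not values from the text:

```python
import numpy as np

# Arbitrary sample parameters (not from the text).
Ab, E, b, Fload, L = 2.0, 3.0, 1.5, 0.7, 2.0

def u(x):
    """Analytic solution of equation 5.50."""
    return -b / (2 * Ab * E) * x**2 + (Fload + b * L) / (Ab * E) * x

eps = 1e-4
du_L = (u(L + eps) - u(L - eps)) / (2 * eps)                    # u'(L)
d2u = (u(L / 2 + eps) - 2 * u(L / 2) + u(L / 2 - eps)) / eps**2  # u''
```

The three checks confirm the Dirichlet condition u(0) = 0, the Neumann condition EA u′(L) = F, and the ODE EA u″ + b = 0.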
Following the steps outlined in section 5.1, we derive the weak form of the problem by first
multiplying the strong form ODE in equation 5.49 by a test function v and integrating
over the domain Ω, i.e.
\[
\int_0^L EA\,u_w''\cdot v\,dx + \int_0^L b\,v\,dx = 0, \tag{5.51}
\]
and apply the boundary conditions in equation 5.49. As described in example A, in order
to simplify the weak form of the problem, we restrict the solution u_w and test function v
to belong to the space W_0,
\[
W_0 = \{v \in W \mid v(0) = 0\}.
\]
The continuous weak form for this example may then be expressed as:

Weak Form: Find u_w ∈ W_0 such that,
\[
\int_0^L EA\,u_w'\cdot v'\,dx = F\,v(L) + \int_0^L b\,v\,dx, \qquad \forall v \in W_0. \tag{5.53}
\]
In this example, we aim to approximate the solution using a piecewise linear continuous
approximation over two equally sized elements. Therefore, the domain Ω = [0, L] is
divided into two equal linear elements e_k, which can be defined by their nodes as
\[
e_k = [x_k, x_{k+1}], \qquad x_k = \frac{k-1}{2}L, \tag{5.54}
\]
leading to the following mesh:
\[
\mathcal{T}_h(\Omega) = \{e_1, e_2\}.
\]
For the approximation of our solution we use two nodal Lagrange basis functions φ_1(ξ) =
1 − ξ, φ_2(ξ) = ξ which are defined on the master element e_M = [0, 1]. Subsequently, the
global basis functions are constructed by mapping the local basis functions onto the two
elements of the mesh, using the mapping
\[
p_k : e_M \to e_k, \qquad p_k(\xi) = h\xi + x_k, \qquad h = \frac{L}{2}. \tag{5.56}
\]
Enforcing continuity between adjacent elements, we require three global basis functions.
The connectivity matrix is then defined by the mapping between element and global node
number as,
\[
T = \begin{bmatrix} 1 & 2\\ 2 & 3 \end{bmatrix}. \tag{5.57}
\]
The discrete solution u_h can be written as a weighted sum u_h = u·φ̂ of the global basis
functions φ̂, where
\[
\mathbf{u} = \begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix}, \qquad
\hat{\boldsymbol{\phi}} = \begin{bmatrix} \hat{\phi}_1\\ \hat{\phi}_2\\ \hat{\phi}_3 \end{bmatrix},
\]
and the approximation space is
\[
W^h = \left\{ v_h : v_h(x) = \mathbf{v}\cdot\hat{\boldsymbol{\phi}}(x),\; x \in e_k \in \mathcal{T}_h(\Omega), \text{ for any } \mathbf{v} \in \mathbb{R}^3 \right\}.
\]
We can then derive the discrete weak form of the problem by re-writing the continuous
weak form and considering the problem on the subset W^h ⊂ W:

Mixed Weak Form: Find u_h ∈ W_0^h such that,
\[
\int_\Omega EA\,u_h'\cdot v_h'\,dx = F\,v_h(L) + \int_\Omega b\,v_h\,dx, \qquad \forall v_h \in W_0^h, \tag{5.58}
\]
where W_0^h = W^h ∩ W_0 denotes the discrete approximation space which satisfies the zero
Dirichlet boundary condition of the example.
The discrete weak form of the problem can be written in operator form with
\[
a(u_h, v_h) = \int_\Omega EA\,u_h'\cdot v_h'\,dx \tag{5.59}
\]
\[
l(v_h) = \int_\Omega b\,v_h\,dx + F\,v_h(L). \tag{5.60}
\]
Breaking the system into operators can then be used to derive the matrix system
describing the problem.
Based on section 5.5, the global matrix, or stiffness matrix, A of the problem can be
computed as follows:
\[
A_{ij} = a(\hat{\phi}_j, \hat{\phi}_i) = \int_\Omega EA\,\hat{\phi}_j'\cdot\hat{\phi}_i'\,dx \tag{5.61}
\]
\[
F_i = l(\hat{\phi}_i) = \int_\Omega b\,\hat{\phi}_i\,dx + F\,\hat{\phi}_i(L). \tag{5.62}
\]
As before, these can be reduced to computations over element-level quantities,
\[
(A^e)_{ij} = a_e(\hat{\phi}_n, \hat{\phi}_m) = \int_e EA\,\hat{\phi}_n'\cdot\hat{\phi}_m'\,dx \tag{5.63}
\]
\[
(F^e)_i = l_e(\hat{\phi}_m) = \int_e b\,\hat{\phi}_m\,dx + F\,\hat{\phi}_m(L), \tag{5.64}
\]
where the global indices m = T(e, i) and n = T(e, j) correspond to the local basis indices
(i, j) (where i and j can be 1 or 2). Finally, each computation can be considered on
the master element e_M and be expressed in terms of the local basis functions φ_1 and φ_2.
Note that in this specific example, integration is exact, so no Gauss quadrature rule is
required. Specifically,
\[
A^e_{i,j} = \int_e EA\,\frac{d\hat{\phi}_j(x)}{dx}\cdot\frac{d\hat{\phi}_i(x)}{dx}\,dx
\]
Chain rule: \(\frac{d}{dx}(\cdot) = \frac{d}{d\xi}(\cdot)\frac{d\xi}{dx}\)
\[
= \int_e EA\,\frac{d\hat{\phi}_j(x)}{d\xi}\cdot\frac{d\hat{\phi}_i(x)}{d\xi}\left(\frac{d\xi}{dx}\right)^2 dx
\]
Substitution: \(x = p_k(\xi)\), \(dx = \frac{dp_k}{d\xi}\,d\xi = h\,d\xi\), and noting \(\frac{d\xi}{dx} = \left(\frac{dp_k}{d\xi}\right)^{-1} = \frac{1}{h}\)
\[
= \int_{e_M} EA\,\frac{d\phi_j(\xi)}{d\xi}\cdot\frac{d\phi_i(\xi)}{d\xi}\frac{1}{h}\,d\xi
= \frac{EA}{h}\int_{e_M} \frac{d\phi_j(\xi)}{d\xi}\cdot\frac{d\phi_i(\xi)}{d\xi}\,d\xi
\]
The entries of the element matrix are then
\[
\begin{aligned}
A^e_{1,1} &= \frac{EA}{h}\int_{e_M} (-1)\cdot(-1)\,d\xi = \frac{EA}{h}\\
A^e_{1,2} &= \frac{EA}{h}\int_{e_M} (-1)\cdot(1)\,d\xi = -\frac{EA}{h}\\
A^e_{2,2} &= \frac{EA}{h}\int_{e_M} (1)\cdot(1)\,d\xi = \frac{EA}{h}
\end{aligned} \tag{5.65}
\]
so that
\[
A^e = \frac{EA}{h}\begin{bmatrix} 1 & -1\\ -1 & 1 \end{bmatrix}. \tag{5.66}
\]
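The element stiffness matrix of equation 5.66 can be reproduced in a few lines; the values for E, A and h below are sample assumptions:

```python
import numpy as np

# Sample values for E, A and h (illustration only).
E, A, h = 2.0, 0.5, 1.0

dphi = np.array([-1.0, 1.0])  # derivatives of phi_1 = 1 - xi and phi_2 = xi
Ae = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        # The integrand is constant on e_M = [0, 1], so the integral over
        # the master element equals the product of the derivatives, scaled
        # by EA/h.
        Ae[i, j] = (E * A / h) * dphi[j] * dphi[i]
```

Because the linear basis derivatives are constant, no quadrature is needed and the result matches equation 5.66 exactly.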
Similarly, for the element-level right hand side,
\[
\begin{aligned}
F^e_1 &= \int_{e_M} b\,(1 - \xi)\,h\,d\xi + F\,\hat{\phi}_1(L) = \frac{bh}{2} + F\,\hat{\phi}_1(L)\\
F^e_2 &= \int_{e_M} b\,\xi\,h\,d\xi + F\,\hat{\phi}_2(L) = \frac{bh}{2} + F\,\hat{\phi}_2(L)
\end{aligned}
\]
\[
F^e = \frac{bh}{2}\begin{bmatrix} 1\\ 1 \end{bmatrix} + \begin{bmatrix} F\,\hat{\phi}_1(L)\\ F\,\hat{\phi}_2(L) \end{bmatrix}. \tag{5.67}
\]
The Neumann boundary condition imposed on the right end of the domain has already
been incorporated in the matrix system through the operator l. Note that this boundary
condition only applies at the node corresponding to x = L on the second element, and so
the term F v_h(L) is equal to F at that node and zero everywhere else. Taking this fact
into account, along with the expressions for the element matrices A^e and F^e, we derive
the following matrix system:
\[
\begin{bmatrix}
1 & 0 & 0\\[2pt]
-\frac{EA}{h} & \frac{2EA}{h} & -\frac{EA}{h}\\[2pt]
0 & -\frac{EA}{h} & \frac{EA}{h}
\end{bmatrix}
\begin{bmatrix} u_1\\ u_2\\ u_3 \end{bmatrix} =
\begin{bmatrix} 0\\ bh\\ \frac{bh}{2} + F \end{bmatrix}. \tag{5.68}
\]
The 3 × 3 system reduces to a 2 × 2 system (as u_1 = 0), which can be solved easily,
leading to the following expressions for u_2 and u_3:
\[
u_2 = \frac{3bh^2}{2EA} + \frac{Fh}{EA}, \qquad u_3 = \frac{2bh^2}{EA} + \frac{2Fh}{EA}. \tag{5.69}
\]
Note that the finite element solution is exact at the nodes of the domain, as can be
deduced from the analytical solution of the problem (equation 5.50).
Figures 5.3 and 5.4 compare the analytical and finite element solutions of the problem for
two discretizations of the domain when A = E = F = b = L = 1. Note that increasing
the number of elements improves the accuracy of the solution.
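The two-element system of equation 5.68 can be verified numerically. With A = E = F = b = L = 1 (the values used in figures 5.3 and 5.4), the nodal values should match both equation 5.69 and the analytic solution 5.50:

```python
import numpy as np

# Parameters matching the figures (A = E = F = b = L = 1).
A, E, b, F, L = 1.0, 1.0, 1.0, 1.0, 1.0
h = L / 2
k = E * A / h

# The 3x3 system of equation 5.68, with the Dirichlet row u_1 = 0.
K = np.array([[1.0,   0.0,  0.0],
              [ -k, 2 * k,   -k],
              [0.0,    -k,    k]])
rhs = np.array([0.0, b * h, b * h / 2 + F])

u = np.linalg.solve(K, rhs)

# Closed-form nodal values from equation 5.69 and the analytic solution 5.50.
u2 = 3 * b * h**2 / (2 * E * A) + F * h / (E * A)
u3 = 2 * b * h**2 / (E * A) + 2 * F * h / (E * A)
exact = lambda x: -b * x**2 / (2 * A * E) + (F + b * L) * x / (A * E)
```

The solve returns u = [0, 0.875, 1.5], agreeing with both formulas at the nodes, which illustrates the nodal exactness noted above.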
Figure 5.3. Comparison between the analytical and finite element solution of problem 5.49 when the domain is divided
into two elements.
Figure 5.4. Comparison between the analytical and finite element solution of problem 5.49 when the domain is divided
into five elements.
Consider the discrete weak form problem: find u_h ∈ W^h such that,
\[
a(u_h, v_h) = l(v_h), \qquad \forall v_h \in W^h, \tag{5.70}
\]
where a and l are defined in equations 5.7 and 5.8. Supposing they also meet the continuity
and coercivity conditions set out in section 5.2, then we have the following result.

Lemma 5.2 (Error Analysis) Suppose a is a bilinear operator on the real Hilbert space
W^h ⊂ W constructed from degree k polynomials, satisfying conditions 5.10 and 5.11,
and l : W^h → ℝ is linear on W^h. Then the solution u_h ∈ W^h to equation 5.70 satisfies
the estimate,
\[
\|u_w - u_h\|_W \le \left(1 + \frac{\gamma}{\alpha}\right) \min_{w_h \in W^h} \|u_w - w_h\|_W,
\]
where γ and α denote the continuity and coercivity constants of a (conditions 5.10 and 5.11).
Prior to proving this result, we point to the main message of this bound. From lemma 5.2,
we see that the error between our weak form solution and the approximation computed using
the discrete weak form may be bounded above by the minimum error over the entire
polynomial space (multiplied by a constant)! This means that if we can construct an optimal
approximation (in the ‖·‖_W norm) which has convergence properties, we can observe
convergence in our FEM method. This result – and similar advanced results dealing with
more complex cases – provides the theoretical explanation for why the FEM method works.
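The practical consequence of this bound can be illustrated numerically on a model problem not taken from the text: −u″ = π² sin(πx) on [0, 1] with u(0) = u(1) = 0, whose exact solution is sin(πx). Refining the mesh should shrink the error, as the lemma predicts:

```python
import numpy as np

def solve_poisson(n_el):
    """Linear FEM for -u'' = pi^2 sin(pi x) on [0, 1], u(0) = u(1) = 0."""
    h = 1.0 / n_el
    n = n_el + 1
    x = np.linspace(0.0, 1.0, n)
    A = np.zeros((n, n))
    F = np.zeros(n)
    for e in range(n_el):
        # Exact element stiffness for linear elements: (1/h)[[1,-1],[-1,1]].
        A[e:e + 2, e:e + 2] += (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        # Midpoint-rule approximation of the element load vector.
        xm = x[e] + h / 2.0
        F[e:e + 2] += np.pi**2 * np.sin(np.pi * xm) * h / 2.0
    for idx in (0, n - 1):  # homogeneous Dirichlet rows
        A[idx, :] = 0.0
        A[idx, idx] = 1.0
        F[idx] = 0.0
    u = np.linalg.solve(A, F)
    return np.max(np.abs(u - np.sin(np.pi * x)))  # max nodal error

err_coarse = solve_poisson(8)
err_fine = solve_poisson(16)
```

Halving the element size reduces the observed error, consistent with the convergence behavior the error bound guarantees.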
Proof. To prove the error estimate, we note that both u_w ∈ W and u_h ∈ W^h satisfy,
\[
a(u_w, v_h) = l(v_h), \qquad a(u_h, v_h) = l(v_h), \qquad \forall v_h \in W^h,
\]
so that a(u_w − u_h, v_h) = 0 for all v_h ∈ W^h. Then, for any w_h ∈ W^h, using the
continuity of a,
\[
a(u_h - w_h, u_h - w_h) = a(u_w - w_h, u_h - w_h) \le \gamma\,\|u_w - w_h\|_W\,\|u_h - w_h\|_W. \tag{5.73}
\]
Similarly, by the coercivity of a,
\[
\alpha\,\|u_h - w_h\|_W^2 \le a(u_h - w_h, u_h - w_h). \tag{5.74}
\]
Combining the results in equations 5.73 and 5.74 and dividing by ‖u_h − w_h‖_W,