
Journal of Computational and Applied Mathematics 272 (2014) 41–56
Variational time discretization methods for optimal control problems governed by diffusion–convection–reaction equations
Tuğba Akman ∗ , Bülent Karasözen
Department of Mathematics and Institute of Applied Mathematics, Middle East Technical University, 06800 Ankara, Turkey
Article info
Article history: Received 3 February 2013; received in revised form 2 May 2014
MSC: 49N10; 49K20; 65M60; 65M15
Keywords: Optimal control problems; Unsteady diffusion–convection–reaction equation; Variational time discretization; A priori error estimates

Abstract
In this paper, the distributed optimal control problem governed by unsteady diffusion–convection–reaction equation without control constraints is studied. Time discretization is performed by variational discretization using continuous and discontinuous Galerkin methods, while symmetric interior penalty Galerkin with upwinding is used for space discretization. We investigate the commutativity properties of the optimize-then-discretize and discretize-then-optimize approaches for the continuous and discontinuous Galerkin time discretization. A priori error estimates are derived for fully-discrete state, adjoint and control. The numerical results given for convection dominated problems via optimize-then-discretize approach confirm the theoretically observed convergence rates.
© 2014 Elsevier B.V. All rights reserved.
1. Introduction
Optimal control problems (OCPs) governed by diffusion–convection–reaction equations arise in environmental control
problems, optimal control of fluid flow and in many other applications. It is well known that the standard Galerkin finite
element discretization causes non-physical oscillating solutions when convection dominates. Stable and accurate numerical
solutions can be achieved by various effective stabilization techniques such as the streamline upwind/Petrov–Galerkin
(SUPG) finite element method [1], the local projection stabilization [2], and the edge stabilization [3]. Recently, discontinuous
Galerkin (dG) methods have gained importance due to their better convergence behaviour, local mass conservation,
flexibility in approximating rough solutions on complicated meshes, mesh adaptation and weak imposition of the boundary
conditions, see, e.g., [4,5].
In this paper, we solve the OCP governed by diffusion–convection–reaction equation by applying symmetric interior
penalty Galerkin (SIPG) method with upwinding in space [6–8] and variational time discretization [9–14].
In recent years, most research has concentrated on parabolic OCPs (see, for example, [15,16]). There are few publications dealing with OCPs governed by the non-stationary diffusion–convection–reaction equation. The local DG approximation of the OCP discretized by backward Euler in time is studied in [17], and a priori error estimates for the semi-discrete OCP are provided in [18]. In [19], the characteristic finite element solution of the OCP is discussed and numerical results are provided. A priori error estimates for discontinuous Galerkin time discretization of unconstrained parabolic OCPs are proposed in [20]. The control-constrained case is investigated in [21] for the SIPG method combined with backward Euler discretization. Crank–Nicolson time discretization is applied to the OCP of the diffusion–convection equation in [22]. To the best of our knowledge, this is the first study on space–time DG discretization of OCPs governed by convection–diffusion–reaction equations.
∗ Corresponding author. Tel.: +90 312 210 53 79; fax: +90 312 210 2985.
E-mail addresses: takman@metu.edu.tr (T. Akman), bulent@metu.edu.tr (B. Karasözen).
http://dx.doi.org/10.1016/j.cam.2014.05.002
0377-0427/© 2014 Elsevier B.V. All rights reserved.
There exist two different approaches for solving OCPs: optimize-then-discretize (OD) and discretize-then-optimize (DO).
In the OD approach, first the infinite dimensional optimality system is derived containing the state and adjoint equations
and the variational inequality. Then, the optimality system is discretized by using a suitable discretization method in space
and time. In DO approach, the infinite dimensional OCP is discretized and then the finite-dimensional optimality system
is derived. The DO and OD approaches do not commute in general for OCPs governed by diffusion–convection–reaction
equation [1]. However, commutativity is achieved in the case of SIPG discretization for steady state problems [4]. For
discontinuous Galerkin time discretization, where both trial and test spaces are discontinuous, we show that OD and DO
approaches commute, i.e. the adjoint state is discretized as we do for the state variable. For continuous Galerkin time
discretization, where the trial spaces are continuous and the test spaces are discontinuous, OD and DO approaches do
not commute. For a priori error estimates, we use the error analysis in [20] adapted to space–time discontinuous Galerkin
discretization. For this purpose, we divide the error analysis in three parts as in [19] using the error estimates for the dG
bilinear forms.
The rest of the paper is organized as follows. In Section 2, we define the model problem and then derive the optimality
system. In Section 3, we present the symmetric interior penalty Galerkin method with upwinding. In Section 4, we discuss
the fully discrete optimality system using variational time discretization methods. In Section 5, we give some auxiliary
results, which are needed for the a priori error estimates. In Section 6, we derive convergence estimates for the fully discrete
optimality system. In Section 7, the computational details of variational time discretization methods are investigated. In
Section 8, numerical results are presented in order to assess the performance of the suggested methods. The paper ends with
some conclusions.
2. The optimal control problem
We adopt the standard notation for Sobolev spaces on computational domains and their norms. Let $\Omega$ be a bounded convex polygonal domain in $\mathbb{R}^2$ with Lipschitz boundary $\partial\Omega$. The inner product in $L^2(\Omega)$ is denoted by $(\cdot,\cdot)$.
We consider the following distributed optimal control problem governed by the unsteady diffusion–convection–reaction equation
\begin{align}
\operatorname*{minimize}_{u\in L^2(0,T;L^2(\Omega))}\ & J(y,u) := \frac12\int_0^T\Big(\|y-y_d\|^2_{L^2(\Omega)} + \alpha\|u\|^2_{L^2(\Omega)}\Big)\,dt, \tag{2.1a}\\
\text{subject to}\ & \partial_t y - \epsilon\Delta y + \beta\cdot\nabla y + ry = f + u, \quad (x,t)\in\Omega\times(0,T], \tag{2.1b}\\
& y(x,t) = 0, \quad (x,t)\in\partial\Omega\times[0,T], \tag{2.1c}\\
& y(x,0) = y_0(x), \quad x\in\Omega. \tag{2.1d}
\end{align}
The source function and the desired state are denoted by $f\in L^2(0,T;L^2(\Omega))$ and $y_d\in L^2(0,T;L^2(\Omega))$, respectively. The initial condition is $y_0\in H_0^1(\Omega)$. The diffusion and reaction coefficients are $\epsilon>0$ and $r\in L^\infty(\Omega)$, respectively. The velocity field $\beta\in(W^{1,\infty}(\Omega))^2$ satisfies the incompressibility condition, i.e. $\nabla\cdot\beta=0$. Furthermore, we assume the existence of a constant $C_0$ such that $r\ge C_0$ a.e. in $\Omega$, so that the well-posedness of the optimal control problem (2.1) is guaranteed. The trial and test spaces are $Y=V=H_0^1(\Omega)$ for all $t\in(0,T]$. The OCP (2.1) is written in variational form as follows
\begin{align}
\operatorname*{minimize}_{u\in L^2(0,T;L^2(\Omega))}\ & J(y,u) := \frac12\int_0^T\Big(\|y-y_d\|^2_{L^2(\Omega)} + \alpha\|u\|^2_{L^2(\Omega)}\Big)\,dt \tag{2.2a}\\
\text{subject to}\ & (\partial_t y,v) + a(y,v) = (f+u,v), \quad \forall v\in V,\ t\in I, \tag{2.2b}\\
& y(x,0)=y_0, \quad x\in\Omega, \notag
\end{align}
with
\[
a(y,v) = \int_\Omega\big(\epsilon\nabla y\cdot\nabla v + \beta\cdot\nabla y\,v + ryv\big)\,dx, \qquad (w,v) = \int_\Omega wv\,dx.
\]
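As a short worked step (a sketch under the stated assumptions $\nabla\cdot\beta=0$ and $v\in H_0^1(\Omega)$, not part of the original derivation), the convective part of $a(\cdot,\cdot)$ vanishes on the diagonal, so the lower bound $r\ge C_0$ enters the coercivity estimate directly:

```latex
\begin{aligned}
\int_\Omega (\beta\cdot\nabla v)\,v\,dx
  &= \tfrac12\int_\Omega \beta\cdot\nabla(v^2)\,dx
   = -\tfrac12\int_\Omega (\nabla\cdot\beta)\,v^2\,dx
     + \tfrac12\int_{\partial\Omega}(\beta\cdot n)\,v^2\,ds = 0,\\
a(v,v) &= \epsilon\,\|\nabla v\|^2_{L^2(\Omega)} + (rv,v)
        \;\ge\; \epsilon\,\|\nabla v\|^2_{L^2(\Omega)} + C_0\,\|v\|^2_{L^2(\Omega)} .
\end{aligned}
```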
It is well known that the functions $(y,u)\in\big(H^1(0,T;L^2(\Omega))\cap L^2(0,T;Y)\big)\times L^2(0,T;L^2(\Omega))$ solve (2.1) if and only if there is an adjoint $p\in H^1(0,T;L^2(\Omega))\cap L^2(0,T;Y)$ such that $(y,u,p)$ is the unique solution of the following optimality system [23],
\begin{align}
(\partial_t y,v) + a(y,v) &= (f+u,v) \quad \forall v\in V, \qquad y(x,0)=y_0, \tag{2.3a}\\
-(\partial_t p,\psi) + a(\psi,p) &= -(y-y_d,\psi) \quad \forall\psi\in V, \qquad p(x,T)=0, \tag{2.3b}\\
\int_0^T(\alpha u-p,\,w-u)\,dt &= 0, \quad \forall w\in L^2(0,T;L^2(\Omega)). \tag{2.3c}
\end{align}
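The structure of (2.3) can be illustrated on a scalar toy problem. The sketch below is only an analogue under assumed simplifications: the PDE is replaced by the algebraic state equation `a*y = f + u`, and all function names are hypothetical. It shows that the gradient of the reduced cost is `alpha*u - p`, which is the quantity appearing in (2.3c).

```python
# Scalar toy analogue of (2.3) (an illustrative assumption, not the PDE itself):
#   minimize  j(u) = 0.5*(y - yd)**2 + 0.5*alpha*u**2   subject to  a*y = f + u.

def solve_state(a, f, u):
    """State equation a*y = f + u."""
    return (f + u) / a

def solve_adjoint(a, y, yd):
    """Adjoint equation a*p = -(y - yd), mirroring the sign in (2.3b)."""
    return -(y - yd) / a

def reduced_cost(a, f, yd, alpha, u):
    """Cost as a function of the control only (state eliminated)."""
    y = solve_state(a, f, u)
    return 0.5 * (y - yd) ** 2 + 0.5 * alpha * u ** 2

def reduced_gradient(a, f, yd, alpha, u):
    """Derivative of the reduced cost: alpha*u - p, cf. (2.3c)."""
    y = solve_state(a, f, u)
    p = solve_adjoint(a, y, yd)
    return alpha * u - p

def optimal_control(a, f, yd, alpha):
    """Closed-form solution of state, adjoint and gradient conditions."""
    y = (a * alpha * f + yd) / (a * a * alpha + 1.0)
    p = solve_adjoint(a, y, yd)
    return p / alpha
```

At the control returned by `optimal_control` the reduced gradient vanishes, which is the unconstrained counterpart of the variational condition (2.3c).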
3. Symmetric interior penalty Galerkin semi-discretization
In this section, we briefly describe the interior penalty Galerkin semi-discretization in space. Let $\{\mathcal{T}_h\}_h$ be a family of shape regular meshes such that $\Omega=\cup_{K\in\mathcal{T}_h}K$, $K_i\cap K_j=\emptyset$ for $K_i,K_j\in\mathcal{T}_h$, $i\neq j$. The diameter of an element $K$ is denoted by $h_K$. The maximum diameter is $h=\max_{K\in\mathcal{T}_h}h_K$. In addition, the length of an edge $E$ is denoted by $h_E$.
We use discontinuous piecewise finite element spaces to define the discrete test, state and control spaces
\[
V_{h,p} = Y_{h,p} = U_{h,p} = \big\{\, y\in L^2(\Omega) : y|_K\in P_p(K)\ \ \forall K\in\mathcal{T}_h \,\big\}. \tag{3.1}
\]
Here, $P_p(K)$ denotes the set of all polynomials on $K\in\mathcal{T}_h$ of degree at most $p$. We split the set of all edges $\mathcal{E}_h$ into the set $\mathcal{E}_h^0$ of interior edges and the set $\mathcal{E}_h^\partial$ of boundary edges so that $\mathcal{E}_h=\mathcal{E}_h^\partial\cup\mathcal{E}_h^0$. Let $n$ denote the unit outward normal to $\partial\Omega$. We define the inflow boundary
\[
\Gamma^- = \{x\in\partial\Omega : \beta\cdot n(x)<0\}
\]
and the outflow boundary $\Gamma^+=\partial\Omega\setminus\Gamma^-$. The boundary edges are decomposed into edges $\mathcal{E}_h^-=\{E\in\mathcal{E}_h^\partial : E\subset\Gamma^-\}$ that correspond to the inflow boundary and edges $\mathcal{E}_h^+=\mathcal{E}_h^\partial\setminus\mathcal{E}_h^-$ that correspond to the outflow boundary. The inflow and outflow boundaries of an element $K\in\mathcal{T}_h$ are defined by
\[
\partial K^- = \{x\in\partial K : \beta\cdot n_K(x)<0\}, \qquad \partial K^+ = \partial K\setminus\partial K^-,
\]
where $n_K$ is the unit normal vector on the boundary $\partial K$ of an element $K$.
Let the edge $E$ be a common edge for two elements $K$ and $K^e$. For a piecewise continuous scalar function $y$, there are two traces of $y$ along $E$, denoted by $y|_E$ from the interior of $K$ and $y^e|_E$ from the interior of $K^e$. Then, the jump and average of $y$ across the edge $E$ are defined by
\[
[[y]] = y|_E\,n_K + y^e|_E\,n_{K^e}, \qquad \{\{y\}\} = \frac12\big(y|_E + y^e|_E\big). \tag{3.2}
\]
Similarly, for a piecewise continuous vector field $\nabla y$, the jump and average across an edge $E$ are given by
\[
[[\nabla y]] = \nabla y|_E\cdot n_K + \nabla y^e|_E\cdot n_{K^e}, \qquad \{\{\nabla y\}\} = \frac12\big(\nabla y|_E + \nabla y^e|_E\big). \tag{3.3}
\]
For a boundary edge $E\in K\cap\Gamma$, we set $\{\{\nabla y\}\}=\nabla y$ and $[[y]]=yn$, where $n$ is the outward unit normal vector on $\Gamma$.
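The definitions (3.2) and (3.3) can be evaluated pointwise on an interior edge. The following is a minimal sketch with hypothetical helper names; it assumes the traces are already available as numbers (or gradient tuples) at a single point of $E$ and uses $n_{K^e}=-n_K$ on a shared edge.

```python
def jump_scalar(y_K, y_Ke, n_K):
    """[[y]] = y|_E n_K + y^e|_E n_{K^e} with n_{K^e} = -n_K (a vector)."""
    return tuple(y_K * n - y_Ke * n for n in n_K)

def average_scalar(y_K, y_Ke):
    """{{y}} = (y|_E + y^e|_E) / 2."""
    return 0.5 * (y_K + y_Ke)

def jump_vector(g_K, g_Ke, n_K):
    """[[grad y]] = grad y|_E . n_K + grad y^e|_E . n_{K^e} (a scalar)."""
    return sum((a - b) * n for a, b, n in zip(g_K, g_Ke, n_K))

def average_vector(g_K, g_Ke):
    """{{grad y}} = (grad y|_E + grad y^e|_E) / 2, componentwise."""
    return tuple(0.5 * (a + b) for a, b in zip(g_K, g_Ke))
```

For a function that is continuous across $E$ the scalar jump vanishes, which is why the jump terms in the dG bilinear form do not affect consistency for smooth solutions.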
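The state equation (2.1) in space for fixed control $u$ is discretized by the symmetric interior penalty method (SIPG) [6]. The convective term is discretized by the upwind method [8]. This leads to the following semi-discrete state equation
\[
(\partial_t y_h,v_h) + a^s_h(y_h,v_h) + b_h(u_h,v_h) = (f,v_h) \quad \forall v_h\in V_{h,p},\ t\in(0,T], \tag{3.4}
\]
with the (bi-)linear forms
\[
a^d(y,v) = \sum_{K\in\mathcal{T}_h}\int_K\epsilon\nabla y\cdot\nabla v\,dx - \sum_{E\in\mathcal{E}_h}\int_E\Big(\{\{\epsilon\nabla y\}\}\cdot[[v]] + \{\{\epsilon\nabla v\}\}\cdot[[y]] - \overbrace{\frac{\sigma}{h_E}\,\epsilon\,[[y]]\cdot[[v]]}^{J_\sigma(y,v)}\Big)\,ds \tag{3.5}
\]
and
\begin{align}
a^s_h(y,v) &= a^d(y,v) + \sum_{K\in\mathcal{T}_h}\int_K\big(\beta\cdot\nabla y\,v + ryv\big)\,dx + \sum_{K\in\mathcal{T}_h}\int_{\partial K^-\setminus\Gamma^-}\beta\cdot n\,(y^e-y)\,v\,ds \notag\\
&\quad - \sum_{K\in\mathcal{T}_h}\int_{\partial K^-\cap\Gamma^-}\beta\cdot n\,y\,v\,ds, \tag{3.6}\\
b_h(u,v) &= -\sum_{K\in\mathcal{T}_h}\int_K uv\,dx. \tag{3.7}
\end{align}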
The penalty parameter σ > 0 should be sufficiently large to ensure the stability of the dG discretization [7, Section 2.7.1]
with a lower bound depending only on the polynomial degree.
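To see what the two upwind face sums in (3.6) do, consider a deliberately simplified setting that is an assumption of this sketch and not the paper's setup: a uniform 1D mesh, constant $\beta>0$, and piecewise constants ($p=0$), so that the volume term $\beta\cdot\nabla y\,v$ vanishes inside each cell. The inflow face of cell $i$ is then its left endpoint with $n_K=-1$, and the face sums reduce to the classical first-order upwind difference. The function name is hypothetical.

```python
def upwind_matrix(n_cells, beta):
    """Matrix C with v^T C y equal to the two upwind face sums in (3.6)
    for piecewise constants on a uniform 1D mesh and constant beta > 0.

    Interior inflow face of cell i:  beta*n*(y_{i-1} - y_i)*v_i with n = -1,
    i.e. beta*(y_i - y_{i-1})*v_i.
    Inflow boundary face of cell 0:  -beta*n*y_0*v_0 = beta*y_0*v_0.
    """
    if beta <= 0:
        raise ValueError("this sketch assumes beta > 0")
    C = [[0.0] * n_cells for _ in range(n_cells)]
    for i in range(n_cells):
        C[i][i] += beta            # own trace at the inflow (left) face
        if i > 0:
            C[i][i - 1] -= beta    # neighbour trace y^e from the upwind cell
    return C
```

Each interior row has the entries `-beta, beta`, so constants are annihilated away from the inflow boundary, where the homogeneous boundary value enters weakly.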
4. Variational time discretization
In this section, we formulate two classes of variational time discretization methods mentioned above for the OCP and
describe the commutativity properties of the DO and OD approaches.
The variational time discretization method first appeared in [24] and was developed further in various papers; for a comprehensive description see [10, Chapter 12]. The test spaces always consist of piecewise discontinuous polynomials. When the solution space consists of continuous piecewise polynomials of degree q + 1 and the test functions are piecewise discontinuous polynomials of degree q, the resulting method is called the continuous Galerkin–Petrov cGP(q + 1) method. For discontinuous Galerkin dG(q) methods, both the test and the trial spaces consist of piecewise discontinuous polynomials of degree q. Advantages of variational time discretization are stability, nodal superconvergence, and applicability of space–time adaptivity. Both the continuous and discontinuous Galerkin methods are A-stable; the discontinuous Galerkin methods are even L-stable (strongly stable). The convergence order of the cGP(q+1) methods is one higher than that of the dG(q) methods. A priori error estimates of optimal order can be obtained with respect to the size of the time steps [25], whereas dG methods require less regular solutions than the cG methods. Space–time adaptivity can be easily implemented because the time and space discretizations are treated similarly. Using a posteriori error estimates, adaptive hp time stepping and dynamic meshes (the use of different spatial discretizations for each time step) can be directly incorporated in the discrete formulation [26].
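The stability statements can be checked on the lowest-order members of each family. The sketch below assumes the scalar test equation $y'=\lambda y$ with $z=\lambda k$; for it, dG(0) coincides with backward Euler and cGP(1) yields the Crank–Nicolson recursion. The function names are hypothetical.

```python
def R_dg0(z):
    """Stability function of dG(0) for y' = lambda*y (backward Euler)."""
    return 1.0 / (1.0 - z)

def R_cgp1(z):
    """Stability function of cGP(1) for y' = lambda*y (Crank-Nicolson type)."""
    return (1.0 + 0.5 * z) / (1.0 - 0.5 * z)

def is_bounded_on(R, points, tol=1e-12):
    """Check |R(z)| <= 1 at sample points with Re z <= 0 (A-stability test)."""
    return all(abs(R(z)) <= 1.0 + tol for z in points)

# Sample points in the closed left half-plane.
samples = [complex(-x, y) for x in (0.0, 0.1, 1.0, 50.0) for y in (-20.0, 0.0, 3.0)]
a_stable_dg0 = is_bounded_on(R_dg0, samples)
a_stable_cgp1 = is_bounded_on(R_cgp1, samples)

# Behaviour for very stiff modes: L-stability requires |R(z)| -> 0 as z -> -infinity.
stiff_dg0 = abs(R_dg0(-1.0e6))
stiff_cgp1 = abs(R_cgp1(-1.0e6))
```

Both functions stay bounded by one in the left half-plane, but only the dG(0) value decays for large negative `z`; the cGP(1) value approaches one in modulus, so stiff components are not damped.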
Let $0=t_0<t_1<\cdots<t_{N_T}=T$ be a subdivision of $I=(0,T]$ with time intervals $I_m=(t_{m-1},t_m]$ and time steps $k_m=t_m-t_{m-1}$ for $m=1,\dots,N_T$ and $k=\max_{1\le m\le N_T}k_m$. We note that the same mesh is used at each time level $t_m$ for $m=0,\dots,N_T$. Let $f_\delta$ and $y^d_\delta$ be approximations of the source function $f$ and the desired state function $y_d$ on each interval $I_m$.
4.1. Continuous Galerkin–Petrov (cGP(q + 1)) method
We define the discontinuous test space
\[
V^{k,q}_{h,p} = \big\{\, v\in L^2(I;V_{h,p}) : v|_{I_m}\in P_q(I_m,V_{h,p}),\ m=1,\dots,N_T,\ v_m(0)\in L^2(\Omega) \,\big\}, \tag{4.1}
\]
and the continuous trial space as follows
\[
\tilde V^{k,q+1}_{h,p} = \big\{\, v\in C(\bar I;V_{h,p}) : v|_{I_m}\in P_{q+1}(I_m,V_{h,p}),\ m=1,\dots,N_T \,\big\},
\]
where $P_q(I_m,V_{h,p})$ denotes the space of polynomials of degree $q$ defined on $I_m$ with values in $V_{h,p}$. Then, the fully-discrete optimal control problem is written as
\begin{align}
\operatorname*{minimize}_{u_\delta\in\tilde V^{k,q+1}_{h,p}}\ & \frac12\int_0^T\sum_{K\in\mathcal{T}_h}\Big(\|y_\delta-y^d_\delta\|^2_{L^2(K)} + \alpha\|u_\delta\|^2_{L^2(K)}\Big)\,dt, \tag{4.2a}\\
\text{subject to}\ & \int_0^T\big((\partial_t y_\delta,v_\delta) + a^s_h(y_\delta,v_\delta)\big)\,dt = \int_0^T(f_\delta+u_\delta,v_\delta)\,dt, \quad \forall v_\delta\in V^{k,q}_{h,p},\quad y_{\delta,0}=(y_0)_\delta. \tag{4.2b}
\end{align}
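The OCP (4.2) has a unique solution $(y_\delta,u_\delta)$, and the pair $(y_\delta,u_\delta)\in\tilde V^{k,q+1}_{h,p}\times\tilde V^{k,q+1}_{h,p}$ is the solution of (4.2) if and only if there is an adjoint $p_\delta\in\tilde V^{k,q+1}_{h,p}$ such that $(y_\delta,u_\delta,p_\delta)\in\tilde V^{k,q+1}_{h,p}\times\tilde V^{k,q+1}_{h,p}\times\tilde V^{k,q+1}_{h,p}$ is the unique solution of the fully-discrete optimality system [23]
\begin{align}
\int_0^T\big((\partial_t y_\delta,v_\delta) + a^s_h(y_\delta,v_\delta)\big)\,dt &= \int_0^T(f_\delta+u_\delta,v_\delta)\,dt, \quad \forall v_\delta\in V^{k,q}_{h,p},\quad y_{\delta,0}=(y_0)_\delta, \tag{4.3a}\\
\int_0^T\big(-(\partial_t p_\delta,\psi_\delta) + a_h(p_\delta,\psi_\delta)\big)\,dt &= -\int_0^T(y_\delta-y^d_\delta,\psi_\delta)\,dt, \quad \forall\psi_\delta\in V^{k,q}_{h,p},\quad p_{\delta,N}=0, \tag{4.3b}\\
\int_0^T(\alpha u_\delta-p_\delta,\,w_\delta-u_\delta)\,dt &= 0, \quad \forall w_\delta\in V^{k,q}_{h,p}. \tag{4.3c}
\end{align}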
4.1.1. Commutativity properties of cGP(q + 1) method

We derive the optimality system arising from the DO approach, which also appears in [27, Section 3], and compare it with the optimality system (4.3). We consider the discrete Lagrangian defined on $\tilde V^{k,q+1}_{h,p}\times\tilde V^{k,q+1}_{h,p}\times V^{k,q}_{h,p}$ as follows
\begin{align*}
L(y_\delta,u_\delta,p_\delta) &= \frac12\int_0^T\sum_{K\in\mathcal{T}_h}\Big(\|y_\delta-y^d_\delta\|^2_{L^2(K)} + \alpha\|u_\delta\|^2_{L^2(K)}\Big)\,dt\\
&\quad + \sum_{m=1}^{N_T}\Big(\int_{I_m}\big((\partial_t y_\delta,p_\delta) + a^s_h(y_\delta,p_\delta)\big)\,dt - \int_{I_m}(f_\delta+u_\delta,p_\delta)\,dt\Big)\\
&\quad + \big((y_0)_\delta - y_{\delta,0},\,p_{\delta,0}\big).
\end{align*}
We differentiate $L$ with respect to $y_\delta$ and apply integration by parts. We add and subtract $(\psi_{\delta,N_T},p^+_{\delta,N_T})$. Then, on each subinterval $I_m$, the adjoint equation reads as
\[
\int_{I_m}\big(-(\partial_t p_\delta,\psi_\delta) + a^s_h(\psi_\delta,p_\delta)\big)\,dt - ([p_\delta]_m,\psi^m_\delta) = -\int_{I_m}(y_\delta-y^d_\delta,\psi_\delta)\,dt, \quad \forall p_\delta\in V^{k,q}_{h,p},\ \psi_\delta\in\tilde V^{k,q+1}_{h,p}, \tag{4.4}
\]
with $p^+_{\delta,N}=0$. We observe that the temporal jump terms appear in (4.4), while this is not the case for the adjoint equation in (4.3b). In addition, in the OD approach $p_\delta\in\tilde V^{k,q+1}_{h,p}$, while in the DO approach $p_\delta\in V^{k,q}_{h,p}$. Therefore, the OD and DO approaches do not commute for the cGP method.
4.2. Discontinuous Galerkin (dG(q)) methods
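We use (4.1) as a discontinuous test and trial space. We define the temporal jump of $v\in V^{k,q}_{h,p}$ as $[v]_m=v^+_m-v^-_m$, where $v^\pm_m=\lim_{\varepsilon\to0^\pm}v(t_m+\varepsilon)$. Then, the fully-discrete optimal control problem is written as
\begin{align}
\operatorname*{minimize}_{u_\delta\in V^{k,q}_{h,p}}\ & \frac12\int_0^T\sum_{K\in\mathcal{T}_h}\Big(\|y_\delta-y^d_\delta\|^2_{L^2(K)} + \alpha\|u_\delta\|^2_{L^2(K)}\Big)\,dt, \tag{4.5a}\\
\text{subject to}\ & \sum_{m=1}^{N_T}\int_{I_m}(\partial_t y_\delta,v_\delta)\,dt + \int_0^T a^s_h(y_\delta,v_\delta)\,dt + \sum_{m=1}^{N_T}\big([y_\delta]_{m-1},v^{m-1}_{\delta,+}\big) \notag\\
& \quad = \int_0^T(f_\delta+u_\delta,v_\delta)\,dt, \quad \forall v_\delta\in V^{k,q}_{h,p},\quad y^-_{\delta,0}=(y_0)_\delta. \tag{4.5b}
\end{align}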
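The OCP (4.5) has a unique solution $(y_\delta,u_\delta)$, and the pair $(y_\delta,u_\delta)\in V^{k,q}_{h,p}\times V^{k,q}_{h,p}$ is the solution of (4.5) if and only if there is an adjoint $p_\delta\in V^{k,q}_{h,p}$ such that $(y_\delta,u_\delta,p_\delta)\in V^{k,q}_{h,p}\times V^{k,q}_{h,p}\times V^{k,q}_{h,p}$ is the unique solution of the fully-discrete optimality system [23]
\begin{align}
& \sum_{m=1}^{N_T}\int_{I_m}(\partial_t y_\delta,v_\delta)\,dt + \int_0^T a^s_h(y_\delta,v_\delta)\,dt + \sum_{m=1}^{N_T}\big([y_\delta]_{m-1},v^{m-1}_{\delta,+}\big) \notag\\
& \quad = \int_0^T(f_\delta+u_\delta,v_\delta)\,dt, \quad \forall v_\delta\in V^{k,q}_{h,p},\quad y^-_{\delta,0}=(y_0)_\delta, \tag{4.6a}\\
& \sum_{m=1}^{N_T}\int_{I_m}(-\partial_t p_\delta,\psi_\delta)\,dt + \int_0^T a_h(p_\delta,\psi_\delta)\,dt - \sum_{m=1}^{N_T}\big([p_\delta]_m,\psi^m_{\delta,-}\big) \notag\\
& \quad = -\int_0^T(y_\delta-y^d_\delta,\psi_\delta)\,dt, \quad \forall\psi_\delta\in V^{k,q}_{h,p},\quad p^+_{\delta,N}=0, \tag{4.6b}\\
& \int_0^T(\alpha u_\delta-p_\delta,\,w_\delta-u_\delta)\,dt = 0, \quad \forall w_\delta\in V^{k,q}_{h,p}. \tag{4.6c}
\end{align}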
4.2.1. Commutativity properties of dG(q) method

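We derive the optimality system arising from the DO approach and compare it with the optimality system (4.6). We construct the discrete Lagrangian defined on $V^{k,q}_{h,p}\times V^{k,q}_{h,p}\times V^{k,q}_{h,p}$ as follows
\begin{align*}
L(y_\delta,u_\delta,p_\delta) &= \frac12\int_0^T\sum_{K\in\mathcal{T}_h}\Big(\|y_\delta-y^d_\delta\|^2_{L^2(K)} + \alpha\|u_\delta\|^2_{L^2(K)}\Big)\,dt\\
&\quad + \sum_{m=1}^{N_T}\Big(\int_{I_m}\big((\partial_t y_\delta,p_\delta) + a^s_h(y_\delta,p_\delta)\big)\,dt + \big([y_\delta]_{m-1},p^{m-1}_{\delta,+}\big) - \int_{I_m}(f_\delta+u_\delta,p_\delta)\,dt\Big)\\
&\quad + \big((y_0)_\delta - y^-_{\delta,0},\,p^-_{\delta,0}\big).
\end{align*}
Differentiating $L$ with respect to $y_\delta$ and applying the same technique as in (4.4), we obtain the adjoint equation on each time interval $I_m$
\[
\int_{I_m}\big(-(\partial_t p_\delta,\psi_\delta) + a^s_h(\psi_\delta,p_\delta)\big)\,dt - \big([p_\delta]_m,\psi^m_{\delta,-}\big) = -\int_{I_m}(y_\delta-y^d_\delta,\psi_\delta)\,dt, \quad \forall p_\delta,\psi_\delta\in V^{k,q}_{h,p}.
\]
Similar to the cGP(q + 1) method, $p^+_{\delta,N}=0$. Now, we use the commutativity of the DG bilinear form provided in [4], i.e., $a^s_h(\psi_\delta,p_\delta)=a_h(p_\delta,\psi_\delta)$. Thus, we arrive at (4.6b). Therefore, the OD and DO approaches commute.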
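The commutativity of the dG approach can be made concrete in the simplest case. The sketch below is an illustration under assumed simplifications (a scalar ODE $y'+ay=\text{rhs}$ in place of the semi-discrete PDE, uniform step $k$, dG(0) in time; all names are hypothetical). In the DO approach the adjoint matrix is the transpose of the state matrix; in the OD approach it is obtained by applying dG(0) to $-p'+ap=\text{rhs}$, $p(T)=0$, backward in time. The two matrices coincide.

```python
def dg0_state_matrix(n_steps, k, a):
    """dG(0) for y' + a*y = rhs: (1 + k*a)*y_m - y_{m-1} = k*rhs_m."""
    S = [[0.0] * n_steps for _ in range(n_steps)]
    for m in range(n_steps):
        S[m][m] = 1.0 + k * a
        if m > 0:
            S[m][m - 1] = -1.0     # coupling to the previous interval (jump term)
    return S

def dg0_adjoint_matrix(n_steps, k, a):
    """dG(0) for -p' + a*p = rhs with p(T) = 0, stepped backward in time:
    (1 + k*a)*p_m - p_{m+1} = k*rhs_m."""
    A = [[0.0] * n_steps for _ in range(n_steps)]
    for m in range(n_steps):
        A[m][m] = 1.0 + k * a
        if m < n_steps - 1:
            A[m][m + 1] = -1.0     # coupling to the next interval (jump term)
    return A

def transpose(M):
    """Transpose of a dense matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

S = dg0_state_matrix(5, 0.1, 2.0)
adjoint_do = transpose(S)                     # discretize-then-optimize
adjoint_od = dg0_adjoint_matrix(5, 0.1, 2.0)  # optimize-then-discretize
```

For the cGP method the analogous comparison fails, since trial and test spaces differ and the transposed system is no longer a cGP discretization of the adjoint equation.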
5. Some auxiliary results
In this section, firstly, we give some a priori error estimates in the literature and then we present the discrete charac-
teristic function that provides error estimates at arbitrary time points. Then, we prove some useful lemmas and state the
main estimate of this study, particularly for discontinuous Galerkin time discretization. All of these will be used in the next
section for a priori error estimates.
We introduce the $L^2$ inner product on the inflow or outflow boundaries as follows
\[
(w,v)_{\Gamma^-} = \int_{\Gamma^-}|\beta\cdot n|\,wv\,ds
\]
with analogous definition of $(\cdot,\cdot)_{\Gamma^+}$ and associated norms $\|\cdot\|_{\Gamma^-}$ and $\|\cdot\|_{\Gamma^+}$.

The broken Sobolev space is defined as
\[
H^k(\Omega,\mathcal{T}_h) = \big\{\, v : v|_K\in H^k(K)\ \ \forall K\in\mathcal{T}_h \,\big\},
\]
with the semi-norm defined by
\[
|v|_{H^k(\Omega,\mathcal{T}_h)} = \Big(\sum_{K\in\mathcal{T}_h}|v|^2_{H^k(K)}\Big)^{1/2}, \quad v\in H^k(\Omega,\mathcal{T}_h).
\]
The Bochner space of functions whose $k$th time derivative is bounded almost everywhere on $(0,T)$ with values in $X$ is denoted by $W^{k,\infty}(0,T;X)$. We use the DG energy norm in [11, Section 4]
\[
|||v|||^2_{DG} = |v|^2_{H^1(\Omega,\mathcal{T}_h)} + J_\sigma(v,v). \tag{5.1}
\]
We give the multiplicative trace inequality for all $K\in\mathcal{T}_h$ and all $v\in H^1(K)$ as follows:
\[
\|v\|^2_{L^2(\partial K)} \le C_M\Big(\|v\|_{L^2(K)}\,|v|_{H^1(K)} + h_K^{-1}\|v\|^2_{L^2(K)}\Big), \tag{5.2}
\]
where $C_M$ is a positive constant independent of $v$, $h$ and $K$. We refer the reader to [28, Lemma 3.1] for the proof. In addition, the generalization of the Poincaré inequality to the broken Sobolev space $H^1(\Omega,\mathcal{T}_h)$ is given as [7, Section 3.1.4]
\[
\|v\|^2_{L^2(\Omega)} \le C_S\Big(|v|^2_{H^1(\Omega,\mathcal{T}_h)} + \sum_{E\in\mathcal{E}_h}\frac{1}{h_E}\|[[v]]\|^2_{L^2(E)}\Big). \tag{5.3}
\]
We proceed with the standard estimates derived for finite element methods [29]. Consider the $L^2$-projection $\Pi_h: L^2(\Omega)\to V_{h,p}$ so that
\[
\|\Pi_h v-v\|_{L^2(K)} \le C_\Pi h^{p+1}|v|_{H^{p+1}(K)}, \qquad |\Pi_h v-v|_{H^1(K)} \le C_\Pi h^{p}|v|_{H^{p+1}(K)}, \tag{5.4}
\]
for all $v\in H^{p+1}(K)$, $K\in\mathcal{T}_h$, where $C_\Pi$ is a positive constant independent of $v$ and $h$. In addition, as suggested in [11, Section 4], using the study [30], the following estimate holds for all $v\in H^{p+1}(\Omega,\mathcal{T}_h)$
\[
|||\Pi_h v-v|||_{DG} \le (2C_M+1)\,C_\Pi h^{p}|v|_{H^{p+1}(\Omega,\mathcal{T}_h)}, \tag{5.5}
\]
where $C_M$ and $C_\Pi$ are the positive constants from (5.2) and (5.4), respectively. In the following we introduce the parabolic projection for $m=0,\dots,N_T$ and mention the properties given in [11]. Suppose that $X\subset L^2(\Omega)$ is a Hilbert space. Let us denote the space of polynomial functions depending on time as follows:
\[
P^\alpha(I_m,X) = \Big\{\, v\in L^2(0,T;L^2(\Omega)) : v=\sum_{s=0}^{\alpha}t^s\phi_{s,m},\ t\in I_m,\ \phi_{s,m}\in X \,\Big\}.
\]
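The first bound in (5.4) can be observed numerically in a reduced setting. The sketch below is an illustration under assumptions that are not taken from the paper: one space dimension, the interval $(0,1)$, piecewise constants ($p=0$), and the smooth function $\sin x$. The helper names are hypothetical. The $L^2$ projection onto constants is the cell average, and the error should behave like $h^{p+1}=h$.

```python
import math

def p0_projection_error(f, F, n_cells):
    """L2(0,1) error of the piecewise-constant L2 projection of f.
    F is an antiderivative of f, used for exact cell averages."""
    # 3-point Gauss-Legendre rule on [-1, 1]
    nodes = (-math.sqrt(0.6), 0.0, math.sqrt(0.6))
    weights = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)
    h = 1.0 / n_cells
    err2 = 0.0
    for i in range(n_cells):
        a, b = i * h, (i + 1) * h
        mean = (F(b) - F(a)) / h              # L2 projection onto constants
        for x, w in zip(nodes, weights):
            t = 0.5 * (a + b) + 0.5 * h * x   # map quadrature node to the cell
            err2 += 0.5 * h * w * (f(t) - mean) ** 2
    return math.sqrt(err2)

def observed_order(e_coarse, e_fine, ratio=2.0):
    """Observed convergence order from errors on two meshes with h_coarse/h_fine = ratio."""
    return math.log(e_coarse / e_fine) / math.log(ratio)

e1 = p0_projection_error(math.sin, lambda x: -math.cos(x), 16)
e2 = p0_projection_error(math.sin, lambda x: -math.cos(x), 32)
rate = observed_order(e1, e2)
```

The same `observed_order` formula is the usual way to compare computed errors on successively refined meshes with a predicted rate.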
A space–time projection $\pi$ of $y\in C(0,T;H^1(\Omega))$ into $V^{k,q}_{h,p}$ is employed for the convergence estimates. The time projection $P$ of $y\in C(0,T;H^1(\Omega))$ is defined as
\begin{align*}
& Py\in\big\{\, v\in L^2(Q_T) : v|_{I_m}\in P^q(I_m,L^2(\Omega)) \,\big\},\\
& \int_{I_m}(Py-y,\,t^jv)\,dt = 0, \quad \forall v\in L^2(\Omega),\ j=0,\dots,q-1,\\
& (Py)^m_- = y(t_m).
\end{align*}
In addition, for $m=0,\dots,N_T$, with $y\in C(0,T;H^1(\Omega))$, $\pi y\in V^{k,q}_{h,p}$ is defined as
\begin{align}
& \pi y = \Pi_h(Py) \iff ((\pi y)(t),v) = ((Py)(t),v), \quad \forall v\in V_{h,p},\ \forall t\in I_m, \notag\\
& \int_{I_m}(\pi y-y,v)\,dt = \int_{I_m}\big((Py,v)-(y,v)\big)\,dt = 0, \quad \forall v\in V^{k,q-1}_{h,p}, \tag{5.6}\\
& \big((\pi y)^m_- - y(t_m),v\big) = \big(((Py)^m_-,v) - (y(t_m),v)\big) = 0, \quad \forall v\in V_{h,p}. \notag
\end{align}
We note that the definition of the projection $\pi$ is similar to that in [31].
We give some estimates from [11, Lemmas 4.3, 4.5], which we need in the proofs.
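Lemma 1. Suppose that $y\in W^{q+1,\infty}(I_m,H^1(\Omega))$ such that $y=0$ on $\partial\Omega$. Then,
\begin{align}
\|y(t)-Py(t)\| &\le C_P\,k_m^{q+1}\,|y|_{W^{q+1,\infty}(I_m,L^2(\Omega))} \quad \forall t\in I_m, \notag\\
|y(t)-Py(t)|_{H^1(\Omega)} &\le C_P\,k_m^{q+1}\,|y|_{W^{q+1,\infty}(I_m,H^1(\Omega))} \quad \forall t\in I_m, \tag{5.7}\\
|||y(t)-Py(t)|||_{DG} &\le C_P\,k_m^{q+1}\,|y|_{W^{q+1,\infty}(I_m,H^1(\Omega))} \quad \forall t\in I_m. \notag
\end{align}
Lemma 2. Suppose that $y\in W^{q+1,\infty}(I_m,H^1(\Omega))\cap L^\infty(I_m,H^{p+1}(\Omega))$ such that $y=0$ on $\partial\Omega$. Then,
\begin{align}
\|y(t)-\pi y(t)\| &\le C_\pi\,(h^{p+1}+k_m^{q+1})\,\|y\|_R \quad \forall t\in I_m, \notag\\
|||y(t)-\pi y(t)|||_{DG} &\le C_\pi\,(h^{p}+k_m^{q+1})\,\|y\|_R \quad \forall t\in I_m, \tag{5.8}
\end{align}
where $\|y\|_R=\max\big(|y|_{W^{q+1,\infty}(I_m,H^1(\Omega))},\,|y|_{L^\infty(I_m,H^{p+1}(\Omega))}\big)$ and $C_\pi$ is a positive constant independent of $h$, $k_m$, $m$ and $y$.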
Lemma 3. There exists a positive constant $C_A$ which is independent of $h$, $v_h$, $w_h$, $\epsilon$ such that
\begin{align}
a^d(y(t)-\Pi_hy(t),v_h) &\le C_A\,\epsilon\,h^{p}\,\|y(t)\|_{H^{p+1}(\Omega)}\,|||v_h|||_{DG}, \quad \text{a.e. } t\in(0,T),\ y\in L^2(0,T;H^{p+1}(\Omega)),\ v_h\in V_{h,p}, \tag{5.9}\\
a^d(v_h,w_h) &\le C_A\,\epsilon\,|||v_h|||_{DG}\,|||w_h|||_{DG}, \quad v_h,w_h\in V_{h,p}. \notag
\end{align}
Proof. The proof in [32, Lemma 3.8] is adapted to the bilinear form (3.5) using the estimate (5.5).
Remark 4. A similar estimate for the bilinear form arising from the non-symmetric interior penalty Galerkin method can be found in [11, Lemma 4.2].
Lemma 5. The bilinear form $a^d(\cdot,\cdot)$ satisfies the coercivity inequality
\[
a^d(v_h,v_h) \ge \frac{\epsilon}{2}\,|||v_h|||^2_{DG}, \quad \forall v_h\in V_{h,p}. \tag{5.10}
\]
Proof. The proof in [32, Corollary 3.10] is adapted to the bilinear form (3.5) using the norm (5.1).
5.1. Discrete characteristic function
We use the discrete characteristic function in order to provide error estimates at arbitrary time points as suggested in [13]. We can work on $[0,k)$ instead of $I_m$, since the construction of the discrete characteristic function is invariant under translation. We consider polynomials $s\in P_q(0,k)$ and the discrete approximation of $\chi_{[0,t)}s$ of $s$, which is a polynomial
\[
\tilde s\in\big\{\tilde s\in P_q(0,k) : \tilde s(0)=s(0)\big\} \quad\text{such that}\quad \int_0^k\tilde s\,z = \int_0^t s\,z, \quad \forall z\in P_{q-1}(0,k).
\]
This definition can be extended from $P_q(0,k)$ to $V^{k,q}_{h,p}$. The discrete approximation of $\chi_{[0,t)}v$ for $v\in V^{k,q}_{h,p}$ is written as $\tilde v=\sum_{i=0}^{q}\tilde s_i(t)\,v_i$. On account of these inequalities, the following estimate is given in [11]
\[
\int_{I_m}|||\tilde w|||^2_{DG}\,dt \le C_D\int_{I_m}|||w|||^2_{DG}\,dt, \qquad C_D=C_D(q). \tag{5.11}
\]
A suitable discrete approximation $\chi_{(t,t^n]}v_h$ must be constructed for the adjoint problem, as it is noted in the proof of [20, Theorem 3.8]. The discrete approximation of $\chi_{(t,t^{N_T}]}s$ is a polynomial
\[
\tilde s\in\big\{\tilde s\in P_q(t^{N_T-1},t^{N_T}) : \tilde s(t^{N_T})=s(t^{N_T})\big\} \quad\text{such that}\quad \int_{t^{N_T-1}}^{t^{N_T}}\tilde s\,z = \int_{t}^{t^{N_T}}s\,z,
\]
$\forall z\in P_{q-1}(t^{N_T-1},t^{N_T})$. This definition can be extended from $P_q(t^{N_T-1},t^{N_T})$ to $V^{k,q}_{h,p}$ and the estimates above can be modified for the adjoint [20, Theorem 3.8].
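The defining conditions of the discrete characteristic function can be solved by hand for $q=1$. The sketch below is a minimal illustration with a hypothetical function name; it assumes a scalar polynomial $s(\tau)=s_0+s_1\tau$ on $[0,k]$, for which the test space $P_{q-1}(0,k)$ consists of constants only.

```python
def discrete_characteristic_q1(s0, s1, k, t):
    """For s(tau) = s0 + s1*tau on [0, k] and 0 <= t <= k, return (c0, c1)
    with s_tilde(tau) = c0 + c1*tau satisfying
      s_tilde(0) = s(0)  and  int_0^k s_tilde dtau = int_0^t s dtau
    (test functions z in P_0, i.e. constants)."""
    integral_s = s0 * t + 0.5 * s1 * t * t      # int_0^t s
    c0 = s0                                     # interpolation condition at 0
    c1 = 2.0 * (integral_s - s0 * k) / (k * k)  # from c0*k + c1*k^2/2 = integral_s
    return c0, c1
```

For $t=k$ the construction returns $s$ itself, while for $t<k$ it produces a polynomial on the whole interval whose moments match those of the truncated function $\chi_{[0,t)}s$.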
6. A priori error estimates
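We proceed with the derivation of convergence estimates for the optimality system and its space–time dG approximation. We define the auxiliary state and adjoint equations which are needed for the a priori error analysis
\begin{align}
\sum_{m=1}^{N_T}\int_{I_m}(\partial_t y^u_\delta,v_\delta)\,dt + \int_0^T a^s_h(y^u_\delta,v_\delta)\,dt + \sum_{m=1}^{N_T}\big([y^u_\delta]_{m-1},v^{m-1}_{\delta,+}\big) &= \int_0^T(f_\delta+u,v_\delta)\,dt, \quad y^{u,0}_{\delta,-}=(y_0)_\delta, \tag{6.1a}\\
\sum_{m=1}^{N_T}\int_{I_m}(-\partial_t p^u_\delta,\psi_\delta)\,dt + \int_0^T a_h(p^u_\delta,\psi_\delta)\,dt - \sum_{m=1}^{N_T}\big([p^u_\delta]_m,\psi^m_{\delta,-}\big) &= -\int_0^T(y^u_\delta-y^d_\delta,\psi_\delta)\,dt, \quad p^{u,N}_{\delta,+}=0. \tag{6.1b}
\end{align}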
Following [33], we assume that the reaction term satisfies |r | ≤ Cr a.e. in Ω ; the velocity field is bounded by a constant
Cβ a.e. in Ω .
We shall prove some useful lemmas before stating the main theorem of this study.
Lemma 6. Let $(y_\delta,p_\delta)$ and $(y^u_\delta,p^u_\delta)$ be the solutions of (4.6) and (6.1), respectively. Then, there exists a constant $C$ independent of $h$ and $k$ such that
\[
\sup_{t\in I_n}\|y^u_\delta(t)-y_\delta(t)\| + \sup_{t\in I_n}\|p^u_\delta(t)-p_\delta(t)\| \le C\int_0^{t_n}\|u-u_\delta\|\,dt. \tag{6.2}
\]
Proof. Firstly, we shall study the fully discrete state equation on each subinterval $I_m$. We subtract (4.6a) from (6.1a) to obtain
\[
\int_{I_m}(\partial_t\theta,v_\delta)\,dt + \big([\theta]_{m-1},v^{m-1}_{\delta,+}\big) + \int_{I_m}a^s_h(\theta,v_\delta)\,dt = \int_{I_m}(u-u_\delta,v_\delta)\,dt, \tag{6.3}
\]
where $\theta=y^u_\delta-y_\delta$. We substitute $v_\delta=2\theta$ in (6.3). Then,
\[
\int_{I_m}2(\partial_t\theta,\theta)\,dt + 2\big([\theta]_{m-1},\theta^{m-1}_+\big) = \|\theta^m_-\|^2 - \|\theta^{m-1}_-\|^2 + \|[\theta]_{m-1}\|^2 \tag{6.4}
\]
is achieved. For the right-hand side, we employ the Cauchy–Schwarz and Young inequalities, the Poincaré inequality (5.3) and the definition of the dG norm (5.1). For the left-hand side, we use (5.10) for the diffusion term and follow the technique in [33, Theorem 5.1] for the convection and reaction terms. Then, we derive the following estimate in the middle of (6.5)
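\begin{align}
& \|\theta^m_-\|^2 - \|\theta^{m-1}_-\|^2 + \frac{\epsilon}{2}\int_{I_m}|||\theta|||^2_{DG}\,dt + 2C_0\int_{I_m}\|\theta\|^2\,dt \notag\\
& \qquad + \frac{\epsilon}{2}\int_{I_m}\sum_{K\in\mathcal{T}_h}\Big(\|\theta\|^2_{\partial K^-\cap\Gamma^-} + \|[[\theta]]\|^2_{\partial K^-\setminus\Gamma^-} + \|\theta\|^2_{\partial K^+\cap\Gamma^+}\Big)\,dt \notag\\
& \quad \le \|\theta^m_-\|^2 - \|\theta^{m-1}_-\|^2 + \frac{\epsilon}{2}\int_{I_m}|||\theta|||^2_{DG}\,dt + 2C_0\int_{I_m}\|\theta\|^2\,dt \notag\\
& \qquad + \int_{I_m}\sum_{K\in\mathcal{T}_h}\Big(\|\theta\|^2_{\partial K^-\cap\Gamma^-} + \|[[\theta]]\|^2_{\partial K^-\setminus\Gamma^-} + \|\theta\|^2_{\partial K^+\cap\Gamma^+}\Big)\,dt \notag\\
& \quad \le C\int_{I_m}\|u-u_\delta\|^2\,dt. \tag{6.5}
\end{align}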
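For completeness, the time-derivative identity (6.4) used above can be verified in two lines. This is a worked step added for the reader; it only uses that $\theta$ is a polynomial in time on $I_m$ and the elementary relation $2(a-b,a)=\|a\|^2-\|b\|^2+\|a-b\|^2$:

```latex
\begin{aligned}
\int_{I_m} 2(\partial_t\theta,\theta)\,dt
  &= \int_{I_m}\frac{d}{dt}\|\theta\|^2\,dt
   = \|\theta^{m}_-\|^2 - \|\theta^{m-1}_+\|^2,\\
2\big([\theta]_{m-1},\theta^{m-1}_+\big)
  &= 2\big(\theta^{m-1}_+ - \theta^{m-1}_-,\,\theta^{m-1}_+\big)
   = \|\theta^{m-1}_+\|^2 - \|\theta^{m-1}_-\|^2 + \|[\theta]_{m-1}\|^2 .
\end{aligned}
```

Adding the two lines, the terms $\|\theta^{m-1}_+\|^2$ cancel and the right-hand side of (6.4) remains.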
We note that the lower bound on the left-hand side of (6.5) has been added after deriving the estimate in the middle for the clarity of the proof and will be used later. Now, we proceed by substituting $v_\delta=2\tilde\theta$ into (6.3). We employ the discrete characteristic function as in the proof of [11, Theorem 5.2] to obtain an estimate at arbitrary points and use the properties given there. With $z=\arg\sup_{\bar I_m}\|\theta(t)\|$, the discrete characteristic function defined in Section 5.1 leads to
\begin{align}
& \int_{I_m}(\partial_t\theta,\tilde\theta)\,dt = \int_{t_{m-1}}^{z}(\partial_t\theta,\theta)\,dt, \qquad \tilde\theta^{m-1}_+=\theta^{m-1}_+, \qquad [\tilde\theta]_{m-1}=[\theta]_{m-1}, \tag{6.6}\\
& \int_{I_m}2(\partial_t\theta,\tilde\theta)\,dt + 2\big([\theta]_{m-1},\tilde\theta^{m-1}_+\big) = \|\theta(z)\|^2 - \|\theta^{m-1}_-\|^2 + \|[\theta]_{m-1}\|^2. \tag{6.7}
\end{align}
We use (6.6)–(6.7) and the inequality $\|\theta^{m-1}_-\|\le\sup_{t\in I_{m-1}}\|\theta(t)\|$ to bound the terms arising in the time derivative. We proceed by moving $2\int_{I_m}a_h(\theta,\tilde\theta)\,dt$ to the right-hand side. We employ (5.9) for the diffusion term and the proof of [33, Theorem 5.1] for the convection term. The reaction term and the control on the right-hand side are bounded by using the Cauchy–Schwarz and Young inequalities together with (5.3) and (5.1), so that $\|\cdot\|^2\le C|||\cdot|||^2_{DG}$ is satisfied for a positive constant $C$. We eliminate the term $|||\tilde\theta|||^2_{DG}$ on the right-hand side by using (5.11). Then, we obtain the following inequality
\begin{align}
\sup_{t\in I_m}\|\theta(t)\|^2 - \sup_{t\in I_{m-1}}\|\theta(t)\|^2
&\le C_b\int_{I_m}|||\theta|||^2_{DG}\,dt + \int_{I_m}\sum_{K\in\mathcal{T}_h}\Big(\|\theta\|^2_{\partial K^+\cap\Gamma^+} + \|[[\theta]]\|^2_{\partial K^-\setminus\Gamma^-}\Big)\,dt + C\int_{I_m}\|u-u_\delta\|^2\,dt \notag\\
&\le C_b'\int_{I_m}\Big(|||\theta|||^2_{DG} + \sum_{K\in\mathcal{T}_h}\big(\|\theta\|^2_{\partial K^+\cap\Gamma^+} + \|[[\theta]]\|^2_{\partial K^-\setminus\Gamma^-}\big)\Big)\,dt + C\int_{I_m}\|u-u_\delta\|^2\,dt, \tag{6.8}
\end{align}
where $C_b=C(1+C_D)(\epsilon C_A+C_S(C_r+C_\beta))$, $C_b'=\max\{1,C_b\}$. In order to eliminate the terms in $\theta$ on the right-hand side of (6.8), we use (6.5) multiplying it by $C_b''=2\epsilon C_b'$. By adding these inequalities and denoting $\Theta_m=\sup_{t\in I_m}\|\theta(t)\|^2+C_b''\|\theta^m_-\|^2$, we arrive at
\[
\Theta_m-\Theta_{m-1} \le C(1+C_b'')\int_{I_m}\|u-u_\delta\|^2\,dt. \tag{6.9}
\]
We sum (6.9) over $m=1,\dots,n\le N_T$ and use $\theta=0$ at $t=0$ to derive the estimate
\[
\sup_{t\in I_n}\|\theta(t)\|^2 = \sup_{t\in I_n}\|y^u_\delta(t)-y_\delta(t)\|^2 \le C\int_0^{t_n}\|u-u_\delta\|^2\,dt. \tag{6.10}
\]
Secondly, we proceed with the adjoint equation, subtracting (4.6b) from (6.1b) and using $\zeta=p^u_\delta-p_\delta$. A discrete approximation to $\chi_{(t,t_m]}v_h$ specified for the adjoint problem must be used, as we discussed in Section 5.1. Then, this leads to
\[
\int_{I_m}2(-\partial_t\zeta,\tilde\zeta)\,dt - 2\big([\zeta]_m,\tilde\zeta^m_-\big) = \|\zeta(z)\|^2 - \|\zeta^m\|^2 + \|[\zeta]_m\|^2, \tag{6.11}
\]
where $z=\arg\sup_{\bar I_m}\|\zeta(t)\|$. In addition, the relations $\|\zeta^m\|^2\le\sup_{I_{N_T-m+2}}\|\zeta(t)\|^2$ and $\|\zeta(z)\|^2=\sup_{I_{N_T-m+1}}\|\zeta(t)\|^2$ are needed. Then, we follow the same idea used to derive (6.10) to reach the inequality
\[
\sup_{t\in I_{N_T-m+1}}\|\zeta(t)\|^2 - \sup_{t\in I_{N_T-m+2}}\|\zeta(t)\|^2 \le Ck_m\int_{t\in I_m}\|u-u_\delta\|^2\,dt. \tag{6.12}
\]
We shall sum (6.12) over $m=N_T,\dots,n\ge1$ and use $\zeta=0$ at $t=t_{N_T}$. The final result (6.2) follows from standard algebra, (6.10) and (6.12).
We shall proceed with the estimate between the exact and the approximate control.
Lemma 7. Let $(y,p,u)$ and $(y_\delta,p_\delta,u_\delta)$ be the solutions of (2.3) and (4.6), respectively. Then, we have
\[
\|u-u_\delta\|_{L^2(0,T;L^2(\Omega))} \le \frac{1}{\alpha}\|p-p^u_\delta\|_{L^2(0,T;L^2(\Omega))}. \tag{6.13}
\]
Proof. We apply the technique used for the steady-state optimal control problem in [4, Section 4.2]. We start using the continuous and fully-discrete optimality conditions (2.3c)–(4.6c) to obtain the following equation
\begin{align}
\alpha\|u-u_\delta\|^2_{L^2(0,T;L^2(\Omega))} &= \alpha\int_0^T(u-u_\delta,u-u_\delta)\,dt \notag\\
&= \int_0^T(\alpha u-p,u-u_\delta)\,dt - \int_0^T(\alpha u_\delta-p_\delta,u-u_\delta)\,dt + \int_0^T(p-p_\delta,u-u_\delta)\,dt \notag\\
&= \int_0^T(p-p^u_\delta,u-u_\delta)\,dt + \int_0^T(p^u_\delta-p_\delta,u-u_\delta)\,dt = J_1+J_2. \tag{6.14}
\end{align}
We use the Cauchy–Schwarz and Young inequalities to show that
\[
0\le J_1\le\frac{1}{2\alpha}\|p-p^u_\delta\|^2_{L^2(0,T;L^2(\Omega))} + \frac{\alpha}{2}\|u-u_\delta\|^2_{L^2(0,T;L^2(\Omega))}. \tag{6.15}
\]
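We proceed with $J_2$ and use the auxiliary state equation (6.1) to obtain
\begin{align*}
J_2 &= \int_0^T(p^u_\delta-p_\delta,u-u_\delta)\,dt\\
&= \sum_{m=1}^{N_T}\int_{I_m}\big(\partial_t(y^u_\delta-y_\delta),p^u_\delta-p_\delta\big)\,dt + \int_0^T a^s_h(y^u_\delta-y_\delta,p^u_\delta-p_\delta)\,dt + \sum_{m=1}^{N}\big([y^u_\delta-y_\delta]_{m-1},(p^u_\delta-p_\delta)^{m-1}_+\big).
\end{align*}
We proceed by applying integration by parts in time, which moves the time derivative onto $p^u_\delta-p_\delta$, and use the auxiliary adjoint equation (6.1) to arrive at
\begin{align}
J_2 &= -\sum_{m=1}^{N_T}\int_{I_m}\big(\partial_t(p^u_\delta-p_\delta),y^u_\delta-y_\delta\big)\,dt + \sum_{m=1}^{N}\big(y^u_\delta-y_\delta,p^u_\delta-p_\delta\big)\Big|_{t_{m-1}}^{t_m} \notag\\
&\quad + \int_0^T a^s_h(y^u_\delta-y_\delta,p^u_\delta-p_\delta)\,dt + \sum_{m=1}^{N}\big([y^u_\delta-y_\delta]_{m-1},(p^u_\delta-p_\delta)^{m-1}_+\big) \notag\\
&= -\sum_{m=1}^{N_T}\int_{I_m}\big(\partial_t(p^u_\delta-p_\delta),y^u_\delta-y_\delta\big)\,dt + \int_0^T a^s_h(y^u_\delta-y_\delta,p^u_\delta-p_\delta)\,dt - \sum_{m=1}^{N}\big((y^u_\delta-y_\delta)^m_-,[p^u_\delta-p_\delta]_m\big) \notag\\
&= -\int_0^T\big(y^u_\delta-y_\delta,y^u_\delta-y_\delta\big)\,dt \le 0. \tag{6.16}
\end{align}
Then, using (6.14)–(6.16), we derive the final result (6.13).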
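The last step can be written out explicitly. As a short worked computation (all norms are taken in $L^2(0,T;L^2(\Omega))$), combining (6.14) with $J_2\le0$ from (6.16) and the bound (6.15) gives

```latex
\alpha\|u-u_\delta\|^2 = J_1 + J_2 \le J_1
  \le \frac{1}{2\alpha}\|p-p^u_\delta\|^2 + \frac{\alpha}{2}\|u-u_\delta\|^2
\;\Longrightarrow\;
\frac{\alpha}{2}\|u-u_\delta\|^2 \le \frac{1}{2\alpha}\|p-p^u_\delta\|^2
\;\Longrightarrow\;
\|u-u_\delta\| \le \frac{1}{\alpha}\|p-p^u_\delta\| .
```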
Lemma 8. Let $(y,p)$ be the solution of (2.3) and let $(y^u_\delta,p^u_\delta)$ be the solution of the auxiliary equations (6.1). Assume that $y,p\in W^{q+1,\infty}(0,T;H^1(\Omega))\cap L^\infty(0,T;H^{p+1}(\Omega))$. Then, there exists a constant $C$ independent of $h$ and $k$ such that
\[
\sup_{t\in I_n}\|y-y^u_\delta\| + \sup_{t\in I_n}\|p-p^u_\delta\| \le O(h^{p}+k^{q+1}). \tag{6.17}
\]
Proof. Firstly, we integrate (2.3a) over $I_m$ and subtract the result from (6.1a) in order to obtain the following equation
\[
\int_{I_m}(\partial_t\xi,v_\delta)\,dt + \big([\xi]_{m-1},v^{m-1}_{\delta,+}\big) + \int_{I_m}a^s_h(\xi,v_\delta)\,dt = -\Big(\int_{I_m}(\partial_t\eta,v_\delta)\,dt + \big([\eta]_{m-1},v^{m-1}_{\delta,+}\big)\Big) - \int_{I_m}a_h(\eta,v_\delta)\,dt, \tag{6.18}
\]
where $y-y^u_\delta=(y-\pi y)+(\pi y-y^u_\delta)=\eta+\xi$.
Since we use the same mesh on each time interval, (5.6) leads to the following identity:
\[
\int_{I_m}(\partial_t\eta,v_\delta)\,dt + \big([\eta]_{m-1},v^{m-1}_{\delta,+}\big) = 0, \quad \forall v_\delta\in V^{k,q}_{h}. \tag{6.19}
\]
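We proceed as in the proof of Lemma 6 and the proof of [33, Theorem 5.1] by inserting the estimate (5.8) to obtain
\begin{align}
& \int_{I_m}(\partial_t\xi,v_\delta)\,dt + \big([\xi]_{m-1},v^{m-1}_{\delta,+}\big) + \int_{I_m}a^s_h(\xi,v_\delta)\,dt \notag\\
& \quad \le \frac{\epsilon}{4}\int_{I_m}|||v_\delta|||^2_{DG}\,dt + \frac{C_0}{2}\int_{I_m}\|v_\delta\|^2\,dt + \frac12\int_{I_m}\sum_{K\in\mathcal{T}_h}\Big(\|v_\delta\|^2_{\partial K^+\cap\Gamma^+} + \|[[v_\delta]]\|^2_{\partial K^-\setminus\Gamma^-}\Big)\,dt \notag\\
& \qquad + k_mC_AC_\pi(h^{2p}+k^{2q+2})|y|^2_R + k_m2C_\beta C_\pi C_M(h^{2p+1}+k^{2q+2})|y|^2_R + k_m\frac{C_\beta C_r}{C_0}C_\pi(h^{2p+2}+k^{2q+2})|y|^2_R, \tag{6.20}
\end{align}
where $|y|_R=\max\big(|y|_{W^{q+1,\infty}(I_m;H^1(\Omega))},\,|y|_{L^\infty(I_m;H^{p+1}(\Omega))}\big)$.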
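Firstly, we shall substitute $v_\delta=2\xi$ into (6.20) to obtain
\begin{align}
& \|\xi^m_-\|^2 - \|\xi^{m-1}_-\|^2 + \frac{\epsilon}{2}\int_{I_m}|||\xi|||^2_{DG}\,dt + C_0\int_{I_m}\|\xi\|^2\,dt \notag\\
& \qquad + \int_{I_m}\sum_{K\in\mathcal{T}_h}\Big(\|\xi\|^2_{\partial K^-\cap\Gamma^-} + \frac12\|[[\xi]]\|^2_{\partial K^-\setminus\Gamma^-} + \frac12\|\xi\|^2_{\partial K^+\cap\Gamma^+}\Big)\,dt \notag\\
& \quad \le k_mC_b(h^{2p}+h^{2p+1}+h^{2p+2}+k^{2q+2})|y|^2_R, \tag{6.21}
\end{align}
where $C_b=\max\big\{C_AC_\pi,\ 2C_\beta C_\pi C_M,\ C_\pi\frac{C_\beta C_r}{C_0}\big\}$.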
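Secondly, we substitute $v_\delta = 2\tilde{\xi}$ into (6.20) to obtain
\[
\begin{aligned}
\sup_{t\in I_m}\|\xi(t)\|^2 - \sup_{t\in I_{m-1}}\|\xi(t)\|^2
&\le C_b'\int_{I_m}|||\xi|||_{DG}^2\,dt + \int_{I_m}\sum_{K\in T_h}\Big(\|[[\xi]]\|^2_{\partial K^-\setminus\Gamma^-} + \|\xi\|^2_{\partial K^+\cap\Gamma^+}\Big)\,dt \\
&\quad + k_m C_b (h^{2p} + h^{2p+1} + h^{2p+2} + k^{2q+2})|y|_R^2 \\
&\le C_b''\int_{I_m}\Big(|||\xi|||_{DG}^2 + \sum_{K\in T_h}\big(\|[[\xi]]\|^2_{\partial K^-\setminus\Gamma^-} + \|\xi\|^2_{\partial K^+\cap\Gamma^+}\big)\Big)\,dt \\
&\quad + k_m C_b (h^{2p} + h^{2p+1} + h^{2p+2} + k^{2q+2})|y|_R^2,
\end{aligned} \tag{6.22}
\]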
where $C_b' = C(1 + C_D)(\epsilon C_A + C_S(C_\beta + C_r))$ and $C_b'' = \max\{1, C_b'\}$. Now, we proceed as in the proof of Lemma 6. We multiply (6.21) by $C_b''' = 2\epsilon C_b''$ in order to eliminate the terms involving $\xi$ on the right-hand side of (6.22). Then, we add it to (6.22) and denote $\Theta_m = \sup_{t\in I_m}\|\xi(t)\|^2 + C_b'''\|\xi_-^m\|^2$ in order to obtain
\[
\Theta_m - \Theta_{m-1} \le k_m 2C_b''' (h^{2p} + h^{2p+1} + h^{2p+2} + k^{2q+2})|y|_R^2. \tag{6.23}
\]
We sum (6.23) over $m = 1, \dots, n \le N_T$ to obtain
\[
\sup_{t\in I_n}\|\xi(t)\|^2 \le O(h^{2p} + k^{2q+2}). \tag{6.24}
\]
Thirdly, we integrate (2.3b) over $I_m$, subtract it from (6.1b) and denote $p - p_\delta^u = (p - \pi p) + (\pi p - p_\delta^u) = \phi + \mu$. Then, we use the idea in the proof of (6.24) in order to derive
\[
\sup_{t\in I_{N-m+1}}\|\mu(t)\|^2 - \sup_{t\in I_{N-m+2}}\|\mu(t)\|^2 \le C k_m \sup_{t\in I_m}\|\xi(t)\|^2 + O(h^{2p} + k^{2q+2}), \tag{6.25}
\]
for $C > 0$. The resulting inequality is summed over $m = N_T, \dots, n \ge 1$. Then, it is combined with (6.24) to derive the final result (6.17).
Remark 9. To guarantee the assumptions on the exact solution, it is necessary to require a higher regularity of the data of the problem.
We state the main estimate of this study by combining Lemmas 6–8.
Theorem 10. Suppose that $(y, p, u)$ and $(y_\delta, p_\delta, u_\delta)$ are the solutions of (2.3) and (4.6), respectively. We assume that all conditions of Lemmas 6–8 are satisfied. Then, there exists a constant $C$ independent of $h$ and $k$ such that
\[
\|y - y_\delta\|_{L^\infty(0,T;L^2(\Omega))} + \|p - p_\delta\|_{L^\infty(0,T;L^2(\Omega))} + \|u - u_\delta\|_{L^2(0,T;L^2(\Omega))} \le C\big(h^p + k^{q+1}\big). \tag{6.26}
\]
In Theorem 10, the errors in the state and the control are measured in the $L^\infty(0,T;L^2(\Omega))$ and $L^2(0,T;L^2(\Omega))$ norms, respectively. The same norms are used, for example, in [34]. The former norm is due to the discrete characteristic function, which is used to provide error estimates at arbitrary time points. The latter norm arises from the optimality condition, as shown in Lemma 7. On the other hand, we observe that the estimate in Theorem 10 is optimal in time but suboptimal in space in the $L^\infty(0,T;L^2(\Omega))$ norm for the state and in the $L^2(0,T;L^2(\Omega))$ norm for the control, i.e. $O(h^p, k^{q+1})$, using polynomial approximations of degree $p$ in space and degree $q$ in time. However, an optimal spatial convergence rate for the SIPG discretization combined with backward Euler is achieved, for example, using an elliptic projection in [21]. The first reason behind the order reduction in this study is the estimate (5.8) for the space–time projection, which is employed to bound the continuity estimate of the bilinear form in Lemma 3. The convection term also has an influence on the spatial order reduction since we follow the proof of [33, Theorem 5.1]. In order to improve this suboptimal estimate, the effect of the space–time projector in the bilinear form of the diffusion term must be eliminated.
7. Computational aspects of variational time discretization
In this section, we will apply the cGP(2) and dG(1) methods [14,35] to the OCP (2.3) and derive the fully discrete
formulations of the adjoint equation. The cGP(2) and dG(1) methods are convergent of order 3 and 2, respectively. The
cGP(2) method is super-convergent of order 4, and dG(1) is of order 3 at the nodal points, i.e. at the endpoints of the time
intervals [10,14].
7.1. cGP(2)-method
In the cGP(2)-method, the state $y_\delta$ and the control $u_\delta$ are approximated in the time interval $I_m = (t_{m-1}, t_m]$ by
\[
y_\delta = Y_m^0\varphi_{m,0}(t) + Y_m^1\varphi_{m,1}(t) + Y_m^2\varphi_{m,2}(t), \quad \forall t \in I_m,
\]
\[
u_\delta = U_m^0\varphi_{m,0}(t) + U_m^1\varphi_{m,1}(t) + U_m^2\varphi_{m,2}(t), \quad \forall t \in I_m,
\]
where $\varphi_{m,j} \in P_2(I_m)$ are the orthogonal quadratic Lagrange basis functions on $\bar{I}_m$.
The discrete time steps on Īm are chosen according to the 3-point Gauss–Lobatto rule on Im ,
tm,0 = tm−1 , tm,1 = (tm−1 + tm )/2, tm,2 = tm ,
with the reference weights ŵ0 = ŵ2 = 1/3, ŵ1 = 4/3.
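The test functions are chosen as $\psi^s_{m,i} \in P_1(I_m)$ such that
\[
\hat{\psi}_i^s(\hat{t}_\mu) = (\hat{w}_\mu)^{-1}\delta_{i,\mu}, \quad i, \mu = 1, 2, \iff \hat{\psi}_1^s = \frac{3}{4}(1 - \hat{t}), \quad \hat{\psi}_2^s = 3\hat{t}.
\]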
Using the transformation $T_m : \hat{I} \to I_m$, where $\hat{I} = (-1, 1]$,
\[
t = T_m(\hat{t}) = \frac{t_{m-1} + t_m}{2} + \frac{k_m}{2}\hat{t} \in I_m, \quad \forall \hat{t} \in \hat{I}, \quad m = 1, \dots, N_T,
\]
they are transformed to the interval $I_m$.
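The defining property of the test functions can be checked directly on the reference interval. The following is a minimal sketch in exact rational arithmetic, assuming only the Gauss–Lobatto nodes, weights and test functions stated above; the helper names (`weighted_values`, `lobatto`) are illustrative and not part of the paper.

```python
from fractions import Fraction as F

# 3-point Gauss-Lobatto rule on the reference interval [-1, 1]
nodes = [F(-1), F(0), F(1)]
weights = [F(1, 3), F(4, 3), F(1, 3)]


def psi1(t):
    """First linear test function for the state equation."""
    return F(3, 4) * (1 - t)


def psi2(t):
    """Second linear test function for the state equation."""
    return 3 * t


def weighted_values(psi):
    """Return [w_mu * psi(t_mu)] for mu = 0, 1, 2."""
    return [w * psi(t) for w, t in zip(weights, nodes)]


def lobatto(f):
    """Apply the 3-point Gauss-Lobatto rule to f on [-1, 1]."""
    return sum(w * f(t) for w, t in zip(weights, nodes))


wv1 = weighted_values(psi1)
wv2 = weighted_values(psi2)

# The rule integrates monomials up to degree 3 exactly:
# the integral of t^j over [-1, 1] is 0 for odd j and 2/(j+1) for even j.
exact_up_to_degree_3 = all(
    lobatto(lambda t, j=j: t**j) == (F(2, j + 1) if j % 2 == 0 else 0)
    for j in range(4)
)

print(wv1[1:], wv2[1:], exact_up_to_degree_3)
```

At the nodes $\mu = 1, 2$ the weighted values form the identity pattern, which is what decouples the quadrature sums in the discrete equations.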
The initial conditions on each time interval $I_m$ are
\[
Y_m^0 = y_\delta(t_{m-1}) \ \text{ if } m \ge 2 \quad \text{or} \quad Y_m^0 = y_0 \ \text{ if } m = 1.
\]
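For the state equation, on each time interval $I_m$, the following linear system has to be solved for $Y_m^1, Y_m^2$
\[
\begin{pmatrix}
M + \frac{k_m}{2}A^s & \frac{1}{4}M \\
-4M & 2M + \frac{k_m}{2}A^s
\end{pmatrix}
\begin{pmatrix} Y_m^1 \\ Y_m^2 \end{pmatrix}
=
\begin{pmatrix}
\big(\frac{5}{4}M - \frac{k_m}{4}A^s\big)Y_m^0 + \frac{k_m}{2}M\big(\frac{1}{2}U_m^0 + U_m^1\big) + \frac{k_m}{2}\big(\frac{1}{2}F_h(t_{m,0}) + F_h(t_{m,1})\big) \\
\big(-2M + \frac{k_m}{2}A^s\big)Y_m^0 + \frac{k_m}{2}M\big(U_m^2 - U_m^0\big) + \frac{k_m}{2}\big(F_h(t_{m,2}) - F_h(t_{m,0})\big)
\end{pmatrix}. \tag{7.1}
\]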
The discrete state at $t = t_m$ is given by $y_{h,m} = Y_m^2$.
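To make the structure of (7.1) concrete, the sketch below applies it to the scalar test equation $y' + a y = f$, i.e. with $M = 1$, $A^s = a$ and no control. This scalar reduction, the function names and the choice $a = 1$, $y_0 = 1$, $T = 1$ are assumptions made for illustration only; the 2x2 system is solved by Cramer's rule.

```python
import math


def cgp2_step(y0, k, a, f=lambda t: 0.0, t0=0.0):
    """One cGP(2) step for y' + a*y = f(t): scalar form of (7.1), M = 1, A^s = a.

    Returns (Y1, Y2), the values at the midpoint and at the right endpoint.
    """
    f0, f1, f2 = f(t0), f(t0 + 0.5 * k), f(t0 + k)
    # 2x2 coefficient matrix of (7.1)
    a11, a12 = 1.0 + 0.5 * k * a, 0.25
    a21, a22 = -4.0, 2.0 + 0.5 * k * a
    # right-hand side of (7.1) without control terms
    b1 = (1.25 - 0.25 * k * a) * y0 + 0.5 * k * (0.5 * f0 + f1)
    b2 = (-2.0 + 0.5 * k * a) * y0 + 0.5 * k * (f2 - f0)
    det = a11 * a22 - a12 * a21
    y1 = (b1 * a22 - a12 * b2) / det
    y2 = (a11 * b2 - a21 * b1) / det
    return y1, y2


def integrate(n_steps, T=1.0, a=1.0, y0=1.0):
    """March from 0 to T with constant step; Y_m^2 becomes Y_{m+1}^0."""
    k = T / n_steps
    y = y0
    for m in range(n_steps):
        _, y = cgp2_step(y, k, a, t0=m * k)
    return y


def nodal_error(n_steps):
    """Error at t = 1 for y' + y = 0, y(0) = 1, whose solution is exp(-t)."""
    return abs(integrate(n_steps) - math.exp(-1.0))


err_coarse, err_fine = nodal_error(10), nodal_error(20)
observed_order = math.log2(err_coarse / err_fine)
print(f"nodal errors: {err_coarse:.2e}, {err_fine:.2e}, order ~ {observed_order:.2f}")
```

Halving the step size reduces the endpoint error by roughly a factor of 16 in this scalar setting, in line with the fourth-order nodal super-convergence quoted at the beginning of this section.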
The cGP(2) method for solving the adjoint equation is constructed in a similar way as for the state equation, but by integrating backwards. Let $p_\delta$ be the approximate solution of the adjoint,
\[
p_\delta = P_m^0\varphi_{m,0}(t) + P_m^1\varphi_{m,1}(t) + P_m^2\varphi_{m,2}(t), \quad \forall t \in I_m.
\]
The discrete time steps on Īm are chosen according to the 3-point Gauss–Lobatto rule on Im ,
tm,0 = tm−1 , tm,1 = (tm−1 + tm )/2, tm,2 = tm ,
with the reference weights ŵ0 = ŵ2 = 1/3, ŵ1 = 4/3.
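The test functions $\psi_m^a$ are chosen such that
\[
\hat{\psi}_i^a(\hat{t}_\mu) = (\hat{w}_\mu)^{-1}\delta_{i,\mu}, \quad i, \mu = 0, 1, \iff \hat{\psi}_1^a = -3\hat{t}, \quad \hat{\psi}_2^a = \frac{3}{4}(1 + \hat{t}).
\]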
The initial conditions on each time interval $I_m$ are
\[
P_m^2 = p_\delta|_{I_{m-1}}(t_m) \ \text{ if } m \le 2N_T - 1 \quad \text{or} \quad P_m^2 = 0 \ \text{ if } m = 2N_T - 1.
\]
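The system of equations to be solved for the adjoint equation becomes
\[
\begin{pmatrix}
M + \frac{k_m}{2}A^a & \frac{1}{4}M \\
-4M & 2M + \frac{k_m}{2}A^a
\end{pmatrix}
\begin{pmatrix} P_m^1 \\ P_m^0 \end{pmatrix}
=
\begin{pmatrix}
\big(\frac{5}{4}M - \frac{k_m}{4}A^a\big)P_m^2 - \frac{k_m}{2}M\big(\frac{1}{2}Y_m^2 + Y_m^1\big) + \frac{k_m}{2}\big(\frac{1}{2}Y_h^d(t_{m,2}) + Y_h^d(t_{m,1})\big) \\
\big(-2M + \frac{k_m}{2}A^a\big)P_m^2 - \frac{k_m}{2}M\big(Y_m^0 - Y_m^2\big) + \frac{k_m}{2}\big(Y_h^d(t_{m,0}) - Y_h^d(t_{m,2})\big)
\end{pmatrix}. \tag{7.2}
\]
The discrete adjoint at $t = t_{m-1}$ is given by $p_{h,m-1} = P_m^0$.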
7.2. dG(1)-method
For the dG(1)-method, the state $y_\delta$ and the control $u_\delta$ are approximated in the time interval $I_m = (t_{m-1}, t_m]$ by
\[
y_\delta = Y_m^1\varphi^s_{m,1}(t) + Y_m^2\varphi^s_{m,2}(t), \quad \forall t \in I_m,
\]
\[
u_\delta = U_m^0\varphi^a_{m,0}(t) + U_m^1\varphi^a_{m,1}(t), \quad \forall t \in I_m,
\]
using the Gauss–Radau quadrature rule, where $\varphi^s_{m,j} \in P_1(I_m)$ are the linear Lagrange basis functions on $\bar{I}_m$.
The discrete time steps on $\bar{I}_m$ are chosen according to the right-handed 2-point Gauss–Radau rule on $I_m$,
\[
t^s_{m,1} = t_{m-1} + \frac{k_m}{3}, \quad t^s_{m,2} = t_m,
\]
we use the reference weights
\[
\hat{w}_1^s = 3/2, \quad \hat{w}_2^s = 1/2.
\]
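The test functions $\psi_m^s$ are chosen such that
\[
\hat{\psi}_i^s(\hat{t}_\mu) = (\hat{w}_\mu^s)^{-1}\delta_{i,\mu}, \quad i, \mu = 1, 2, \iff \hat{\psi}_1^s = \frac{1 - \hat{t}}{2}, \quad \hat{\psi}_2^s = \frac{3\hat{t} + 1}{2}.
\]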
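The right-handed Gauss–Radau data can be verified in the same spirit. The sketch below assumes the reference nodes $\hat{t} = -1/3$ and $\hat{t} = 1$ (the images of $t^s_{m,1}$ and $t^s_{m,2}$ under the affine map to $(-1, 1]$) and uses a hypothetical interval $I_m = (0, 3/10]$ to show where the nodes land; all helper names are illustrative.

```python
from fractions import Fraction as F

# Right-handed 2-point Gauss-Radau rule on the reference interval [-1, 1]
radau_nodes = [F(-1, 3), F(1)]
radau_weights = [F(3, 2), F(1, 2)]


def to_interval(t_hat, t_left, t_right):
    """Affine map from the reference interval to (t_left, t_right]."""
    return (t_left + t_right) / 2 + (t_right - t_left) / 2 * t_hat


def radau(f):
    """Apply the 2-point Gauss-Radau rule to f on [-1, 1]."""
    return sum(w * f(t) for w, t in zip(radau_weights, radau_nodes))


# dG(1) test functions for the state equation
psi = [lambda t: (1 - t) / 2, lambda t: (3 * t + 1) / 2]

# Entry (i, mu) is w_mu * psi_i(t_mu); the defining property makes this the identity
gram = [[w * p(t) for w, t in zip(radau_weights, radau_nodes)] for p in psi]

# Quadrature of 1, t, t^2; the exact integrals over [-1, 1] are 2, 0, 2/3
moments = [radau(lambda t, j=j: t**j) for j in range(3)]

# Physical nodes on the hypothetical interval (0, 3/10], so k_m = 3/10
physical_nodes = [to_interval(t, F(0), F(3, 10)) for t in radau_nodes]

print(gram, moments, physical_nodes)
```

The first physical node sits at $t_{m-1} + k_m/3$ and the second at $t_m$, matching the time points listed above.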
The initial conditions for the state equation on each time interval $I_m$ are
\[
Y_m^0 = (y_\delta)^-_{m-1} \ \text{ if } m \ge 2 \quad \text{or} \quad (y_\delta)^-_{m-1} = y_0 \ \text{ if } m = 1.
\]
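On each time interval $I_m = (t_{m-1}, t_m]$, we solve the following linear system for $Y_m^1, Y_m^2$
\[
\begin{pmatrix}
\frac{3}{4}M + \frac{k_m}{2}A^s & \frac{1}{4}M \\
-\frac{9}{4}M & \frac{5}{4}M + \frac{k_m}{2}A^s
\end{pmatrix}
\begin{pmatrix} Y_m^1 \\ Y_m^2 \end{pmatrix}
=
\begin{pmatrix}
M Y_m^0 + \frac{k_m}{2}F_h(t_{m,1}) + \frac{k_m}{4}M\big(U_m^0 + U_m^1\big) \\
-M Y_m^0 + \frac{k_m}{2}F_h(t_{m,2}) + \frac{k_m}{4}M\big(3U_m^1 - U_m^0\big)
\end{pmatrix}. \tag{7.3}
\]
Again the discrete state at $t = t_m$ is given as $y_{h,t_m} = Y_m^2$.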
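As with the cGP(2) method, the system (7.3) can be illustrated on the scalar test equation $y' + a y = f$ with $M = 1$, $A^s = a$ and no control. This reduction and all names below are assumptions made for illustration, not the authors' implementation.

```python
import math


def dg1_step(y_prev, k, a, f=lambda t: 0.0, t0=0.0):
    """One dG(1) step for y' + a*y = f(t): scalar form of (7.3), M = 1, A^s = a.

    y_prev is the left-sided limit at t_{m-1}.
    Returns (Y1, Y2), the values at t_{m-1} + k/3 and at t_m.
    """
    f1, f2 = f(t0 + k / 3.0), f(t0 + k)
    # 2x2 coefficient matrix of (7.3)
    a11, a12 = 0.75 + 0.5 * k * a, 0.25
    a21, a22 = -2.25, 1.25 + 0.5 * k * a
    # right-hand side of (7.3) without control terms
    b1 = y_prev + 0.5 * k * f1
    b2 = -y_prev + 0.5 * k * f2
    det = a11 * a22 - a12 * a21
    y1 = (b1 * a22 - a12 * b2) / det
    y2 = (a11 * b2 - a21 * b1) / det
    return y1, y2


def dg1_endpoint_error(n_steps, T=1.0, a=1.0):
    """Error at t = T for y' + a*y = 0, y(0) = 1, with constant step T/n_steps."""
    k = T / n_steps
    y = 1.0
    for m in range(n_steps):
        _, y = dg1_step(y, k, a, t0=m * k)
    return abs(y - math.exp(-a * T))


dg_err_coarse, dg_err_fine = dg1_endpoint_error(10), dg1_endpoint_error(20)
dg_observed_order = math.log2(dg_err_coarse / dg_err_fine)
print(f"nodal errors: {dg_err_coarse:.2e}, {dg_err_fine:.2e}, order ~ {dg_observed_order:.2f}")
```

In this scalar setting the endpoint error drops by roughly a factor of 8 per halving of the step, consistent with the third-order nodal accuracy of dG(1) mentioned at the start of this section.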
Let $p_\delta$ be the approximate solution of the adjoint,
\[
p_\delta = P_m^0\varphi^a_{m,0}(t) + P_m^1\varphi^a_{m,1}(t), \quad \forall t \in I_m,
\]
using linear orthogonal Lagrange functions and Gauss–Radau points. We note that the integrals in the variational formulation of the state equation are approximated using the right-handed 2-point Gauss–Radau rule whereas left-handed Gauss–Radau points are used for the adjoint equation.
The discrete time steps on $\bar{I}_m$ are chosen according to the left-handed 2-point Gauss–Radau rule,
\[
t^a_{m,0} = t_{m-1}, \quad t^a_{m,1} = t_{m-1} + \frac{2k_m}{3},
\]
we use the corresponding reference weights
\[
\hat{w}_0^a = 1/2, \quad \hat{w}_1^a = 3/2.
\]
The test functions $\psi_m^a$ are chosen as
\[
\hat{\psi}_0^a = \frac{1 - 3\hat{t}}{2}, \quad \hat{\psi}_1^a = \frac{1 + \hat{t}}{2}.
\]
The initial conditions for the adjoint equation on each time interval $I_m$ are
\[
P_m^2 = (p_\delta)^+_m \ \text{ if } m \le N_T \quad \text{or} \quad P_m^2 = 0 \ \text{ if } m = N_T.
\]
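On each time interval $I_m = (t_{m-1}, t_m]$, we solve the following linear system for $P_m^1, P_m^0$
\[
\begin{pmatrix}
\frac{3}{4}M + \frac{k_m}{2}A^a & \frac{1}{4}M \\
-\frac{9}{4}M & \frac{5}{4}M + \frac{k_m}{2}A^a
\end{pmatrix}
\begin{pmatrix} P_m^1 \\ P_m^0 \end{pmatrix}
=
\begin{pmatrix}
M P_m^2 + \frac{k_m}{2}Y_h^d(t_{m,1}) - \frac{k_m}{4}M\big(Y_m^1 + Y_m^2\big) \\
-M P_m^2 + \frac{k_m}{2}Y_h^d(t_{m,0}) - \frac{k_m}{4}M\big(3Y_m^1 - Y_m^2\big)
\end{pmatrix}. \tag{7.4}
\]
Similar to the cGP(q + 1) method, the discrete adjoint at $t_{m-1}$ is given as $p_{h,m-1} = P_m^0$.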
The main drawback of the dG time discretization is the solution of large coupled linear systems in block form. Because we are using constant time steps, the coupled matrices on the left-hand sides of (7.1)–(7.4) have to be decomposed (LU block factorization) only once, at the beginning of the integration. Then, the state and adjoint equations are solved at each time step by forward elimination and back substitution using the block factorized matrices. The advantage of the variational time integration methods above is that only one of the variables is needed in the coupled system of equations (7.1)–(7.4) to determine the discrete state and adjoint. The linear systems (7.1)–(7.4) have the same form, which is not the case when arbitrary test functions ψ s and ψ a are used. Using this technique, the preconditioner given in [35] can be applied both to the state and the adjoint equations. Additionally, the orthogonal test functions lead to sparse matrices with approximately half as many non-zero entries as variational time discretization methods with nodal basis test functions [36].
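The factorize-once strategy can be sketched as follows. This is not the SIPG setting of the paper: as a stand-in, it assumes a hypothetical 1D finite-difference Laplacian for $A^s$, an identity mass matrix, zero source and zero control, and dense LU routines from SciPy. Only the block structure of (7.3) is taken from the text.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Hypothetical 1D stand-in for the spatial discretization (not SIPG)
n = 49                                   # interior grid points
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
M = np.eye(n)                            # identity mass matrix
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

k_m = 0.01                               # constant time step
n_steps = 10

# Block matrix of the dG(1) system (7.3); with a constant step it does not depend on m
K = np.block([[0.75 * M + 0.5 * k_m * A, 0.25 * M],
              [-2.25 * M, 1.25 * M + 0.5 * k_m * A]])
lu_piv = lu_factor(K)                    # LU factorization, done once

y = np.sin(np.pi * x)                    # initial state: a discrete eigenvector of A
for m in range(n_steps):
    rhs = np.concatenate([M @ y, -(M @ y)])   # right-hand side of (7.3) with f = 0, u = 0
    sol = lu_solve(lu_piv, rhs)               # forward elimination and back substitution only
    y = sol[n:]                               # Y_m^2, the state at t_m

# Exact solution of the semi-discrete problem for this initial state
lam = 4.0 / h**2 * np.sin(0.5 * np.pi * h) ** 2
y_exact = np.exp(-lam * n_steps * k_m) * np.sin(np.pi * x)
max_err = np.max(np.abs(y - y_exact))
print(f"max nodal error after {n_steps} steps: {max_err:.2e}")
```

For large sparse matrices a sparse factorization or a preconditioned iterative solver, as in [35], would replace the dense LU used in this sketch.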
8. Numerical results
In this section, we present some numerical results. We measure the errors in the state and control approximations in the $L^\infty(0,1;L^2(\Omega))$ and $L^2(0,1;L^2(\Omega))$ norms, respectively. We have used discontinuous piecewise linear polynomials in space. In all numerical examples, we have taken h = O (k). We follow the optimize-then-discretize approach.
Example 1. The first test problem is a convection dominated problem with smooth solutions depending implicitly on the diffusion term:
\[
Q = (0, 1] \times \Omega, \quad \Omega = (0, 1)^2, \quad \epsilon = 10^{-5}, \quad \beta = (1, 1)^T, \quad r = 1 \quad \text{and} \quad \alpha = 1.
\]
The source function f , the desired state yd and the initial condition y0 are computed from (2.3) using the following exact
solutions of the state and control, respectively,
y(x, t ) = 50 exp(−t )xy(x − 1)(y − 1) cos(5x − 5y − 5),
u(x, t ) = 100 exp(−t )(1 − t )xy(x − 1)(y − 1) sin(5x + 5y − 5).
In Figs. 1–2, we present the numerical solutions and the errors for the state and the control at t = 0.5. We observe that
the problem is approximated well and the error is equally-distributed over the whole domain.
Fig. 1. The computed solutions of the state (left), error (right) for Example 1 at t = 0.5 with h = k = 1/80 by dG(1) method.
Fig. 2. The computed solutions of control (left), error (right) for Example 1 at t = 0.5 with h = k = 1/80 by dG(1) method.
Table 1
Example 1 by cGP(2) and dG(1) (in parentheses) method.
h = k    ∥y − yδ∥               Rate           ∥u − uδ∥               Rate
1/5      7.43e−2 (8.01e−2)      – (–)          1.59e−1 (2.66e−1)      – (–)
1/10     1.98e−2 (2.16e−2)      1.91 (1.89)    4.84e−2 (6.21e−2)      1.71 (2.10)
1/20     5.18e−3 (5.63e−3)      1.93 (1.94)    1.31e−2 (1.57e−2)      1.88 (1.98)
1/40     1.35e−3 (1.43e−3)      1.94 (1.98)    3.41e−3 (3.63e−3)      1.94 (2.11)
1/80     3.43e−4 (3.61e−4)      1.97 (1.99)    8.90e−4 (9.18e−4)      1.94 (1.98)
In Table 1, we give the errors for the cGP(2) and dG(1) methods; the cGP(2) method yields a smaller error than the dG(1) method. For the cGP(2) method, the theoretical convergence rate $O(h^2, k^3)$ leads to $O(h^2)$ with h = k. We achieve this rate numerically. For the dG(1) method, the numerical results indicate a higher experimental order of convergence, i.e. $O(h^2)$, than the one shown in Theorem 10, which is $O(h)$ with h = k.
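The rates in Table 1 can be reproduced from the tabulated errors. The sketch below assumes the usual definition of the experimental order of convergence, $\log(e_{i-1}/e_i)/\log(h_{i-1}/h_i)$, which the paper does not state explicitly; since the tabulated errors are rounded to three digits, the recomputed rates may differ from the table in the last digit.

```python
import math

# cGP(2) state errors from Table 1 for h = k = 1/5, 1/10, 1/20, 1/40, 1/80
mesh = [1 / 5, 1 / 10, 1 / 20, 1 / 40, 1 / 80]
err_state = [7.43e-2, 1.98e-2, 5.18e-3, 1.35e-3, 3.43e-4]


def observed_rates(h, err):
    """Experimental order of convergence between consecutive refinement levels."""
    return [
        math.log(err[i - 1] / err[i]) / math.log(h[i - 1] / h[i])
        for i in range(1, len(err))
    ]


rates = observed_rates(mesh, err_state)
print([round(r, 2) for r in rates])
```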
Example 2. This example is a convection dominated OCP constructed from Example 2 in [34] by adding the reaction term:
\[
Q = (0, 1] \times \Omega, \quad \Omega = (0, 1)^2, \quad \epsilon = 10^{-5}, \quad \beta = (0.5, 0.5)^T, \quad r = 3, \quad \alpha = 1.
\]

The source function f , the desired state yd and the initial condition y0 are computed from (2.3) using the following exact solutions of the control and state, respectively,
\[
u(x_1, x_2, t) = \sin(\pi t)\sin(2\pi x_1)\sin(2\pi x_2)\exp\Big(\frac{-1 + \cos(t_x)}{\sqrt{\varepsilon}}\Big),
\]
\[
y(x_1, x_2, t) = u\Big(\frac{1}{2\sqrt{\varepsilon}}\sin(t_x) + 8\varepsilon\pi^2 + \frac{\sqrt{\varepsilon}}{2}\cos(t_x) - \frac{1}{2}\sin^2(t_x)\Big) - \pi\cos(\pi t)\sin(2\pi x_1)\sin(2\pi x_2)\exp\Big(\frac{-1 + \cos(t_x)}{\sqrt{\varepsilon}}\Big),
\]
where $t_x = t - 0.5(x_1 + x_2)$. As opposed to the previous example, the exact solution of the PDE-constrained problem depends on the diffusion explicitly and the problem is highly convection dominated.
Fig. 3. The computed solutions of the state (left), error (right) for Example 2 at t = 0.5 with h = k = 1/80 by dG(1) method.
Fig. 4. The computed solutions of control (left), error (right) for Example 2 at t = 0.5 with h = k = 1/80 by dG(1) method.
Table 2
Example 2 by cGP(2) and dG(1) (in parentheses) method.
h = k    ∥y − yδ∥               Rate           ∥u − uδ∥               Rate
1/5      8.08e−1 (1.18)         – (–)          7.44e−2 (1.04e−1)      – (–)
1/10     2.96e−1 (4.08e−1)      1.45 (1.54)    2.46e−2 (3.50e−2)      1.60 (1.58)
1/20     8.95e−2 (1.10e−1)      1.72 (1.89)    6.90e−3 (1.12e−2)      1.83 (1.65)
1/40     2.39e−2 (2.60e−2)      1.91 (2.08)    1.41e−3 (2.05e−3)      2.29 (2.44)
1/80     6.09e−3 (6.25e−3)      1.97 (2.06)    3.07e−4 (3.67e−4)      2.20 (2.48)
In Figs. 3–4, we present the numerical solution and the error between the exact and the numerical solution for the state and the control at t = 0.5. We observe that the problem is approximated well. As expected, the error is more prominent in the regions where the gradient of the solution is higher.
In Table 2, we present the errors for the cGP(2) and dG(1) methods. The cGP(2) method yields a smaller error than the dG(1) method because its rate of convergence is cubic in time. Second order convergence rates are achieved with both methods. For the dG(1) method, the numerical results indicate a higher experimental order of convergence than the one shown in Theorem 10; it is even higher than quadratic for the control.
9. Conclusions
The numerical results show better convergence rates than those predicted by the a priori error estimates. For dG time discretization, the DO and OD approaches commute. In future work, we will study the derivation of the optimal convergence rates under lower regularity assumptions and we will apply space–time adaptivity to problems with interior and boundary layers.
Acknowledgements
The authors wish to thank the referees for their helpful suggestions and comments. This research was supported by the
Middle East Technical University Research Fund Project (BAP-07-05-2012-102).
References
[1] S.S. Collis, M. Heinkenschloss, Analysis of the streamline upwind/Petrov Galerkin method applied to the solution of optimal control problems,
in: Technical Report TR02-01, Department of Computational and Applied Mathematics, Rice University, Houston, TX 77005-1892, 2002.
[2] R. Becker, B. Vexler, Optimal control of the convection–diffusion equation using stabilized finite element methods, Numer. Math. 106 (2007) 349–367.
[3] M. Hinze, N. Yan, Z. Zhou, Variational discretization for optimal control governed by convection dominated diffusion equations, J. Comput. Math. 27
(2009) 237–253.
[4] D. Leykekhman, Investigation of commutative properties of discontinuous Galerkin methods in PDE constrained optimal control problems, J. Sci.
Comput. 53 (2012) 483–511.
[5] D. Leykekhman, M. Heinkenschloss, Local error analysis of discontinuous Galerkin methods for advection-dominated elliptic linear-quadratic optimal
control problems, SIAM J. Numer. Anal. 50 (2012) 2012–2038.
[6] D.N. Arnold, F. Brezzi, B. Cockburn, L.D. Marini, Unified analysis of discontinuous Galerkin methods for elliptic problems, SIAM J. Numer. Anal. 39
(2001–2002) 1749–1779.
[7] B. Rivière, Discontinuous Galerkin Methods for Solving Elliptic and Parabolic Equations: Theory and Implementation, in: Frontiers in Applied
Mathematics, vol. 35, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2008.
[8] B. Ayuso, L.D. Marini, Discontinuous Galerkin methods for advection–diffusion–reaction problems, SIAM J. Numer. Anal. 47 (2009) 1391–1420.
[9] K. Eriksson, C. Johnson, V. Thomée, Time discretization of parabolic problems by the discontinuous Galerkin method, RAIRO Modél. Math. Anal. Numér.
19 (1985) 611–643.
[10] V. Thomée, Galerkin Finite Element Methods for Parabolic Problems, second ed., in: Springer Series in Computational Mathematics, vol. 25,
Springer-Verlag, Berlin, 2006.
[11] M. Vlasák, V. Dolejší, J. Hájek, A priori error estimates of an extrapolated space–time discontinuous Galerkin method for nonlinear
convection–diffusion problems, Numer. Methods Partial Differential Equations 27 (2011) 1456–1482.
[12] M. Feistauer, V. Kučera, K. Najzar, J. Prokopová, Analysis of space–time discontinuous Galerkin method for nonlinear convection–diffusion problems,
Numer. Math. 117 (2011) 251–288.
[13] K. Chrysafinos, N.J. Walkington, Error estimates for the discontinuous Galerkin methods for parabolic equations, SIAM J. Numer. Anal. 44 (2006)
349–366 (electronic).
[14] G. Matthies, F. Schieweck, Higher order variational time discretizations for nonlinear systems of ordinary differential equations, Preprint 23/2011,
Fakultät für Mathematik, Otto-von-Guericke-Universität Magdeburg, 2011.
[15] D. Meidner, B. Vexler, A priori error estimates for space–time finite element discretization of parabolic optimal control problems. II. Problems with
control constraints, SIAM J. Control Optim. 47 (2008) 1301–1329.
[16] T. Apel, T.G. Flaig, Crank–Nicolson schemes for optimal control problems with evolution equations, SIAM J. Numer. Anal. 50 (2012) 1484–1512.
[17] Z. Zhou, N. Yan, The local discontinuous Galerkin method for optimal control problem governed by convection diffusion equations, Int. J. Numer. Anal.
Model. 7 (2010) 681–699.
[18] T. Sun, Discontinuous Galerkin finite element method with interior penalties for convection diffusion optimal control problem, Int. J. Numer. Anal.
Model. 7 (2010) 87–107.
[19] H. Fu, H. Rui, A priori error estimates for optimal control problems governed by transient advection–diffusion equations, J. Sci. Comput. 38 (2009)
290–315.
[20] K. Chrysafinos, Discontinuous Galerkin approximations for distributed optimal control problems constrained by parabolic PDE’s, Int. J. Numer. Anal.
Model. 4 (2007) 690–712.
[21] T. Akman, H. Yücel, B. Karasözen, A priori error analysis of the upwind symmetric interior penalty Galerkin (SIPG) method for the optimal control
problems governed by unsteady convection diffusion equations, Comput. Optim. Appl. 57 (2014) 703–729.
[22] E. Burman, Crank–Nicolson finite element methods using symmetric stabilization with an application to optimal control problems subject to transient
advection–diffusion equations, Commun. Math. Sci. 9 (2011) 319–329.
[23] F. Tröltzsch, Optimal Control of Partial Differential Equations: Theory, Methods and Applications, in: Graduate Studies in Mathematics, vol. 112,
American Mathematical Society, Providence, RI, 2010, Translated from the 2005 German original by Jürgen Sprekels.
[24] P. Jamet, Galerkin-type approximations which are discontinuous in time for parabolic equations in a variable domain, SIAM J. Numer. Anal. 15 (1978)
912–928.
[25] K. Eriksson, C. Johnson, Adaptive finite element methods for parabolic problems. II. Optimal error estimates in L∞ L2 and L∞ L∞ , SIAM J. Numer. Anal.
32 (1995) 706–740.
[26] T. Richter, A. Springer, B. Vexler, Efficient numerical realization of discontinuous Galerkin methods for temporal discretization of parabolic problems,
Numer. Math. 124 (2013) 151–182.
[27] R. Becker, D. Meidner, B. Vexler, Efficient numerical solution of parabolic optimization problems by finite element methods, Optim. Methods Softw.
22 (2007) 813–833.
[28] V. Dolejší, M. Feistauer, C. Schwab, A finite volume discontinuous Galerkin scheme for nonlinear convection–diffusion problems, Calcolo 39 (2002)
1–40.
[29] P.G. Ciarlet, The Finite Element Method for Elliptic Problems, North-Holland, Amsterdam, New York, 1978.
[30] V. Dolejší, M. Feistauer, V. Sobotíková, Analysis of the discontinuous Galerkin method for nonlinear convection–diffusion problems, Comput. Methods
Appl. Mech. Engrg. 194 (2005) 2709–2733.
[31] D. Schötzau, C. Schwab, Time discretization of parabolic problems by the hp-version of the discontinuous Galerkin finite element method, SIAM J.
Numer. Anal. 38 (2000) 837–875.
[32] V. Dolejší, M. Feistauer, Error estimates of the discontinuous Galerkin method for nonlinear nonstationary convection–diffusion problems, Numer.
Funct. Anal. Optim. 26 (2005) 349–383.
[33] M. Feistauer, K. Švadlenka, Discontinuous Galerkin method of lines for solving nonstationary singularly perturbed linear problems, J. Numer. Math.
12 (2004) 97–117.
[34] H. Fu, A characteristic finite element method for optimal control problems governed by convection–diffusion equations, J. Comput. Appl. Math. 235
(2010) 825–836.
[35] S. Basting, S. Weller, Efficient Preconditioning of Variational Time Discretization Methods for Parabolic Partial Differential Equations, in: Friedrich-
Alexander University, Erlangen, Department Mathematik, 2013.
[36] D.J. Estep, R.W. Freund, Using Krylov-subspace iterations in discontinuous Galerkin methods for nonlinear reaction–diffusion systems,
in: Discontinuous Galerkin Methods (Newport, RI, 1999), in: Lect. Notes Comput. Sci. Eng., vol. 11, Springer, Berlin, 2000, pp. 327–335.
