
Encoding-Independent Optimization Problem Formulation for Quantum Computing

Federico Dominguez,1, ∗ Josua Unger,2 Matthias Traube,1
Barry Mant,2 Christian Ertler,1 and Wolfgang Lechner1, 2, 3, †
1 Parity Quantum Computing Germany GmbH, Ludwigstraße 8, 80539 Munich, Germany
2 Parity Quantum Computing GmbH, A-6020 Innsbruck, Austria
3 Institute for Theoretical Physics, University of Innsbruck, A-6020 Innsbruck, Austria
(Dated: February 9, 2023)
arXiv:2302.03711v1 [quant-ph] 7 Feb 2023

We present an encoding- and hardware-independent formulation of optimization problems for
quantum computing. Using this generalized approach, we present an extensive library of optimization
problems and their various derived spin encodings. Common building blocks that serve as a
construction kit for building these spin Hamiltonians are identified. This paves the way towards
a fully automatic construction of Hamiltonians for arbitrary discrete optimization problems. The
presented freedom in the problem formulation is a key step for tailoring optimal spin Hamiltonians
for different hardware platforms.

CONTENTS

I. Introduction 2

II. Definitions and Notation 3

III. Encodings library 3
   A. Binary encoding 4
   B. Gray encoding 5
   C. One-hot encoding 5
   D. Domain-wall encoding 5
   E. Unary encoding 6
   F. Block encodings 6

IV. Parity Architecture 7

V. Encoding constraints 8

VI. Encoding-Independent Formulation 10

VII. Problem library 11
   A. Partitioning problems 11
      1. Clustering problem 13
      2. Number partitioning 13
      3. Graph coloring 13
      4. Graph partitioning 14
      5. Clique cover 15
   B. Constrained subset problems 15
      1. Cliques 15
      2. Maximal independent set 16
      3. Set packing 16
      4. Vertex cover 16
      5. Minimal maximal matching 16
      6. Set cover 17
      7. Knapsack 17
   C. Permutation problems 17
      1. Hamiltonian cycles 18
      2. Traveling salesperson problem (TSP) 18
      3. Machine scheduling 18
      4. Nurse scheduling problem 19
   D. Real variables problems 20
      1. Financial crash problem 20
      2. Continuous black box optimization 21
      3. Nurse scheduling with real parameters 21
   E. Other problems 22
      1. Syndrome decoding problem 22
      2. k-SAT 22
      3. Improvement of matrix multiplication 23

VIII. Summary of building blocks 24
   A. Auxiliary functions 24
      1. Element counting 24
      2. Step function 24
      3. Minimizing the maximum element of a set 24
      4. Compare variables 25
   B. Constraints 25
      1. All (connected) variables are different 25
      2. Value α is used 25
      3. Inequalities 25
      4. Constraint preserving driver 25
   C. Problem specific representations 25
      1. Modulo 2 linear programming 26
      2. Representation of boolean functions 26
   D. Meta optimization 26
      1. Auxiliary variables 26
      2. Prioritization of cost function terms 27
      3. Problem conversion 27

IX. Conclusion and Outlook 27

X. Acknowledgements 28

References 28

∗ f.dominguez@parityqc.com
† wolfgang@parityqc.com

I. INTRODUCTION

Discrete optimization problems are ubiquitous in almost any field of human endeavor and many of these are known to be NP-hard. The objective in such a discrete optimization problem is to find the minimum of a real-valued function f(v_0, . . . , v_{N−1}) (the cost function) over a set of discrete variables v_k. The search space is restricted by hard constraints, which commonly are presented as equalities such as g(v_0, . . . , v_{N−1}) = 0 or inequalities such as h(v_0, . . . , v_{N−1}) > 0.

Besides using classical heuristics [1, 2] and machine learning methods [3] to solve these problems, there is also a growing interest in applying quantum computation. One large class of approaches consists of first encoding the cost function f in a Hamiltonian H such that a subset of eigenvectors (in particular the ground state) of H represents elements in the domain of f and the eigenvalues are the respective values of f:

   H |v_0, . . . , v_{N−1}⟩ = f(v_0, . . . , v_{N−1}) |v_0, . . . , v_{N−1}⟩ .  (1)

In such an encoding, the ground state of H is the solution of the optimization problem. Having obtained a Hamiltonian reformulation, one can use a variety of quantum algorithms to find the ground state of the Hamiltonian, including adiabatic quantum computing [4] and variational approaches, for instance the quantum/classical hybrid quantum approximate optimization algorithm (QAOA) [5] or generalizations thereof like the quantum alternating operator ansatz [6]. On the hardware side these algorithms can run on gate-based quantum computers, quantum annealers or specialized Ising machines [7].

In the current literature, almost all Hamiltonians for optimization are formulated as Quadratic Unconstrained Binary Optimization (QUBO) problems [8]. The success of QUBO reflects the strong hardware limitations, where higher-than-quadratic interactions are not available. Moreover, quantum algorithms with dynamical implementation of hard constraints [9, 10] require driver terms that can be difficult to design and also to implement on quantum computers. Hence, hard constraints are usually included as energy penalizations of QUBO Hamiltonians. The prevalence of QUBO has also increased the popularity of one-hot encoding, a particular way of mapping (discrete) variables to eigenvalues of spin operators [11], since this encoding allows for Hamiltonians with low-order interactions, which is especially appropriate for QUBO problems.

However, compelling alternatives to QUBO and one-hot encoding have been proposed in recent years. A growing number of platforms are exploring high-order interactions [12–20], while the Parity architecture [21–23] (a generalization of the LHZ architecture [24]) allows the mapping of arbitrary-order interactions to qubits that require only local connectivity. The dynamical implementation of constraints has also been further investigated [6, 25–27], including the design of approximate drivers [28] and compilation of constrained problems within the Parity architecture [29]. Moreover, experimental results have shown that alternative encodings outperform the traditional one-hot approach [30–33]. It is clear that alternative formulations for the Hamiltonians need to be explored further, but once the Hamiltonian has been expressed in QUBO using one-hot encoding, it is not trivial to switch to other formulations. Automatic tools to explore different formulations would therefore be highly beneficial.

We present here a library of problems intended to facilitate the Hamiltonian formulation beyond QUBO and one-hot encoding. Common problems in the literature are revisited and reformulated using encoding-independent formulations, meaning that they can be encoded trivially using any spin encoding. The possible constraints of the problem are also identified and presented separately from the cost function, so the dynamic implementation of the constraints can also be easily explored. Encoding-independent formulations have also been suggested recently [28] as an intermediate representation stage of the problems, and exemplified with some use cases. Here, we extend this approach to more than 20 problems and provide a summary of the most popular encodings that can be used. Two additional subgoals are addressed in this library:

• Meta parameters/choices: We present and review the most important choices that can be made in the process of mapping the optimization problem in a mathematical formulation to a spin Hamiltonian. This mainly includes the encodings, which can greatly influence key characteristics important for the computational cost and performance of the optimization, but also free meta parameters or the use of auxiliary variables. All these degrees of freedom are ultimately a consequence of the fact that the optimal solution is typically encoded only in the ground state and other low-energy eigenstates encode good approximations to the optimal solution. Furthermore, it can be convenient to make approximations so that the solution corresponds not exactly to the ground state but to another low-energy state, as for example reported in [34].

• (Partial) automation: Usually, each problem needs to be evaluated individually. The resulting cost functions are not necessarily unique and there is no known trivial way of automatically creating H. By providing a collection of building blocks of cost functions and heuristics for selecting parameters, the creation of the cost function and constraints can be assisted. This enables a general representation of problems in an encoding-independent way, and parts of the parameter selection and performance analysis can be conducted in this intermediate stage.

In practice, many optimization problems are not purely discrete but involve real-valued parameters and variables.
Thus, also the encoding of real-valued problems to discrete optimization problems (discretization) as an intermediate step is discussed in Sec. VII D.

The document is structured as follows. After introducing the notation used throughout the text in Sec. II, we present a list of encodings in Sec. III. Sec. VI then functions as a manual on how to bring the optimization problems contained in this document into a form that can be solved with a quantum computer, and two explicit examples are used as illustration. Sec. VII contains a library of optimization problems which are classified into several categories and Sec. VIII lists building blocks used in the formulation of these problems. The building blocks are also useful to handle many further problems. Finally, in Sec. IX we summarize the results and give an outlook on future projects.

II. DEFINITIONS AND NOTATION

In this section we give some basic definitions and settle the notation we will use throughout the whole text.

A discrete set of real numbers is a countable subset U ⊂ R not having an accumulation point. Discrete sets will be denoted by uppercase Latin letters, except the letter G, which we reserve for graphs. A discrete variable is a variable ranging over a discrete set R. Discrete variables will be represented by lowercase Latin letters, mostly v or w. The discrete set R a discrete variable v takes values in is called its range. Elements of R are denoted by lowercase Greek letters.

If a discrete variable has range given by {0, 1} we call it binary or boolean. Binary variables will be denoted by the letter x. Similarly, a variable with range {−1, 1} will be called a spin variable and the letter s will be reserved for spin variables. There is an invertible mapping from a binary variable x to a spin variable s:

   x ↦ s := 2x − 1 .  (2)

For a variable v with range R we define the value indicator function to be

   δ_v^α = 1 if v = α, 0 if v ≠ α,  (3)

where α ∈ R.

We will also consider optimization problems for continuous variables. A variable v will be called continuous if its range is given by R^d for some d ∈ Z_{>0}.

An optimization problem O is a triple (V, f, C), where

1. V := {v_i}_{i=0,...,N−1} is a finite set of variables.

2. f := f(v_0, . . . , v_{N−1}) is a real-valued function, called objective or cost function.

3. C = {C_i}_{i=0,...,l} is a finite set of constraints C_i. A constraint C is either an equation

   c(v_0, . . . , v_{N−1}) = k  (4)

for some k ∈ R and a real-valued function c(v_0, . . . , v_{N−1}), or it is an inequality

   c(v_0, . . . , v_{N−1}) ≤ k .  (5)

The goal for an optimization problem O = (V, f, C) is to find an extreme value y_ex of f such that all of the constraints are satisfied at y_ex.

Discrete optimization problems can often be stated in terms of graphs or hypergraphs. A graph is a pair (V, E), where V is a finite set of vertices or nodes and E ⊂ {{v_i, v_j} | v_i ≠ v_j ∈ V} is the set of edges. An element {v_i, v_j} ∈ E is called an edge between vertex v_i and vertex v_j. Note that a graph defined like this can neither have loops, i.e., edges beginning and ending at the same vertex, nor can there be multiple edges between the same pair of vertices. Given a graph G = (V, E), its adjacency matrix is the symmetric binary |V| × |V| matrix A with entries

   A_{ij} = 1 if {v_i, v_j} ∈ E, 0 else.  (6)

A hypergraph is a generalization of a graph in which we allow edges to be adjacent to more than two vertices. That is, a hypergraph is a pair H = (V, E), where V is a finite set of vertices and

   E ⊆ ⋃_{i=1}^{|V|} { {v_{j_1}, . . . , v_{j_i}} | v_{j_k} ∈ V and v_{j_k} ≠ v_{j_ℓ} ∀ k, ℓ = 1, . . . , i }  (7)

is the set of hyperedges.

Throughout the whole paper we reserve the word qubit for actual physical qubits. To get from an encoding-independent Hamiltonian to a quantum program, binary or spin variables become Pauli-z matrices which act on the corresponding qubits.

III. ENCODINGS LIBRARY

For many important problems, the cost function and the problem constraints can be represented in terms of two fundamental building blocks: the value of the integer variable v and the value indicator δ_v^α defined in Eq. (3). When expressed in terms of these building blocks, Hamiltonians are more compact and recurring terms can be identified across many different problems. Moreover, quantum operators are not present at this stage: an encoding-independent Hamiltonian is just a cost function in terms of discrete variables, which eases access to quantum optimization for a wider audience. The encoding of the variables and the choice of quantum algorithms can be done at later stages.

The representation of the building blocks in terms of Ising operators depends on the encoding we choose. An
encoding is a function that associates eigenvectors of the σ_z operator with specific values of a discrete variable v:

   |s_0, . . . , s_{N−1}⟩ → v, s_i = ±1,  (8)

where the spin variables s_i are the eigenvalues of the σ_z^(i) operators. The encodings are also usually defined in terms of binary variables x_i = 0, 1, which are related to Ising variables s_i = −1, 1 according to Eq. (2).

A summary of encodings is presented in Fig. 1. Some encodings are dense, in the sense that every quantum state |s_0, . . . , s_{N−1}⟩ encodes some value of the variable v. Other encodings are sparse, because only a subset of the possible quantum states are valid states. The valid subset is generated by adding a core term in the Hamiltonian for every sparsely encoded variable. In general, dense encodings require fewer qubits, but sparse encodings have simpler expressions for the value indicator δ_v^α, and are therefore favorable for avoiding higher-order interactions. This is because δ_v^α needs to check a smaller number of qubit states to know whether the variable v has value α or not, whereas dense encodings need to know the state of every qubit in the register [28, 33].

A. Binary encoding

Binary encoding uses the binary representation for encoding integer variables. Given an integer variable v ∈ [1, 2^D], we use D binary variables x_i ∈ {0, 1} to represent v:

   v = Σ_{i=0}^{D−1} 2^i x_i + 1 .  (9)

The value indicator δ_v^α can be written using the generic expression

   δ_v^α = Π_{i≠α} (v − i)/(α − i) ,  (10)

which is valid for every encoding. The expression for δ_v^α in terms of boolean variables x_i can be written using the fact that the value of α is codified in the bitstring (x_{α,0}, . . . , x_{α,D−1}). The value indicator δ_v^α checks if the D binary variables x_i are equal to x_{α,i} to know whether the variable v has the value α or not. We note that

   x_1 (2x_2 − 1) − x_2 + 1 = 1 if x_1 = x_2, 0 if x_1 ≠ x_2,  (11)

and so we write

   δ_v^α = Π_{i=0}^{D−1} [ x_i (2x_{α,i} − 1) − x_{α,i} + 1 ]
         = Π_{i=0}^{D−1} [ s_i (x_{α,i} − 1/2) + 1/2 ] ,  (12)

where

   s_i = 2x_i − 1  (13)

are the corresponding Ising variables. Thus, the maximum order of the interaction terms in δ_v^α scales linearly with D and there are (D choose k) terms of order k. The total number of terms is Σ_k (D choose k) = 2^D. Since D binary variables encode a 2^D-valued variable, the number of interaction terms needed for a value indicator in binary encoding scales linearly with the size of the variable.

If v ∈ {1, . . . , K} with 2^D < K < 2^{D+1}, D ∈ N, then we require D + 1 binary variables to represent v and we will have 2^{D+1} − K invalid quantum states that do not represent any value of the variable v. The set of invalid states is

   R = { |x_0, . . . , x_D⟩ | Σ_{i=0}^{D} 2^i x_i + 1 > K } .  (14)

For rejecting quantum states in R, we force v ≤ K, which can be accomplished by adding a core term Hcore to the Hamiltonian,

   Hcore = Σ_{α=K+1}^{2^{D+1}} δ_v^α ,  (15)

or by imposing the sum constraint

   ccore = Σ_{α=K+1}^{2^{D+1}} δ_v^α = 0 .  (16)

The core term penalizes any state that represents an invalid value of the variable v. Because core terms impose an additional energy scale, the performance can degrade when K ≠ 2^D, D ∈ N. In some cases, such as the Knapsack problem, penalties for invalid states can be included in the cost function, so there is no need to add a core term or constraints [32] (see also Sec. VIII B 3).

When encoding variables which can take on negative values as well, e.g. v ∈ {−K, −K + 1, . . . , K′ − 1, K′}, in classical computing one often uses an extra bit that encodes the sign of the value. However, this might not be the best option for our purposes because one spin flip could then change the value substantially and we do not assume full fault tolerance. There is a more suitable encoding of negative values. Let us consider, e.g., the binary encoding. Instead of the usual encoding, we can simply shift the values,

   v = Σ_{i=0}^{D−1} 2^i x_i + 1 − K ,  (17)

where 2^{D−1} < K + K′ ≤ 2^D. The expression for the value indicator functions stays the same, only the encoding of the value α has to be adjusted. An additional advantage compared to using the sign bit is that ranges not symmetrical around zero (K ≠ K′) can be encoded more efficiently. The same approach of shifting the variable by −K can also be used for the other encodings.
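The binary-encoding building blocks can be checked with a short classical sketch (an illustrative script, not from the text, that takes Eqs. (9) and (12) as given): the product of Eq. (12) evaluates to 1 exactly on the bitstring that encodes α, and to 0 on every other bitstring.

```python
from itertools import product

D = 3  # number of bits; the variable v ranges over {1, ..., 2**D}

def value(bits):
    # Eq. (9): v = sum_i 2^i x_i + 1
    return sum(2 ** i * x for i, x in enumerate(bits)) + 1

def value_indicator(bits, alpha):
    # Eq. (12): product of factors [x_i (2 x_{alpha,i} - 1) - x_{alpha,i} + 1],
    # where (x_{alpha,0}, ..., x_{alpha,D-1}) is the bitstring encoding alpha.
    alpha_bits = [(alpha - 1) >> i & 1 for i in range(D)]
    result = 1
    for x, x_a in zip(bits, alpha_bits):
        result *= x * (2 * x_a - 1) - x_a + 1
    return result

# The indicator is 1 exactly when the register encodes alpha, 0 otherwise.
for bits in product((0, 1), repeat=D):
    for alpha in range(1, 2 ** D + 1):
        assert value_indicator(bits, alpha) == (1 if value(bits) == alpha else 0)
```

Each factor equals 1 when x_i = x_{α,i} and 0 otherwise, which is the equality check of Eq. (11); expanding the product over the D factors yields the 2^D monomials counted above.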
B. Gray encoding

In binary representation, a single spin flip can lead to a sharp change in the value of v; for example |1000⟩ codifies v = 9 while |0000⟩ codifies v = 1. To avoid this, Gray encoding reorders the binary representation in a way that two consecutive values of v always differ in a single spin flip. If we line up the potential values v ∈ [1, 2^D] of an integer variable in a vertical sequence, this encoding in D boolean variables can be described as follows: in the i-th boolean variable (which is the i-th column from the right) the sequence starts with 2^{i−1} zeros and continues with an alternating sequence of 2^i ones and 2^i zeros. As an example consider

1: 0000
2: 0001
3: 0011
4: 0010
5: 0110
6: 0111
...

where D = 4. On the left-hand side of each row of boolean variables we have the value v. If we e.g. track the right-most boolean variable, we indeed find that it starts with 2^{1−1} = 1 zero for the first value, 2^1 ones for the second and third values, 2^1 zeros for the fourth and fifth values, and so on.

The value indicator function and the core term remain unchanged, except that the representation of, for example, α in the analog of Eq. (12) also has to be in Gray encoding.

An advantage of this encoding with regard to quantum algorithms is that single spin flips do not cause large changes in the cost function and thus smaller coefficients may be chosen (see discussion in Sec. VIII).

C. One-hot encoding

One-hot encoding is a sparse encoding that uses N binary variables x_α to encode an N-valued variable v. The encoding is defined by its value indicator:

   δ_v^α = x_α ,  (18)

which means that v = α if x_α = 1. The value of v is given by

   v = Σ_{α=0}^{N−1} α x_α .  (19)

The physically meaningful quantum states are those with a single qubit in state 1, and so the dynamics must be restricted to the subspace defined by

   ccore = Σ_{α=0}^{N−1} x_α − 1 = 0 .  (20)

One option to impose this sum constraint is to encode it as an energy penalization with a core term in the Hamiltonian:

   Hcore = ( 1 − Σ_{α=0}^{N−1} x_α )² ,  (21)

which has minimum energy if only one x_α is different from zero.

An alternative way to impose the core term can be formulated as follows. In the building-blocks representation of boolean functions (Sec. VIII C 2) the representation of the XOR(x_0, x_1) function is given as XOR(x_0, x_1) = x_0 + x_1 − 2x_0x_1. In terms of spin variables this is (1 − s_0 s_1)/2. Likewise, concatenating the XOR function yields

   XOR(x_0, XOR(x_1, . . . XOR(x_{N−2}, x_{N−1}) . . .)) = (1/2) [ 1 − (−1)^N Π_{i=0}^{N−1} s_i ] .  (22)

This can be used with an extra penalty term that lifts the degeneracy to carry out the one-hot check, i.e. to check whether exactly one of the x_i is equal to 1. For example, consider N = 3. Then XOR(x_0, XOR(x_1, x_2)) = 1 for the desired configurations where exactly one of the x_i is one, but also if all three are equal to one. In fact, any odd number of ones will result in a one in general. Defining

   Hcore = − XOR(x_0, XOR(x_1, . . . XOR(x_{N−2}, x_{N−1}) . . .)) + α Σ_{i=0}^{N−2} x_i + 1 − α  (23)

with 0 < α < 1 is then a valid one-hot check which requires N terms when expressed in spin variables¹. On the other hand, the standard one-hot check of Eq. (20) requires O(N²) terms which are at most of quadratic order.

¹ Note that we only need N − 1 instead of N terms from the summation, since any configuration that makes the XOR chain return one and is not a valid one-hot string has at least three ones in it.

D. Domain-wall encoding

This encoding uses the position of a domain wall in an Ising chain to codify values of a variable v. If the endpoints of an N + 1 spin chain are fixed in opposite states,
there must be at least one domain wall in the chain. Since the energy of a ferromagnetic Ising chain depends only on the number of domain walls it has and not on where they are located, an N + 1 spin chain with fixed opposite endpoints has N possible ground states, depending on the position of the single domain wall.

The codification of a variable v = 1, . . . , N using domain-wall encoding requires the core Hamiltonian [30]:

   Hcore = − [ −s_1 + Σ_{α=1}^{N−2} s_α s_{α+1} + s_{N−1} ] .  (24)

Since the fixed endpoints of the chain do not need a spin representation (s_0 = −1 and s_N = 1), N − 1 Ising variables {s_i}_{i=1}^{N−1} are sufficient for encoding a variable of N values. The minimum energy of Hcore is 2 − N, so the core term can alternatively be encoded as a sum constraint:

   ccore = − [ −s_1 + Σ_{α=1}^{N−2} s_α s_{α+1} + s_{N−1} ] = 2 − N .  (25)

The variable indicator checks whether there is a domain wall at position α:

   δ_v^α = (1/2) (s_α − s_{α−1}) ,  (26)

where s_0 ≡ −1 and s_N ≡ 1, and the variable v can be written as

   v = Σ_{α=1}^{N} α δ_v^α = (1/2) [ (1 + N) − Σ_{i=1}^{N−1} s_i ] .  (27)

Quantum annealing experiments using domain-wall encoding have shown significant improvements in performance compared to one-hot encoding [31]. This is partly because the required search space is smaller (N − 1 versus N qubits for a variable of N values), but also because domain-wall encoding generates a smoother energy landscape: in one-hot encoding, the minimum Hamming distance between two valid states is two, whereas in domain-wall encoding this distance is one. This implies that every valid quantum state in one-hot encoding is a local minimum, surrounded by energy barriers generated by the core energy of Eq. (21). As a consequence, the dynamics in domain-wall encoded problems freeze later in the annealing process, because only one spin flip is required to pass from one state to another.

E. Unary encoding

For this case, the value of v is encoded in the number of binary variables which are equal to one:

   v = Σ_{i=0}^{N−1} x_i ,  (28)

so N − 1 binary variables x_i are needed for encoding an N-value variable. Unary encoding does not require a core term because every quantum state is a valid state. However, this encoding is not unique, in the sense that each value of v has multiple representations.

A drawback of unary encoding (and every dense encoding) is that it requires information from all binary variables to determine the value of v. The value indicator δ_v^α is

   δ_v^α = Π_{i≠α} (v − i)/(α − i) ,  (29)

which involves 2^N interaction terms. This scaling is completely unfavorable, so unary encoding may only be convenient for variables that do not require value indicators δ_v^α, but only the variable value v.

A performance comparison for the Knapsack problem using digital annealers showed that unary encoding can outperform binary and one-hot encoding and requires smaller energy scales [32]. The reasons for the high performance of unary encoding are still under investigation, but redundancy is believed to play an important role, because it makes it easier for the annealer to find the ground state. As for domain-wall encoding [35], the minimum Hamming distance between two valid states (i.e., the number of spin flips needed to pass from one valid state to another) could also explain the better performance of the unary encoding.

F. Block encodings

It is also possible to combine different approaches to obtain a balance between sparse and dense encodings [33]. Block encodings are based on B blocks, each consisting of g binary variables. Similar to one-hot encoding, the valid states for block encodings are those states where only a single block contains non-zero binary variables. The binary variables in block b, {x_{b,i}}_{i=0}^{g−1}, define a block value w_b using a dense encoding like binary, Gray or unary. For example, if w_b is encoded using binary, we have

   w_b = Σ_{i=0}^{g−1} 2^i x_{b,i} + 1 .  (30)

The discrete variable v is defined by the active block b and its corresponding block value w_b,

   v = Σ_{b=0}^{B−1} Σ_{α=1}^{2^g−1} v(b, w_b = α) δ_{w_b}^α ,  (31)

where v(b, w_b = α) is the discrete value associated to the quantum state with active block b and block value α, and δ_{w_b}^α is a value indicator that only needs to check the value of binary variables in block b. For each block, there are 2^g − 1 possible values (assuming Gray or binary encoding
for the block), because the all-zero state is not allowed (otherwise block b is not active). If the block value w_b is encoded using unary, then g values are possible. The expression of δ_{w_b}^α depends on the encoding and is the same as the one already presented in the respective encoding section.

The value indicator for the variable v is the corresponding block value indicator. Suppose the discrete value v_0 is encoded in block b with a block value w_b = α:

   v_0 = v(b, w_b = α) .  (32)

Then the value indicator δ_v^{v_0} is

   δ_v^{v_0} = δ_{w_b}^α .  (33)

A core term is necessary so that only qubits in a single block can be in the excited state. Defining t_b = Σ_i x_{b,i}, the core term results in

   Hcore = Σ_{b≠b′} t_b t_{b′} ,  (34)

or, as a sum constraint,

   ccore = Σ_{b≠b′} t_b t_{b′} = 0 .  (35)

The minimum value of Hcore is zero. If two blocks b and b′ both have binary variables with value one, then t_b t_{b′} ≠ 0 and the corresponding eigenstate of Hcore is no longer the ground state.

IV. PARITY ARCHITECTURE

The strong hardware limitations of noisy intermediate-scale quantum (NISQ) [36] devices have made sparse encodings (and especially one-hot encoding) the standard approach to problem formulation. This is mainly because the basic building blocks (value and value indicator) are of linear or quadratic order in the spin variables in these encodings. The low connectivity of qubit platforms requires Hamiltonians in the QUBO formulation, and high-order interactions are expensive when translated to QUBO [8]. However, different choices of encodings can significantly improve the performance of quantum algorithms [30–33], since a smart choice of encodings can reduce the search space or generate a smoother energy landscape.

One way this difference between encodings manifests itself is in the number of spin flips of physical qubits needed to change a variable into another valid value [35]. If this number is larger than one, there are local minima separated by invalid states penalized with a high cost, which can impede the performance of the optimization. On the other hand, such an energy landscape might offer some protection against errors (see also Ref. [23] and Ref. [37]). Furthermore, other fundamental aspects of the algorithms, such as circuit depth and energy scales, can be greatly improved outside QUBO [29, 38, 39], prompting us to look for alternative formulations to the current QUBO-based Hamiltonians. The Parity architecture is a paradigm for solving quantum optimization problems [21, 24] that does not rely on the QUBO formulation, hence allowing a wide range of options for formulating Hamiltonians. The architecture is based on the Parity transformation, which remaps Hamiltonians onto a two-dimensional grid that requires only local connectivity of the qubits. The absence of long-range interactions enables high parallelizability of quantum algorithms and eliminates the need for costly and time-consuming SWAP gates, which helps to overcome two of the main obstacles of quantum computing: the limited coherence time of qubits and the poor connectivity of qubits within a quantum register.

The Parity transformation creates a single Parity qubit for each interaction term in the (original) logical Hamiltonian:

   J_{i,j,...} σ_z^(i) σ_z^(j) · · · → J_{i,j,...} σ_z^(i,j,...) ,  (36)

where the interaction strength J_{i,j,...} is now the local field of the Parity qubit σ_z^(i,j,...). This facilitates addressing high-order interactions and frees the problem formulation from the QUBO approach. The equivalence between the original logical problem and the Parity-transformed problem is ensured by adding three-body and four-body constraints and placing them on a two-dimensional grid such that only neighboring qubits are involved in the constraints. The mapping of a logical problem onto the regular grid of a Parity chip can for example be realized by the Parity compiler [21]. Although the Parity compilation of the problem may require a larger number of physical qubits, the locality of interactions on the grid allows for higher parallelizability of quantum algorithms. This allows constant-depth algorithms [22, 40] to be implemented with a smaller number of gates [39].

The following toy example, summarized in Fig. 2, shows how a Parity-transformed Hamiltonian can be solved using a smaller number of qubits when the original Hamiltonian has high-order interactions. Given the logical Hamiltonian

   H = σ_z^(1) σ_z^(2) + σ_z^(2) σ_z^(4) + σ_z^(1) σ_z^(5) + σ_z^(1) σ_z^(2) σ_z^(3) + σ_z^(3) σ_z^(4) σ_z^(5) ,  (37)

the corresponding QUBO formulation requires 5 + 2 = 7 qubits, including two ancilla qubits for decomposing the three-body interactions, and the total number of two-body interactions is 14. The embedding of the QUBO problem in the quantum hardware may require additional qubits and interactions depending on the chosen architecture. Instead, the Parity-transformed Hamiltonian only consists of six Parity qubits with local fields and two four-body interactions between close neighbors.

It is not yet clear what the best Hamiltonian representation is for an optimization problem. The answer
8

Encoding Visualization example Variable value Value indicator Core term

D−1
log2(N)−1
∏ [ i ( α,i
δvα = x 2x − 1) No core term (dense
v= 2i xi + 1

Binary x0 x1 x2 i=0 encoding)1
v=2 i=0 −xα,i + 1]

N−1 2
N−1

(∑ )
v= δvα = xα xd − 1

One-hot x0 x1 x2 ixi
v=1 i=0 d=0

v−i
N−1
v=2 v= δvα =
No core term (dense

xi
Unary
∏α−i encoding)
i=0 i≠α

s0 = − 1 s4 = 1 2
1 + N N−1 si 1 N−1

(∑ )
v= − δvα = (sα − sα−1) sαsα+1 − N + 2
Domain-wall
s1 s2 s3 2 ∑2 2 α=0
i=1
v=3
b1 b2 b3 2
B−1 2g−1
δvv0 = δwαb
Block
v= v b, wb = α) δwαb
∑∑ ( ∑∑
xd,bxe,b′

encoding b≠b′ d,e
b=0 α=1
v=6

1: Penalization term may be necessary if log2 N ∉ ℕ


FIG. 1. Summary of popular encodings of discrete variables in terms of binary xi or Ising si variables. Each encoding has
a particular representation of the value of the variable v and the value indicator δvα . If every quantum state in the register
represents a valid value of v, the encoding is called dense. On the contrary, sparse encodings must include a core term in the
Hamiltonian in order to represent a value of v. Sparse encodings have simpler representations of the value indicators than dense
encodings, but the core term implementation demands extra interaction terms between the qubits. The cost of a Hamiltonian
in terms of number of qubits and interaction terms strongly depends on the chosen encoding, and should be evaluated for every
specific problem.
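The dense/sparse distinction summarized above can be checked directly by enumeration. The following minimal Python sketch (the function names and the 1-based value convention are illustrative choices, not from the paper) compares a binary-encoded and a one-hot-encoded variable:

```python
from itertools import product

def binary_value(bits):
    # Dense binary encoding: every bitstring encodes a value, v = sum_i 2^i x_i + 1.
    return sum(2**i * x for i, x in enumerate(bits)) + 1

def onehot_value(bits):
    # Sparse one-hot encoding: valid only if exactly one bit is set.
    assert sum(bits) == 1
    return bits.index(1) + 1

# Every 3-bit string is a valid binary-encoded value (dense): v in 1..8.
vals = sorted(binary_value(b) for b in product((0, 1), repeat=3))
assert vals == list(range(1, 9))

# Only N of the 2^N one-hot bitstrings are valid (sparse): the rest
# must be removed by a core term such as (sum_i x_i - 1)^2.
core = lambda bits: (sum(bits) - 1) ** 2
valid = [list(b) for b in product((0, 1), repeat=4) if core(b) == 0]
assert len(valid) == 4 and all(onehot_value(b) in range(1, 5) for b in valid)
```

For three binary qubits every one of the 2³ states encodes a value, while four one-hot qubits represent only four valid values; the remaining twelve states must be penalized by the core term.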

will probably depend strongly on the particular use case we want to solve, and will take into account not only the number of qubits needed, but also the smoothness of the energy landscape, which has a direct impact on the performance of quantum algorithms [41]. The Parity architecture allows us to explore formulations beyond the standard QUBO and this library aims to facilitate the exploration of new Hamiltonian formulations.

V. ENCODING CONSTRAINTS

Up to this point, we have presented the possible encodings of the discrete variables in terms of binary variables. In this section, we assume that the encodings of the variables have already been chosen and explain how to implement the hard constraints associated with the problem. Hard constraints c(v1, . . . , vN) = K often appear in optimization problems, limiting the search space and making problems even more difficult to solve. We consider polynomial constraints of the form

c(v1, . . . , vN) = Σ_i g_i v_i + Σ_{i,j} g_{i,j} v_i v_j + Σ_{i,j,k} g_{i,j,k} v_i v_j v_k + . . . ,   (38)

which remain polynomial after replacing the discrete variables vi with any encoding. The coefficients g_i, g_{i,j}, . . . depend on the problem and its constraints.

In general, constraints can be implemented dynamically [9, 10] (exploring only quantum states that satisfy the constraints) or as extra terms Hcore in the Hamiltonian, such that eigenvectors of H are also eigenvectors of Hcore and the ground states of Hcore correspond to elements in the domain of f that satisfy the constraint. Even if the original problem is unconstrained, the use of sparse encodings such as one-hot or domain wall imposes hard constraints on the quantum variables.

Constraints arising from sparse encodings are usually incorporated as a penalty term Hcore into the Hamiltonian that penalizes any state outside the desired subspace:

Hcore = A [c(v1, . . . , vN) − K]²,   (39)
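The effect of such a penalty term can be illustrated classically by brute force. In this sketch (the toy cost function and the penalty weight A = 10 are arbitrary assumptions of the illustration), the ground state of f + Hcore is checked to lie in the valid subspace:

```python
from itertools import product

# Sketch: implement the hard constraint x1 + x2 + x3 + x4 = 2 as the
# energy penalty Hcore = A (c - K)^2 of Eq. (39), added to a toy cost f.
A, K = 10.0, 2
f = lambda x: -(2 * x[0] + x[1])          # toy cost to minimize
Hcore = lambda x: A * (sum(x) - K) ** 2   # vanishes only on the valid subspace

H = lambda x: f(x) + Hcore(x)
states = list(product((0, 1), repeat=4))
ground = min(states, key=H)

assert Hcore(ground) == 0   # penalty large enough: no constraint violation
assert sum(ground) == K     # the ground state lies in the valid subspace
```

With A chosen large enough, reducing f can never compensate the penalty paid for leaving the constrained subspace, which is exactly the requirement on the constant A discussed in the text.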

FIG. 2. Example toy problem H involving high-order interactions that shows how the Parity architecture straightforwardly handles Hamiltonians beyond the QUBO formulation. The problem is represented by the hypergraph at the top of the figure. When decomposed into QUBO form, it requires 7 qubits and 14 two-body interactions, plus additional qubit overhead depending on the embedding. In contrast, the Parity compiler can remap this problem into a two-dimensional grid that requires six qubits with local fields and four-body interactions, represented by the blue squares. No additional embedding is necessary if the hardware is designed for the Parity Architecture.
FIG. 3. (a) Decision tree to encode the constraints, which can be implemented as energy penalizations in the Hamiltonian of the problem or dynamically by selecting a driver Hamiltonian that preserves the desired condition. When using energy penalties (left figure), the driver term needs to explore the entire Hilbert space. In contrast, the dynamic implementation of the constraints (right figure) reduces the search space to the subspace satisfying the hard constraint, thus improving the performance of the algorithms. (b) Constrained logical problem (left) and its Parity representation. In the Parity architecture, each term of the polynomial constraint is represented by a single Parity qubit (green qubits), so the polynomial constraints define a subspace in which the total magnetization of the involved qubits is preserved and which can be explored with an exchange driver σ+σ− + h.c. that preserves the total magnetization. Figure originally published in Ref. [29].

The squared penalty can be replaced by the linear term

Hcore = A [c(v1, . . . , vN) − K],   (40)

in the special case that c(x1, . . . , xN) ≥ K is satisfied. The constant A must be large enough to ensure that the ground state of the total Hamiltonian satisfies the constraint, but the implementation of large energy scales lowers the efficiency of quantum algorithms [42] and additionally imposes a technical challenge. Moreover, extra terms in the Hamiltonian imply additional overhead of computational resources, especially for squared terms like in Eq. (39).

Quantum algorithms for finding the ground state of
Hamiltonians, like QAOA or quantum annealing, require
driver terms Udrive = exp(−itHdrive ) that spread the ini-
tial quantum state to the entire Hilbert space. Dynamical implementation of constraints employs a driver term that only explores the subspace of the Hilbert space that satisfies the constraints. Given an encoded constraint in terms of Ising operators σz^(i),

c(σz^(1), . . . , σz^(N)) = Σ_i g_i σz^(i) + Σ_{i,j} g_{i,j} σz^(i) σz^(j) + · · · = K,   (41)

a driver Hamiltonian Hdrive that commutes with c(σz^(1), . . . , σz^(N)) will restrict to the valid subspace provided the initial quantum state satisfies the constraint:

c |ψ0⟩ = K |ψ0⟩,
Udrive c |ψ0⟩ = Udrive K |ψ0⟩,   (42)
c [Udrive |ψ0⟩] = K [Udrive |ψ0⟩].

In general, the construction of constraint-preserving drivers depends on the problem [9, 10, 25, 28, 30]. Approximate driver terms have been proposed, which admit some degree of leakage and may be easier to construct [28]. Within the Parity Architecture, each term of a polynomial constraint is a single Parity qubit [21, 24]. This implies that for the Parity architecture the polynomial constraints are simply the conservation of magnetization between the qubits involved:

Σ_i g_i σz^(i) + Σ_{i,j} g_{i,j} σz^(i) σz^(j) + · · · = K  →  Σ_{u∈c} g_u σz^(u) = K,   (43)

where the Parity qubit σz^(u) represents the logical qubit product σz^(u_1) σz^(u_2) . . . σz^(u_n) and we sum over all these

products that appear in the constraint c. A driver Hamiltonian based on exchange (or flip-flop) terms σ+^(u) σ−^(w) + h.c., summed over all pairs of products u, w ∈ c, preserves the total magnetization and explores the complete subspace where the constraint is satisfied [29]. The decision tree for encoding constraints is presented in Fig. 3.

VI. ENCODING-INDEPENDENT FORMULATION

The problems given in this library consist of a cost function f and a set of constraints that define a subspace where the function f must be minimized. Both the cost function and the constraints are expressed in terms of discrete variables and we call this the encoding-independent formulation. To generate a Hamiltonian suitable for quantum optimization algorithms, the discrete variables must be encoded in Ising operators, using the encodings presented in Sec. III. If sparse encodings are chosen for some variables, then the core constraints of these variables must be included in the constraint set of the problem.

After encoding the problem, it must be decided whether the constraints (either problem constraints or core constraints) are implemented as energy penalties or dynamically in the driver term (see Sec. V). The Hamiltonian of the problem will consist of the encoded cost function f plus the constraints ci that are implemented as energy penalties:

H = A f + B c1 + C c2 + . . . ,   (44)

where the energy scales A < B < C, . . . are selected such that the ground state of H is also the ground state of each of the terms representing constraints. This guarantees that it is never energetically favorable to violate a constraint in order to further reduce the cost function. If the energy scales are too different, the performance of the algorithms may deteriorate (in addition to the experimental challenge that this entails), so the parameters A, B, C, . . . cannot be set arbitrarily large. The determination of the optimal energy scales is an important open problem. For some of the problems in the library we provide an estimation of the energy scales (cf. also Sec. VIII D 2).

The problems included in the library are classified into four different categories: subsets, partitions, permutations, continuous variables. These categories are defined by the role of the discrete variable and are intended to organize the library and make it easier to find problems, but also to serve as a basis for the formulation of similar use cases. An additional category in Sec. VII E contains problems that do not fit into the previous categories but may also be important use cases for quantum algorithms. In Sec. VIII we include a summary of recurrent encoding-independent building blocks that are used throughout the library and could be useful in formulating new problems.

We now present two examples, going from the encoding-independent problem to the Ising Hamiltonian and its constraints. This output can already be compiled using the Parity compiler [21] without the need for reducing the problem to QUBO or additional embedding. Let us illustrate the encoding for two examples, the clustering problem and the traveling salesman problem.

a. Clustering problem: The clustering problem is a partitioning problem described in Sec. VII A 1. The process to go from the encoding-independent Hamiltonian to the spin Hamiltonian that has to be implemented in the quantum computer is outlined in Fig. 4. A problem instance is defined from the required inputs: the number of clusters K we want to create, N objects with weights wi and distances di,j between the objects. Two different types of discrete variables are required: variables vi = 1, . . . , K (i = 1, . . . , N) indicate to which of the K possible clusters the object i is assigned, and variables yj = 1, . . . , Wmax track the weight in cluster j.

The variables vi and yj serve different purposes in the formulation of the clustering problem and thus it might be beneficial to use different encodings. For variables vi only the value indicator δvi^α is required, therefore sparse encodings with simple δvi^α are recommended. In contrast, only the value of variables yj is required for the problem formulation, and so dense encodings like binary or Gray will reduce the number of necessary boolean variables. In order to find the spin representation of the problem, it is necessary to choose an encoding and decide if the core term (if there is one) is included as an energy penalization or via a constraint-preserving driver (see Sec. V).

Once the encodings have been chosen, the discrete variables are replaced in the cost function and in the constraint associated to the problem, Eqs. (49) and (47), respectively. The problem constraint can be encoded as an energy penalization in the Hamiltonian or implemented dynamically in the driver term of the quantum algorithm as a sum constraint. The resulting Hamiltonian and constraints are now in terms of spin operators, and can be compiled using the Parity compiler to obtain a two-dimensional chip design including only local interactions.

b. Traveling salesperson problem: The second example we present is the traveling salesperson problem (TSP), described in Sec. VII C 2. The input to this problem is a (usually complete) graph G with a cost wi,j associated with each edge in the graph. We seek a path that visits all nodes of G without visiting the same node twice, and also minimizes the total cost Σ wi,j of the edges of the path.

This problem requires N variables vi = 1, . . . , N, where N is the number of vertices in G. Each variable vi represents a node in the graph, and indicates the stage of the travel at which node i is visited. If vi = α and vj = α + 1, the salesperson travels from node i to node j and so the cost function adds wi,j to the total cost of the travel, as indicated in Eq. (96). This formulation requires the constraints of Eq. (93), vi ≠ vj for all i ≠ j, so the salesperson visits every node once.

In Fig. 5, we present an example of a decision tree for

FIG. 4. Example decision tree for the clustering problem. The entire process is presented in four different blocks (Define
problem, Encode variables, Replace variables and Final Hamiltonian) and the decisions taken are highlighted in green. The
formulation of a problem instance requires the discrete variables vi and yk to be encoded in terms of qubit operators. In this
example, the vi variables are encoded using the one-hot encoding while for the yk variables we use the binary encoding. The
Hamiltonian is obtained by substituting in the encoding-independent expressions of Eqs. (47) and (49) the discrete variables
vi , yk and the value indicators δvα according to the chosen encoding. Core terms associated to sparse encodings (one-hot in this
case) and the problem constraints can be added to the Hamiltonian as energy penalizations or implemented dynamically in
the driver term of the quantum algorithm as sum constraints. The final Hamiltonian and the sum constraints can be directly
encoded on a Parity chip using the Parity Compiler, without the need to reduce the problem to its QUBO formulation.
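The pipeline of Fig. 4 can be mimicked classically on a toy instance. The sketch below (the instance data, variable names and brute-force search are illustrative assumptions; the weight constraint of Eq. (47) is omitted for brevity) assembles the one-hot core terms and the distance cost of Eqs. (48)-(49) and minimizes the resulting energy by enumeration:

```python
from itertools import product

# Toy clustering instance: N = 4 objects on a line, K = 2 clusters,
# one-hot variables x[i][k]; only the core term and Eq. (49) are kept.
pts = [0.0, 0.1, 5.0, 5.1]
d = [[abs(a - b) for b in pts] for a in pts]
N, K, A = 4, 2, 100.0

def energy(x):
    core = sum((sum(x[i]) - 1) ** 2 for i in range(N))   # one-hot core terms
    cost = sum(d[i][j] * x[i][k] * x[j][k]               # Eqs. (48)-(49)
               for k in range(K) for i in range(N) for j in range(i + 1, N))
    return A * core + cost

states = [[list(row) for row in cfg]
          for cfg in product(product((0, 1), repeat=K), repeat=N)]
best = min(states, key=energy)
clusters = [row.index(1) for row in best]

# The two nearby pairs end up in different clusters.
assert clusters[0] == clusters[1] and clusters[2] == clusters[3]
assert clusters[0] != clusters[2]
```

The large core weight A plays the role of the energy penalization branch in the decision tree; the dynamic (driver-based) branch would instead restrict the enumeration to one-hot states from the start.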

this problem. The cost function and the constraint include products of value indicators δvi^α δvk^β for which sparse encodings like domain wall are better suited.

VII. PROBLEM LIBRARY

A. Partitioning problems

The common goal in partitioning problems is to look for partitions of a set U minimizing a cost function f. A


FIG. 5. Example decision tree for the TSP problem. The Hamiltonian is obtained by substituting in the encoding-independent
expressions of Eqs. (96) and (93) the value indicators according to the chosen encoding. The choices made in the decision tree
are highlighted in green. For this problem instance we assume the graph G is complete (there is an edge connecting every pair
of nodes) and we choose domain-wall encoding. This sparse encoding is convenient since the cost function and the problem
constraint include products of value indicators δvαi δvβk , which would imply a large number of interaction terms using a dense
encoding. Sparse encodings require a core term for defining the valid states of the Hilbert space. In this case, we choose
to encode this core term as an energy penalization in the Hamiltonian while the problem constraint is considered as a sum
constraint. The final Hamiltonian and the sum constraints can be directly encoded on a Parity chip using the Parity Compiler,
without the need to reduce the problem to its QUBO formulation.
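The domain-wall encoding chosen in Fig. 5 can be verified with a small enumeration. This sketch (the helper names and the virtual boundary spins s_0 = −1, s_N = +1 follow one common convention and are assumptions of the illustration) checks that every single-wall configuration encodes a value v = 1, . . . , N and that the value indicator δ_v^α = (s_α − s_{α−1})/2 singles it out:

```python
from itertools import product

# Domain-wall encoding of one variable v = 1..N with N-1 spins s_1..s_{N-1}
# and virtual boundary spins s_0 = -1, s_N = +1.
N = 4

def value(s):
    # v = (1 + N)/2 - (1/2) sum_i s_i
    return (1 + N) / 2 - sum(s) / 2

def indicator(s, alpha):
    # delta_v^alpha = (s_alpha - s_{alpha-1}) / 2, using the virtual boundaries
    ext = [-1] + list(s) + [1]
    return (ext[alpha] - ext[alpha - 1]) / 2

def valid(s):
    # a state is valid iff it contains exactly one domain wall
    ext = [-1] + list(s) + [1]
    return sum(ext[i] != ext[i + 1] for i in range(N)) == 1

for s in product((-1, 1), repeat=N - 1):
    if valid(s):
        v = value(s)
        assert v == int(v) and 1 <= v <= N
        # the indicator is 1 exactly at the encoded value
        assert [indicator(s, a) for a in range(1, N + 1)].count(1) == 1
        assert indicator(s, int(v)) == 1
```

States with more than one domain wall are exactly the ones the core term must penalize, which is why the figure encodes the core constraint either as an energy penalization or as a sum constraint for the driver.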

partition P of U is a set P := {Uk}_{k∈K} of subsets Uk ⊂ U, such that U = ∪_{k=1}^K Uk and Uk ∩ Uk′ = ∅ if k ≠ k′. Partitioning problems require a discrete variable vi for each element ui ∈ U. The value of the variable indicates to which subset the element belongs, so vi can take K different values. The values assigned to the subsets are arbitrary, therefore the value of the variable vi is usually not important in these cases; only the value indicator δvi^α is needed, so sparse encodings may be convenient for these variables.
1. Clustering problem

a. Description Let U = {ui}_{i=1}^N be a set of N elements, characterized by weights wi ∈ {1, ..., wmax} and distances di,j between them. The clustering problem looks for a partition of the set U into K non-empty subsets Uk that minimizes the distance between vertices in each subset. Partitions are subject to a weight restriction: for every subset Uk, the sum of the weights of the vertices in the subset must not exceed a given maximum weight Wmax.

b. Variables We define a variable vi = 1, . . . , K for each element in U. We also require an auxiliary variable yk = 0, 1, . . . , Wmax per subset Uk, that indicates the total weight of the elements in Uk:

yk = Σ_{i: ui∈Uk} wi.   (45)

c. Constraints The weight restriction is an inequality constraint:

yk ≤ Wmax,   (46)

which can be expressed as:

yk = Σ_i wi δvi^k.   (47)

If the encoding for the auxiliary variables makes it necessary (e.g. a binary encoding and Wmax not a power of 2), this constraint must be shifted as is described in Sec. VIII B 3.

d. Cost function The sum of the distances of the elements of a subset is:

fk = Σ_{i<j: ui,uj∈Uk} di,j = Σ_{i<j} di,j δvi^k δvj^k,   (48)

and the cost function results:

f = Σ_k fk.   (49)

e. References This problem is studied in Ref. [43] as part of the Capacitated Vehicle Routing problem.

2. Number partitioning

a. Description Given a set U of N real numbers ui, we look for a partition of size K such that the sum of the numbers in each subset is as homogeneous as possible.

b. Variables The problem requires N variables vi ∈ [1, K], one per element ui. The value of vi indicates to which subset ui belongs.

c. Cost function The partial sums can be represented using the value indicators associated to the variables vj:

pj = Σ_{ui∈Uj} ui = Σ_{i=1}^N ui δvi^j.   (50)

There are three common approaches for finding the optimal partition: maximizing the minimum partial sum, minimizing the largest partial sum, or minimizing the difference between the maximum and the minimum partial sums. The latter option can be formulated as

f = Σ_{i<j} (pi − pj)².   (51)

In order to minimize the maximum partial sum (maximizing the minimum partial sum is done analogously) we introduce an auxiliary variable l that can take values 1, ..., Σ_i ui ≡ lmax. Depending on the problem instance the range of l can be restricted further. The first term in the cost function

f = lmax [1 − Π_i Θ(l − pi)] + l   (52)

then enforces that l is at least as large as the maximum pi (see Sec. VIII A 3) and the second term minimizes l. The theta step function can be expressed in terms of the value indicator functions according to the building block Eq. (150), where we either have to introduce auxiliary variables or express the value indicators directly according to the discussion in Sec. VIII D 1.

d. Special cases For K = 2, the cost function that minimizes the difference between the partial sums is

f = (p2 − p1)² = [Σ_{i=1}^N (δvi^1 − δvi^2) ui]².   (53)

The only two possible outcomes for δvi^1 − δvi^2 are ±1, so these factors can be trivially encoded using spin variables si = ±1, leading to

f = (Σ_{i=1}^N si ui)².   (54)

e. References The Hamiltonian formulation for K = 2 can be found in [11].

3. Graph coloring

a. Description In this problem, the nodes of a graph G = (V, E) are divided into K different subsets, each one representing a different color. We look for a partition in which two adjacent nodes are painted with different colors, and we can also look for a partition that minimizes the number of colors used.
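The K = 2 special case of number partitioning, Eqs. (53)-(54), is easy to check by brute force. In this sketch (the instance u and the enumeration are illustrative assumptions), the spin cost f = (Σ_i s_i u_i)² vanishes exactly for a balanced partition:

```python
from itertools import product

# K = 2 number partitioning: spins s_i = +/-1 assign each u_i to a subset,
# and f = (sum_i s_i u_i)^2 is the squared difference of the partial sums.
u = [4, 5, 6, 7, 8]

def f(s):
    return sum(si * ui for si, ui in zip(s, u)) ** 2

best = min(product((-1, 1), repeat=len(u)), key=f)
# 4 + 5 + 6 = 7 + 8: a perfectly balanced partition exists.
assert f(best) == 0
assert sorted(ui for si, ui in zip(best, u) if si == 1) in ([4, 5, 6], [7, 8])
```

Because the cost is already quadratic in the spins, this special case needs no value indicators and maps directly to an Ising Hamiltonian.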

b. Variables We define a variable vi = 1, . . . , K for each node i = 1, ..., N in the graph.

c. Constraints We must penalize solutions for which two adjacent nodes are painted with the same color. The cost function of the graph partitioning problem presented in Eq. (60) can be used for the constraint of graph coloring. In that case, [1 − δ(vi − vj)] Ai,j was used to count the number of adjacent nodes belonging to different subsets; now we use δ(vi − vj) Ai,j to indicate if nodes i and j are painted with the same color. Therefore the constraint is (see building block VIII B 1)

c = Σ_{i<j} δ(vi − vj) Ai,j = 0.   (55)

d. Cost function The decision problem ("Is there a coloring that uses K colors?") can be answered by implementing the constraint as a Hamiltonian H = c (note that c ≥ 0). The existence of a solution with zero energy implies that a coloring with K colors exists.

Alternatively we can look for the coloring that uses the minimum number of colors. To check if a color α is used, we can use the following term (see building block VIII B 2):

uα = 1 − Π_{i=1}^N (1 − δvi^α),   (56)

which is equal to 1 if at least one node is painted with color α, and 0 if and only if color α is not used in the coloring. The number of colors used in the coloring is the cost function of the problem:

f = Σ_{α=1}^K uα.   (57)

This objective function can be very expensive to implement, since uα includes products of δvi^α of order N, so a good estimation of the minimum number Kmin would be useful to avoid using an unnecessarily large K.

e. References The one-hot encoded version of this problem can be found in Ref. [11].

4. Graph partitioning

a. Description Graph partitioning divides the nodes of a graph G = (V, E) into K different subsets, so that the number of edges connecting nodes in different subsets (cut edges) is minimized or maximized.

b. Variables We employ one discrete variable vi = 1, . . . , K per node in the graph, which indicates to which partition the node belongs.

c. Cost function An edge connecting two nodes i and j is cut when vi ≠ vj. It is convenient to use the symbol (see building block VIII A 4):

δ(vi − vj) = Σ_{α=1}^K δvi^α δvj^α = { 1 if vi = vj ; 0 if vi ≠ vj },   (58)

which is equal to 1 when nodes i and j belong to the same partition (vi = vj) and zero when not (vi ≠ vj). In this way, the term

[1 − δ(vi − vj)] Ai,j   (59)

is equal to 1 if there is a cut edge connecting nodes i and j, Ai,j being the adjacency matrix of the graph. The cost function for minimizing the number of cut edges is obtained by summing over all pairs of nodes:

f = Σ_{i<j} [1 − δ(vi − vj)] Ai,j,   (60)

or alternatively −f to maximize the number of cut edges.

d. Constraints A common constraint imposes that partitions have a specific size. The element counting building block VIII A 1 is defined as

tα = Σ_{i=1}^{|V|} δvi^α.   (61)

If we want the partition α to have L elements, then

tα = L   (62)

must hold. If we want two partitions α and β to have the same size, then the constraint is

tα = tβ,   (63)

and all the partitions will have the same size imposing

Σ_{a<b} (ta − tb)² = 0,   (64)

which is only possible if |V|/K is a natural number.

e. References The Hamiltonian for K = 2 can be found in [11].

f. Hypergraph partitioning The problem formulation can be extended to hypergraphs. A hyperedge is cut when it contains vertices from at least two different subsets. Given a hyperedge e of G, the function

cut(e) = Π_{i,j∈e} δ(vi − vj)   (65)

is only equal to 1 if all the vertices included in e belong to the same partition, and zero in any other case. The product over the vertices i, j of the edge e only needs to involve pairs such that every vertex appears at least once. The optimization objective of minimizing the cut hyperedges is implemented by the sum of penalties

f = Σ_{e∈E} [1 − cut(e)] = Σ_{e∈E} [1 − Π_{i,j∈e} Σ_{a=1}^K δi^a δj^a].   (66)

This objective function penalizes all possible cuts in the same way, regardless of the number of vertices cut or the number of partitions to which an edge belongs.
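The cut-edge cost of Eq. (60) together with the size constraints of Eqs. (61)-(64) can be tested on a small instance. The sketch below (the 4-cycle instance and the brute-force filter are illustrative assumptions) minimizes the number of cut edges over balanced two-way partitions:

```python
from itertools import product

# Toy instance: a 4-cycle split into K = 2 equally sized subsets.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
N, K = 4, 2

def cut_cost(v):
    # f = sum over edges of [1 - delta(v_i - v_j)], Eq. (60)
    return sum(1 for i, j in edges if v[i] != v[j])

def equal_sizes(v):
    # element counting t_alpha of Eq. (61), equal sizes as in Eq. (64)
    t = [sum(1 for vi in v if vi == a) for a in range(K)]
    return all(ti == N // K for ti in t)

balanced = [v for v in product(range(K), repeat=N) if equal_sizes(v)]
best = min(balanced, key=cut_cost)
assert cut_cost(best) == 2  # a balanced cut of a 4-cycle cuts at least 2 edges
```

Without the size constraint the minimum would trivially be zero (all nodes in one subset), which is why Eq. (64) or a constraint-preserving driver is needed in practice.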

5. Clique cover

a. Description Given a graph G = (V, E), we are looking for the minimum number of colors K for coloring all vertices such that the subsets Wα of vertices with the color α together with the edge set Eα restricted to edges between vertices in Wα form complete graphs. A subproblem is to decide if there is a clique cover using K colors.

b. Variables For each vertex i = 1, . . . , N we define variables vi = 1, . . . , K indicating the color that vertex is assigned to. If the number of colors is not given, one has to start with some initial guess or some minimal value for K.

c. Constraints In this problem Gα = (Wα, Eα) has to be a complete graph, so the maximum number of edges in Eα must be present. Using the element counting building block, we calculate the number of vertices with color α (see building block VIII A 1):

tα = Σ_{i=1}^N δi^α.   (67)

If Gα is a complete graph, then the number of edges in Eα is tα(tα − 1)/2 and thus the constraint reads

c = Σ_{α=1}^K [ tα(tα − 1)/2 − Σ_{(i,j)∈E} δi^α δj^α ] = 0.   (68)

Note that we do not have to square the term as it can never be negative.

d. Cost function The decision problem ("Is there a clique cover using K colors?") can be answered using the constraint c as the Hamiltonian of the problem. For finding the minimum number of colors Kmin for which a clique cover exists (the clique cover number), we add a cost function for minimizing K. As in the graph coloring problem, we minimize the number of colors using

f = Σ_α (1 − uα),   (69)

where

uα = Π_{i=1}^N (1 − δvi^α)   (70)

indicates if the color α is used or not (see building block VIII B 2).

e. References The one-hot encoded Hamiltonian of the decision problem can be found in Ref. [11].

B. Constrained subset problems

Given a set U, we look for a non-empty subset U0 ⊆ U that minimizes a cost function f while satisfying a set of constraints ci. In general, these problems require a binary variable xi per element in U which indicates if the element i is included or not in the subset U0. Although binary variables are trivially encoded in single qubits, non-binary auxiliary variables may be necessary to formulate constraints, so the encoding-independent formulation of these problems is still useful.

1. Cliques

a. Description A clique on a given graph G = (V, E) is a subset of vertices W ⊆ V such that W together with the subset EW of edges between vertices in W forms a complete graph, i.e. the maximal possible number of edges in EW is present. The goal is to find a clique with cardinality K. Additionally one could ask what the largest clique of the graph is.

b. Variables We define |V| binary variables xi that indicate whether vertex i is in the clique or not.

c. Constraints This problem has two constraints, namely that the cardinality of the clique is K and that the clique indeed has the maximum number of edges. The former is enforced by

c1 = Σ_{i=1}^{|V|} xi = K   (71)

and the latter by

c2 = Σ_{(i,j)∈E} xi xj = K(K − 1)/2.   (72)

If constraints are implemented as energy penalizations, one has to make sure that the first constraint is not violated in order to decrease the penalty for the second constraint. Using the cost/gain analysis (see Sec. VIII D 2) of a single spin flip, this is prevented as long as a1 ≳ a2 Δ, where Δ is the maximal degree of G and a1,2 are the energy scales of the first and second constraints.

d. Cost function The decision problem (is there a clique of size K?) can be solved using the constraints as the Hamiltonian of the problem. If we want to find the largest clique of the graph G, we must encode K as a discrete variable, K = 1, . . . , Kmax, where Kmax = Δ is the maximum degree of G and the largest possible size of a clique. The cost function for this case is simply the value of K:

f = −K.   (73)

e. Resources Implementing constraints as energy penalizations, the total cost function for the decision problem (fixed K),

f = (c1 − K)² + [c2 − K(K − 1)/2]²,   (74)

has maximum order two of the interaction terms and the number of terms scales with |E| + |V|². If K is encoded as a discrete variable, the resources depend on the chosen encoding.
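The decision Hamiltonian of Eq. (74) can be checked by enumeration. In the sketch below (the example graphs and the helper name min_f are illustrative assumptions), the minimum of f is zero exactly when a K-clique exists:

```python
from itertools import product

# Brute-force check of the clique decision Hamiltonian, Eq. (74):
# f = (c1 - K)^2 + (c2 - K(K-1)/2)^2 reaches zero iff a K-clique exists.
def min_f(V, E, K):
    def f(x):
        c1 = sum(x)                              # Eq. (71)
        c2 = sum(x[i] * x[j] for i, j in E)      # Eq. (72)
        return (c1 - K) ** 2 + (c2 - K * (K - 1) // 2) ** 2
    return min(f(x) for x in product((0, 1), repeat=V))

triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
assert min_f(3, triangle, 3) == 0  # the triangle is a 3-clique
assert min_f(3, path, 3) > 0       # no 3-clique in a path graph
```

A nonzero minimum certifies that no K-clique exists, which is how the decision problem is read off from the ground-state energy.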

f. References This Hamiltonian was formulated in f. References This problem can be found in
Ref. [11] using one-hot encoding. Refs. [11].

2. Maximal independent set 4. Vertex cover

a. Description Given an hypergraph G = (V, E) we a. Description Given a hypergraph G = (V, E) we


look for a subset of vertices S ⊂ V such that there are want to find the smallest subset C ⊆ V such that all edges
no edges in E connecting any two vertices of S. Finding contain at least one vertex in C.
the largest possible S is an NP-hard problem. b. Variables We define |V | binary variables xi that
b. Variables We use a binary variable xi for each indicate whether vertex i belongs to the cover C.
vertex in V . c. Cost function Minimizing the number of vertices
c. Cost function For maximizing the number of ver- in C is achieved with the element counting building block
tices in S, the cost function is
|V |
X
|V | f= xi . (79)
X
f =− xi . (75) i=1
i=1
d. Constraints With
d. Constraints Given two elements in S, there must not be any edge or hyperedge in E connecting them. The constraint

c = \sum_{i,j \in V} A_{i,j} x_i x_j    (76)

counts the number of adjacent vertices in S, A being the adjacency matrix. Setting c = 0, the vertices in S form an independent set.
e. References This problem can be found in Refs. [11, 44] for graphs.

3. Set packing

a. Description Given a set U and a family S = {V_i}_{i=1}^{N} of subsets V_i of U, we want to find set packings, i.e., subsets of S such that all subsets are pairwise disjoint, V_i ∩ V_j = ∅. Finding the maximum packing (the maximum number of subsets V_i) is the NP-hard optimization problem called set packing.
b. Variables We define N binary variables x_i that indicate whether subset V_i belongs to the packing.
c. Cost function Maximizing the number of subsets in the packing is achieved with the element counting building block

f = -\sum_{i=1}^{N} x_i.    (77)

d. Constraints In order to ensure that any two subsets of the packing are disjoint we impose a cost on overlapping sets with

c = \sum_{i,j : V_i \cap V_j \neq ∅} x_i x_j = 0.    (78)

e. Resources The total cost function H = A f + B c has maximum order two of the interaction terms (so it is a QUBO problem) and the number of terms scales with up to N^2.

With the constraint

c = \sum_{e=(u_1,...,u_k) \in E} \prod_{a=1}^{k} (1 - x_{u_a}) = 0    (80)

one can penalize all edges that do not contain vertices belonging to C. Encoding the constraint as an energy penalization, the Hamiltonian results in

H = A f + B c.    (81)

By setting B > A we avoid that the constraint is traded off against the minimization of C.
e. Resources The maximum order of the interaction terms is the maximum rank of the hyperedges k and the number of terms scales with |E| 2^k + |V|.
f. References The special case that only considers graphs can be found in Ref. [11].

5. Minimal maximal matching

a. Description Given a hypergraph G = (V, E) with edges of maximal rank k we want to find a minimal (i.e., fewest edges) matching C ⊆ E which is maximal in the sense that all edges with vertices which are not incident to edges in C have to be included in the matching.
b. Variables We define |E| binary variables x_i that indicate whether an edge belongs to the matching C.
c. Cost function Minimizing the number of edges in the matching is simply done by the cost function

f_A = A \sum_{i=1}^{|E|} x_i.    (82)

d. Constraints We have to enforce that C is indeed a matching, i.e., that no two edges which share a vertex belong to C. Using an energy penalty this is achieved by

f_B = B \sum_{v \in V} \sum_{(i,j) \in ∂v} x_i x_j = 0,    (83)
where ∂v is the set of edges connected to vertex v. Additionally, the matching should be maximal. For each vertex u, we define a variable y_u = \sum_{i \in ∂u} x_i which is only zero if the vertex does not belong to an edge of C. If the first constraint is satisfied, this variable can only be 0 or 1. In this case, the constraint can be enforced by

f_C = C \sum_{e=(u_1,...,u_k) \in E} \prod_{a=1}^{k} (1 - y_{u_a}) = 0.    (84)

However, one has to make sure that the constraint implemented by f_B is not violated in favor of f_C, which could happen if for some v, y_v > 1 and for m neighboring vertices y_u = 0. Then the contributions from v are given by

f_v = (1/2) B y_v (y_v - 1) + C (1 - y_v) m    (85)

and since m + y_v is bounded by the maximum degree ∆ of G times the maximum rank of the hyperedges k, we need to set B > (∆k − 2)C to ensure that the ground state of f_B + f_C does not violate the first constraint. Finally, one has to prevent f_C from being violated in favor of f_A, which entails C > A.
e. Resources The maximum order of the interaction terms is k and the number of terms scales roughly with (|V| + |E| 2^k) ∆(∆ − 1).
f. References This problem can be found in Ref. [11].

6. Set cover

a. Description Given a set U = {u_α}_{α=1}^{n} and N subsets V_i ⊆ U, we look for the minimum number of V_i such that U = \bigcup_i V_i.
b. Variables We define a binary variable x_i = 0, 1 for each subset V_i that indicates if the subset V_i is selected or not. We also define auxiliary variables y_α = 1, 2, ..., N that indicate how many active subsets (V_i such that x_i = 1) contain the element u_α.
c. Cost function The cost function is simply

f = \sum_{i=1}^{N} x_i,    (86)

which counts the number of selected subsets V_i.
d. Constraints The constraint U = \bigcup_{i: x_i=1} V_i can be expressed as

y_α > 0,  ∀ α = 1, ..., n,    (87)

which implies that every element u_α ∈ U is included at least once. These inequalities are satisfied if the y_α are restricted to the valid values (y_α = 1, 2, ..., N, y_α ≠ 0) (see Sec. VIII B 3). The values of y_α should be consistent with those of x_i, so the constraint is

c_α = y_α - \sum_{i: u_α \in V_i} x_i = 0.    (88)

e. Special case: exact cover If we want each element of U to appear once and only once in the cover, then y_α = 1 for all α and the constraint of the problem reduces to

c_α = \sum_{i: u_α \in V_i} x_i = 1.    (89)

f. References The one-hot encoded Hamiltonian can be found in Ref. [11].

7. Knapsack

a. Description A set U contains N objects, each of them with a value d_i and a weight w_i. We look for a subset of U with the maximum value \sum d_i such that the total weight of the selected objects does not exceed the upper limit W.
b. Variables We define a binary variable x_i = 0, 1 for each element in U that indicates if the element i is selected or not. We also define an auxiliary variable y that indicates the total weight of the selected objects:

y = \sum_{i: x_i = 1} w_i.    (90)

If the weights are natural numbers w_i ∈ N then y is also natural, and the encoding of this auxiliary variable is greatly simplified.
c. Cost function The cost function is given by

f = -\sum_{i=1}^{N} d_i x_i,    (91)

which counts the value of the selected elements.
d. Constraints The constraint y < W is implemented by forcing y to take one of the possible values y = 1, ..., W − 1 (see Sec. VIII B 3). The value of y must be consistent with the selected items from U:

c = y - \sum_i w_i x_i = 0.    (92)

e. References The one-hot encoded Hamiltonian can be found in Ref. [11].

C. Permutation problems

In permutation problems, we need to find a permutation of N elements that minimizes a cost function while satisfying a given set of constraints. In general, we will use a discrete variable v_i ∈ [1, N] that indicates the position of the element i in the permutation.
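The indicator-function bookkeeping used for such permutation variables can be made concrete in a few lines of code. The following sketch is our own illustration (not part of the paper; all names are arbitrary): it evaluates the pair-counting penalty that is zero exactly when the discrete variables v_i ∈ [1, N] form a permutation.

```python
def delta(v, alpha):
    """Value indicator function: 1 if the discrete variable v equals alpha."""
    return 1 if v == alpha else 0

def permutation_penalty(v, N):
    """Counts ordered pairs of variables sharing the same value; it is zero
    iff (v_1, ..., v_N) is a permutation of 1..N (each v_i assumed in 1..N)."""
    return sum(delta(v[i], a) * delta(v[k], a)
               for a in range(1, N + 1)
               for i in range(N) for k in range(N) if i != k)

# a valid permutation incurs no penalty ...
assert permutation_penalty([2, 3, 1], 3) == 0
# ... while a repeated position is penalized
assert permutation_penalty([2, 2, 1], 3) > 0
```

Spin-encoding the indicators δ^α_{v_i} with any of the encodings of Sec. III turns this penalty directly into a Hamiltonian term.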
1. Hamiltonian cycles

a. Description For a graph G = (V, E), we ask if a Hamiltonian cycle, i.e., a closed path that connects all nodes in the graph through the existing edges without visiting the same node twice, exists.
b. Variables We define a variable v_i = 1, ..., |V| for each node in the graph, that indicates the position of the node in the permutation.
c. Cost function For this problem there is no cost function, so every permutation that satisfies the constraints is a solution of the problem.
d. Constraints This problem requires two constraints. The first constraint is inherent to all permutation problems, and imposes that the |V| variables {v_i} form a permutation of [1, ..., |V|]. This is equivalent to requiring v_i ≠ v_j if i ≠ j, which can be encoded with the following constraint (see Sec. VIII B 1):

c_1 = \sum_{α=1}^{N} \sum_{i \neq k} δ^α_{v_i} δ^α_{v_k} = 0.    (93)

The second constraint ensures that the path only goes through the edges of the graph. Let A be the adjacency matrix of the graph, such that A_{i,j} = 1 if there is an edge connecting nodes i and j and zero otherwise. To penalize invalid solutions, we use the constraint

c_2 = \sum_{i,j} \sum_{α=1}^{|V|} (1 - A_{i,j}) (δ^α_{v_i} δ^{α+1}_{v_j} + δ^{α+1}_{v_i} δ^α_{v_j}) = 0,    (94)

which counts how many adjacent nodes in the solution are not connected by an edge in the graph. α = |V| + 1 represents α = 1 since we are looking for a closed path.
e. References The one-hot encoded Hamiltonian can be found in Ref. [11].

2. Traveling salesperson problem (TSP)

a. Description The TSP is a trivial extension of the Hamiltonian cycles problem. In this case, the nodes represent cities and the edges are the possible roads connecting the cities, although in general it is assumed that all cities are connected (the graph is complete). For each edge connecting cities i and j there is a cost w_{i,j}. The solution of the TSP is the Hamiltonian cycle that minimizes the total cost \sum w_{i,j}.
b. Variables We define a variable v_i = 1, ..., |V| for each node in the graph, indicating the position of the node in the permutation.
c. Cost function If the traveler goes from city i at position α to city j in the next step, then

δ^α_{v_i} δ^{α+1}_{v_j} = 1,    (95)

otherwise that expression would be zero. Therefore the total cost of the travel is codified in the function

f = \sum_{α} \sum_{i<j} w_{i,j} (δ^α_{v_i} δ^{α+1}_{v_j} + δ^α_{v_j} δ^{α+1}_{v_i}).    (96)

d. Constraints The constraints are the same as those used in the Hamiltonian cycles problem (see paragraph VII C 1 d). If the graph is complete (all the cities are connected) then constraint c_2 is not necessary.
e. References The one-hot encoded Hamiltonian can be found in Ref. [11].

3. Machine scheduling

a. Description Machine scheduling problems seek the best way to distribute a number of jobs over a finite number of machines. Basically, these problems explore permutations of the job list, where the position of a job in the permutation indicates on which machine and at what time the job is executed. Many variants of the problem can be found in the literature, including formulations of the problem in terms of spin Hamiltonians [45–47]. Here we consider the problem of M machines and N jobs, where all jobs take the same amount of time to complete, so the time can be divided into time slots of equal duration t. It is possible to include jobs of duration nt (n ∈ N) by using appropriate constraints that force some jobs to run in consecutive time slots on the same machine. In this way, problems with jobs of different duration can be solved by choosing the time t sufficiently small.
b. Variables We define variables v_{m,t} = 0, ..., N, where the subindex m = 1, ..., M indicates the machine and t = 1, ..., T the time slot. When v_{m,t} = 0, the machine m is unoccupied in time slot t, and if v_{m,t} = j ≠ 0 then the job j is done in the machine m, in the time slot t.
c. Constraints There are many possible constraints depending on the use case we want to run. As in every permutation problem, we require that no pair of variables have the same value, v_{m,t} ≠ v_{m',t'}, otherwise some jobs would be performed twice. We also require each job to be complete, so there must be exactly one v_{m,t} = j for each job j. This constraint is explained in Sec. VIII B 1 and holds for every job j ≠ 0:

c_1 = \sum_{j \neq 0} \left[ \sum_{m,t} δ^j_{v_{m,t}} - 1 \right]^2 = 0.    (97)

Note that if some job j is not assigned (i.e., there are no m, t such that v_{m,t} = j) then c_1 > 0. Also if there is more than one variable v_{m,t} = j, then again c_1 > 0. The constraint will be satisfied (c_1 = 0) if and only if every job is assigned to a single time slot, on a single machine.
Suppose job k can only be started if another job j has been done previously. This constraint can be imple-
mented as

c_{2,k>j} = \sum_{m,m',t \geq t'} δ^j_{v_{m,t}} δ^k_{v_{m',t'}} = 0,    (98)

which precludes any solution in which job j is done after job k. Alternatively, it can be codified as

c'_{2,k>j} = \sum_{m,m',t < t'} δ^j_{v_{m,t}} δ^k_{v_{m',t'}} = 1.    (99)

These constraints can be used to encode problems that consider jobs with different operations O_1, ..., O_q that must be performed in sequential order. Note that constraints c_2, c'_2 allow using different machines for different operations. If we want job k to be done immediately after job j, we can substitute t' by t + 1 in c'_2.
If we want two jobs j_1 and j_2 to run in the same machine in consecutive time slots, the constraint can be encoded as

c_3 = \sum_{m,t} δ^{j_1}_{v_{m,t}} δ^{j_2}_{v_{m,t+1}} = 1,    (100)

or as a reward term in the Hamiltonian,

c'_3 = -\sum_{m,t} δ^{j_1}_{v_{m,t}} δ^{j_2}_{v_{m,t+1}},    (101)

which reduces the energy of any solution in which job j_2 is performed immediately after job j_1 in the same machine m. This constraint allows encoding problems with different job durations, since j_1 and j_2 can be considered part of a same job of duration 2t.
d. Cost function Different objective functions can be chosen for this problem, such as minimizing machine idle time or early and late deliveries. A common option is to minimize the makespan, i.e., the time slot of the last scheduled job. To do this we first introduce an auxiliary function τ(v_{m,t}) that indicates the time slots in which some job has been scheduled:

τ(v_{m,t}) = { t if v_{m,t} ≠ 0;  0 if v_{m,t} = 0 }.    (102)

These functions can be generated from v_{m,t}:

τ(v_{m,t}) = t (1 - δ^0_{v_{m,t}}).    (103)

Note that the maximum value of τ corresponds to the makespan of the problem, which is to be minimized. To do this, we introduce the extra variable τ_max and penalize configurations where τ_max < τ(v_{m,t}) for any m, t:

f_{τ_max} = \sum_{m,t} Θ[τ(v_{m,t}) - τ_max],    (104)

with

Θ(x) = { 1 if x ≥ 0;  0 if x < 0 }.    (105)

For details on how Θ(x) can be expressed see Sec. VIII A 2. By minimizing f_{τ_max}, we ensure that τ_max ≥ τ(v_{m,t}). Then the cost function is simply to minimize τ_max, the latest time a job is scheduled, via

f = τ_max.    (106)

An alternative cost function for this problem is

f' = \sum_{m,t} τ(v_{m,t}),    (107)

which forces all jobs to be scheduled as early as possible.
e. References Quantum formulations of this and related problems can be found in Refs. [45–48].

4. Nurse Scheduling problem

a. Description In this problem we have N nurses and D working shifts. Nurses must be scheduled with minimal workload following hard and soft constraints, like a minimum workload n_{min,t} in a given shift t (where nurse i contributes workload p_i) and balancing the number of shifts as equally as possible. Furthermore, no nurse should have to work on more than d_max consecutive days.
b. Variables We define N D binary variables v_{i,t} indicating whether nurse i is scheduled for shift t.
c. Cost function The cost function whose minimum corresponds to the minimal number of overall shifts is given by

f = \sum_{t=1}^{D} \sum_{i=1}^{N} v_{i,t}.    (108)

d. Constraints Balancing of the shifts is expressed by the constraint

c = \sum_{i<j} \left( \sum_t v_{i,t} - \sum_{t'} v_{j,t'} \right)^2.    (109)

In order to get the minimal workload per shift we introduce the auxiliary variables y_t which we bind to the values

y_t = \sum_{i=1}^{N} p_i v_{i,t}    (110)

with the penalty terms

\left( y_t - \sum_{i=1}^{N} p_i v_{i,t} \right)^2.    (111)

Now the constraint takes the form

c = \sum_{t=1}^{D} \sum_{α < n_{min,t}} δ^α_{y_t} = 0.    (112)
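The minimal-workload constraint can be checked directly on a candidate schedule. The sketch below is our own illustration (names are arbitrary); it assumes integer workloads p_i so that y_t takes integer values, and evaluates the indicator sum over all shift workloads below the required minimum.

```python
def delta(v, alpha):
    """Value indicator function: 1 if v equals alpha, else 0."""
    return 1 if v == alpha else 0

def workload_penalty(schedule, p, n_min):
    """Counts shifts whose total workload y_t falls below n_min[t].
    schedule[i][t] is the binary variable v_{i,t}; p[i] are integer
    nurse workloads; n_min[t] is the minimum workload per shift."""
    D = len(n_min)
    penalty = 0
    for t in range(D):
        y_t = sum(p[i] * schedule[i][t] for i in range(len(p)))
        penalty += sum(delta(y_t, a) for a in range(n_min[t]))
    return penalty

# two nurses (p_i = 1), two shifts, minimum workload 1 per shift
ok  = [[1, 0], [0, 1]]   # each shift covered once
bad = [[1, 0], [1, 0]]   # second shift left empty
assert workload_penalty(ok,  [1, 1], [1, 1]) == 0
assert workload_penalty(bad, [1, 1], [1, 1]) == 1
```

In the Hamiltonian, the same indicator sum is built from the spin encoding of the auxiliary variables y_t.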
Note that we can also combine the cost function for minimizing the number of shifts and this constraint by using the cost function

f = \sum_{t=1}^{D} \left( \sum_{i=1}^{N} p_i v_{i,t} - n_{min,t} \right)^2.    (113)

Finally, the constraint that a nurse i should work in maximally d_max consecutive shifts reads

c^{(i)} = \sum_{t_2 - t_1 > d_max} \prod_{t'=t_1}^{t_2} v_{i,t'} = 0.    (114)

e. References This problem can be found in Ref. [49].

D. Real variables problems

Some problems are defined by a set of real variables {v_i ∈ R, i = 1, ..., d} and a cost function f(v_i) that has to be minimized. Just as for the discrete problems we discussed previously, we want to express these problems as Hamiltonians whose ground state represents an optimal solution. This shall be done in three steps: first we will encode the continuous variables into discrete ones. Second, we have to choose an encoding of these discrete variables to spin variables exactly as before, and finally a decoding step is necessary that maps the solution of our discrete problem back to the continuous domain. In the following we investigate two possible methods one could use for the first step.
a. Standard discretization Given a vector v ∈ R^d one could simply discretize it by partitioning the axis into intervals with the length of the aspired precision q, i.e., the components of v = (v_i), i = 1, ..., d are mapped to v_i → ṽ_i ∈ N such that (ṽ_i − 1)q ≤ v_i < ṽ_i q.
Depending on the problem it might also be useful to have different resolutions for different axes or a non-uniform discretization, e.g., logarithmic scaling of the interval length.
For the discrete variables ṽ_i one can then use the encodings in Sec. III. In the final step, after an intermediate solution ṽ* is found, one has to map it back to R^d by uniformly sampling the components of v* from the hypercuboids corresponding to ṽ*. Any intermediate solution that is valid for the discrete encoding can also be decoded and thus there are no core terms in the cost function besides those from the discrete encoding.
b. Random subspace coding A further possibility to encode a vector v ∈ R^d into discrete variables is random subspace coding [50]. One starts with randomly choosing a set of coordinates D_n := {1, 2, ..., d_n} ⊂ {1, 2, ..., d}. For each i ∈ D_n, a Dirichlet process is used to pick an interval [a_i, b_i] in the i-coordinate direction in R^d [51]. We denote

π_i : R^d → R,  x = (x_1, ..., x_d) ↦ π_i(x) := x_i    (115)

for the projection to the i-th coordinate. A hyperrectangle R ⊂ R^d is defined as

R := { x ∈ R^d | π_i(x) ∈ [a_i, b_i], ∀ i ∈ D_n }.    (116)

For the fixed set of chosen coordinates D_n the Dirichlet processes are run m times to get m hyperrectangles {R_1, ..., R_m}. For k = 1, ..., m, we define binary projection maps

z_k : R^d → Z_2,  x ↦ z_k(x) := { 1 if x ∈ R_k;  0 else }.    (117)

Then random subspace coding is defined as the map

z : R^d → Z_2^m,  x ↦ z(x) := (z_1(x), ..., z_m(x)).    (118)

Depending on the set of hyperrectangles {R_1, ..., R_m}, random subspace coding can be a sparse encoding. Let M := {1, ..., m} and K ⊂ M. We define the sets

U(K) := \bigcap_{k \in K} R_k - \bigcup_{i \in M-K} R_i.    (119)

For z ∈ Z_2^m, let K(z) = {k ∈ {1, ..., m} | z_k = 1}. The binary vector z is in the image of the random subspace encoding if and only if

U(K(z)) ≠ ∅    (120)

holds. From here one can simply use the discrete encodings discussed in Sec. III to map the components z_i(x) of z(x) to spin variables. Due to the potentially sparse encoding, a core term has to be added to the cost function. Let N = {L ⊂ M | U(L) = ∅}; then the core term reads

c = \sum_{L \in N} \prod_{ℓ \in L} δ^1_{z_ℓ} \prod_{p \in M-L} δ^0_{z_p}.    (121)

Despite this drawback, random subspace coding might be preferable due to its simplicity and high resolution with relatively few hyperrectangles compared to the hypercubes of the standard discretization.

1. Financial crash problem

a. Description We are going to calculate the financial equilibrium of market values v_i, i = 1, ..., n of n institutions according to a simple model following Refs. [52, 53]. In this model, the prices of m assets are labeled by p_k, k = 1, ..., m. Furthermore we define the ownership matrix D, where D_{ij} denotes the percentage of asset j owned by i, the cross-holdings C, where C_{ij} denotes the percentage of institution j owned by i (except for self-holdings), and the self-ownership matrix C̃. The
model postulates that without crashes the equity values V (such that v = C̃V) in equilibrium satisfy

V = Dp + CV  →  v = C̃(1 - C)^{-1} Dp.    (122)

Crashes are then modeled as abrupt changes in the prices of assets held by an institution, i.e., via

v = C̃(1 - C)^{-1} (Dp - b(v, p)),    (123)

where b_i(v, p) = β_i(p)(1 − Θ(v_i − v_i^c)) results in the problem being highly non-linear.
b. Variables It is useful to shift the market values and we end up with a variable v − v^c = v' ∈ R^n. Since the crash functions b_i(v, p) explicitly depend on the components of v', it is not convenient to use the random subspace coding (or only for the components individually) and so we employ the standard discretization, i.e., each component of v' takes discrete values 0, ..., K such that with the desired resolution r a cut-off value rK is reached.
c. Cost function In order to enforce that the system is in financial equilibrium we simply square Eq. (123):

f = \left( v' + v^c - C̃(1 - C)^{-1} (Dp - b(v, p)) \right)^2,    (124)

where for the theta functions in b(v, p) we use

Θ(v'_i) = \sum_{α \geq 0} δ^α_{v'_i},    (125)

as suggested in Sec. VIII A 4.

2. Continuous black box optimization

a. Description Given a function f : [0, 1]^d → R which can be evaluated at individual points a finite number of times but where no closed form is available, we want to find the global minimum. In classical optimization the general strategy is to use machine learning to learn an analytical acquisition function g(x) in closed form from some sample evaluations, optimize it to generate the next point at which to evaluate f, and repeat as long as resources are available.
b. Variables The number and range of variables depends on the continuous-to-discrete encoding. In the case of the standard discretization we have d variables v_a taking values in {0, ..., ⌈1/q⌉}, where q is the precision. With the random subspace encoding we have ∆ variables v_a taking values in {0, ..., d_s}, where ∆ is the maximal overlap of the rectangles and d_s is the number of rectangles.
c. Cost function Similar to the classical strategy, we first fit/learn an acquisition function with an ansatz. Such an ansatz could take the form

y(v, w) = \sum_{r=1}^{k} \sum_{i_1,...,i_r = 0,...,∆} w_{i_1,...,i_r} v_{i_1} ... v_{i_r},    (126)

where k is the highest order of the ansatz and the variables v_i depend on the continuous-to-discrete encoding. In terms of the indicator functions they are expressed as

v_i = \sum_{α} α δ^α_{v_i}.    (127)

Alternatively, one could write the ansatz as a function of the indicator functions alone instead of the variables,

y'(v, w) = \sum_{r=1}^{k} \sum_{i_1,...,i_r = 0,...,∆} w'_{i_1,...,i_r} δ^{α_{i_1}}_{v_{i_1}} ... δ^{α_{i_r}}_{v_{i_r}}.    (128)

While the number of terms is roughly the same as before, this has the advantage that the energy scales in the cost function can be much lower. On the downside, one has to consider that usually for an optimization to be better than random sampling one needs the assumption that the function f is well-behaved in some way (e.g., analytic). The formulation of the ansatz in Eq. (128) might not take advantage of this assumption in the same way as the first.
Let w* ≡ {w*_{i_1,...,i_r}} denote the fitted parameters of the ansatz. Then the cost function is simply y(v, w*) or y'(v, w*).
d. Constraints In this problem the only constraints that can appear are core terms. For the standard discretization no such term is necessary, and for the random subspace coding we add Eq. (121).
e. References This problem can be found in Ref. [54].

3. Nurse scheduling with real parameters

In the following we will reexamine the nurse scheduling problem (see Sec. VII C 4) where the minimum workload n_{min,t} and the nurse power p_i of a nurse i are real-valued parameters. The general structure of the cost function and the constraints will be the same as in Sec. VII C 4, and when we choose to combine the cost function with the constraint of having a minimal workload per shift as in Eq. (113) we do not have to change anything. However, there are several options when the constraint is implemented using the auxiliary variable y_t defined in Eq. (110), and the conclusion reached by examining them might be useful for other problems. One possibility would be to use Eq. (110) with the real parameters p_i; then y_t will take only finitely many values, which we can label in some way. However, the problem is that the number of these values grows exponentially in the number of variables N. Alternatively, one might discretize the values of y_t by splitting y_max ≡ \sum_{i=1}^{N} p_i into discrete segments. Again, this is not optimal as now the constraint that binds the auxiliary variable to its value (e.g., the penalty term in Eq. (111)) very likely is not exactly fulfilled, and the remaining penalty introduces a bias in favor of configurations of v_{i,t} where this penalty is lower but the cost function for the optimization might not be minimized.
Thus, the natural choice is to simply discretize the parameters n_{min,t}, p_i and rescale them such that they take integer values. That is, we replace p_i → p̃_i ∈ {1, ..., K}, where we set K according to Kr = max_i p_i with the precision r, and we have p̃_i r ≤ p_i < (p̃_i + 1)r.

E. Other problems

In this final category we include problems that do not fit into the previous classifications but constitute important use cases for quantum optimization.

1. Syndrome decoding problem

a. Description For an [n, k] classical linear code (a code where k logical bits are encoded in n physical bits) the parity check matrix H indicates whether a state y of physical bits is a code word or has a non-vanishing error syndrome η = yH^T. Given such a syndrome, we want to decode it, i.e., find the most likely error with that syndrome, which is equivalent to solving

argmin_{e ∈ {0,1}^n, eH^T = η} wt(e),    (129)

where wt denotes the Hamming weight and all arithmetic is mod 2.
b. Cost function There are two distinct ways to formulate the problem: check-based and generator-based. In the generator-based approach we note that the generator matrix G of the code satisfies GH^T = 0 and thus any logical word u yields a solution to eH^T = η via e = uG + v, where v is any state such that vH^T = η, which can be found efficiently. Minimizing the weight of uG + v leads to the cost function

f_G = \sum_{j=1}^{n} (1 - 2v_j) \left( 1 - 2 \sum_{l=1}^{k} δ^1_{u_l} G_{jl} \right).    (130)

Note that the summation over l is mod 2 but the rest of the equation is over N. We rewrite this as

f_G = \sum_{j=1}^{n} (1 - 2v_j) \prod_{l=1}^{k} s_l^{G_{jl}}    (131)

according to Sec. VIII C 1.
In the check-based approach we directly minimize deviations from eH^T = η with the cost function

f_1 = \sum_{j=1}^{n-k} (1 - 2η_j) \prod_{l=1}^{n} s_l^{H_{jl}},    (132)

but we additionally have to penalize higher weight errors with the term

f_2 = \sum_{j=1}^{n} δ^1_{e_j},    (133)

so we have

f_H ≡ c_1 f_1 + c_2 f_2    (134)

with positive parameters c_1, c_2.
c. Variables The variables in the check-based formulation are the n bits e_i of the error e. In the generator-based formulation, k bits u_i of the logical word u are defined. If we use the reformulation Eq. (131), these are replaced by the k spin variables s_l. Note that the state v is assumed to be given by an efficient classical calculation.
d. Constraints There are no hard constraints; any logical word u and any physical state e are valid.
e. Resources There are a number of interesting tradeoffs that can be found by analyzing the resources needed by both approaches. The check-based approach features a cost function with up to (n − k) + n terms (for a non-degenerate check matrix with n − k rows, there are (n − k) terms from f_1 and n terms from f_2), whereas in f_G there are at most n terms (the number of columns of G). The highest order of the variables that appear in these terms is, for the generator-based formulation, bounded by the number of rows of G, which is k. In general, this order can be up to n (number of columns of H) for f_H, but for an important class of linear codes (low density parity check codes) the order would be bounded by the constant weight of the parity checks. This weight can be quite low, but there is again a tradeoff because higher weights of the checks result in better encoding rates (i.e., for good low density parity check codes they increase the constant encoding rate n/k ∼ α).
f. References This problem was originally presented in Ref. [55].

2. k-SAT

a. Description A k-SAT problem instance consists of a boolean formula

f(x_1, ..., x_n) = σ_1(x_{11}, ..., x_{1k}) ∧ ... ∧ σ_m(x_{m1}, ..., x_{mk})    (135)

in conjunctive normal form (CNF), that is, the σ_i are disjunction clauses over k literals l:

σ_i(x_{i1}, ..., x_{ik}) = l_{i,1} ∨ ··· ∨ l_{i,k},    (136)

where a literal l_{i,k} is a variable x_{i,k} or its negation ¬x_{i,k}. We want to find out if there exists an assignment of the variables that satisfies the formula.
b. Variables There are two strategies to express a k-SAT problem in a Hamiltonian formulation. First, one can use a Hamiltonian cost function based on violated clauses. In this case the variables are the assignments x ∈ {0, 1}^n. For the second method a graph is constructed from a k-SAT instance in CNF as follows. Each clause σ_j becomes a fully connected graph of its k variables x_{i1}, ..., x_{ik}. Connect two vertices from different clauses if they are negations of each other, i.e., y_{il} = ¬y_{jp}, and solve the maximum independent set (MIS) problem for this graph [44]. If and only if this set has cardinality m, the SAT instance is satisfiable. The variables in this formulation are the m × k boolean variables x_i which are 1 if the vertex is part of the independent set and 0 otherwise.
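The graph construction just described can be sketched in code. The following is our own illustration (not from the paper): clauses are lists of signed integers (a common CNF convention, −x meaning the negation of x), and the maximum independent set is found by brute force, which is only feasible for toy instances.

```python
from itertools import combinations

def build_mis_graph(clauses):
    """Nodes are (clause_index, position) pairs; edges join literals inside
    the same clause and complementary literals (x vs. not-x) across clauses."""
    nodes = [(j, p) for j, cl in enumerate(clauses) for p in range(len(cl))]
    edges = set()
    for a, b in combinations(nodes, 2):
        la = clauses[a[0]][a[1]]
        lb = clauses[b[0]][b[1]]
        if a[0] == b[0] or la == -lb:   # same clause, or negations of each other
            edges.add((a, b))
    return nodes, edges

def max_independent_set_size(nodes, edges):
    """Brute-force MIS size, fine for toy instances only."""
    for r in range(len(nodes), 0, -1):
        for cand in combinations(nodes, r):
            s = set(cand)
            if not any(a in s and b in s for a, b in edges):
                return r
    return 0

# (x1 or x2) and (not x1 or x2) is satisfiable, so the MIS has size m = 2
nodes, edges = build_mis_graph([[1, 2], [-1, 2]])
assert max_independent_set_size(nodes, edges) == 2
# (x1) and (not x1) is unsatisfiable: the MIS is smaller than m = 2
assert max_independent_set_size(*build_mis_graph([[1], [-1]])) == 1
```

On the quantum side, it is of course this MIS instance (not a brute-force search) that is encoded as a spin Hamiltonian.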
if they are negations of each other, i.e. yil = ¬yjp and npm multiplication operations. However, there are algo-
solve the maximum independent set (MIS) problem for rithms using fewer resources, e.g. in Ref. [56]. Strassen
this graph [44]. If and only if this set has cardinality m, showed that there is an algorithm for multiplying two
the SAT instance is satisfiable. The variables in this for- 2 × 2 matrices involving only 7 multiplications at the
mulation are the m×k boolean variables xi which are 1 if expense of more addition operations which are computa-
the vertex is part of the independent set and 0 otherwise. tionally less costly.
c. Cost function A clause-violation based cost func- In a recent paper [57] machine learning methods were
tion for a problem in CNF can be written as used to identify new effective algorithms for matrix mul-
X tiplication. They formulated the search for more effec-
fC = (1 − σj (x)) (137) tive algorithms (using less multiplication operations) as
j an optimization problem. We will show that there is an
equivalent Ising-Hamlitonian optimization problem.
with (see Sec. VIII C 2) Multiplying matrices A and B can be stated in terms
of an nm × mp × np-tensor M. The usual algorithm for
σj = lj1 ∨ lj2 ... ∨ ljk matrix multiplication corresponds to a sum decomposi-
k
Y (138) tion of M into nmp-terms of the following form:
=1− (1 − ljn ).
nmp−1
n=1 X
M= u(α) ⊗ v(α) ⊗ w(α) , (141)
A cost function for the MIS problem can be constructed α=0
from a term encouraging a higher cardinality of the in-
dependent set: where u(α) is an nm-vector with binary entries for any
α = 0, · · · , nmp and v(α) , w(α) are binary mp- and np
m×k
X vectors. How to multiply two matrices using the tensor
fB = −b xj . (139) Eq. (141) is described in Ref. [57, Algorithm 1]. One
j=1 maps matrices A, B and C to vectors of the respective
length, e.g., an n × m-matrix A will be an nm-vector
d. Constraints In the MIS formulation one has to whose k-th entry is the component Ars for k = rm +
enforce that there are no connections between members s. Then one starts by computing an intermediate nmp-
of the maximally independent set: vector m = (m0 , · · · , mnmp−1 ) which is computed by
X  
c= xi xj = 0. (140) nm−1
X (i) mp−1
X (i)
!
(i,j)∈E mi =  uj Aj  vk Bi (142)
j=0 k=0
If this constraint is implemented as an energy penalty
fA = ac, it should have a higher priority. The minimal for all i = 0, . . . , nmp − 1. In a second step, the com-
cost of a spin flip (cf. Sec. VIII D 2) from fA is a(m − 1) ponents {Ci }i=0,...,np−1 of the final matrix C = AB are
and in fB a spin flip could result in a maximal gain of b. determined via
Thus a/b & 1/(m − 1).
nmp−1
e. Resources The cost function fC consists of up to X (r)
O(2k mk) terms with a maximal order of k in the spin Ck = wk mk , (143)
variables. In the alternative approach the order is only r=0

quadratic so it would naturally be in a QUBO formula- for all k= 0, . . . , np − 1. Making matrix multiplication
tion. However, the number of terms in fMIS can be as more efficient corresponds to finding sum decomposition
high as O(m2 k 2 ) and thus scales worse in the number of of M with fewer terms (lower rank), i.e., we want to
clauses. find vectors a(β) , b(β) , c(β) , where β = 0, . . . , B − 1 and
f. References This problem can be found in B < nmp, such that
Ref. [44].
B−1
X
M= a(β) ⊗ b(β) ⊗ c(β) (144)
3. Improvement of Matrix Multiplication β=0

a. Description Matrix multiplication is among the holds. Performing the above algorithm with this shorter
most used mathematical operations employed in infor- tensor decomposition corresponds to a matrix multipli-
matics. For a streamlined presentation of the procedure cation algorithm involving fewer scalar multiplications.
we present the case of matrix multiplication over the real Thus the new algorithm is more efficient. In general, we
numbers. can allow the vectors
Given a real n × m matrix A and a real m × p matrix n o
B the usual algorithm for matrix multiplication requires a(β) , b(β) , c(β)
β=1,··· ,B−1
to have real entries. To ease our lives we restrict ourselves to entries in the finite set K := {−k, −k + 1, ..., k − 1, k} with k ∈ N; Ref. [57] reports promising results for k = 2. Note that the tensor M has binary entries, though. Allowing more values for the decomposition vectors just increases the search space for possible solutions.
b. Variables The variables of the problem are given by the entries of the vectors in a sum decomposition of M. That is, there are vectors of variables {a^{(α)}}_{α=0,...,nmp−1}, where every entry a_i^{(α)} of a^{(α)} = (a_0^{(α)}, ..., a_{nm−1}^{(α)}) is a discrete variable with range K. Similarly, we have vectors of variables {b^{(α)}}_{α=0,...,nmp−1}, with b^{(α)} = (b_0^{(α)}, ..., b_{mp−1}^{(α)}), where b_i^{(α)} has range K, and vectors of variables {c^{(α)}}_{α=0,...,nmp−1} with c^{(α)} = (c_0^{(α)}, ..., c_{np−1}^{(α)}), where c_i^{(α)} has again range K.
c. Cost function The cost function of the problem is given by

f_M := \sum_{α=0}^{nmp-1} \left( 1 - \prod_{i,j,k} δ^0_{a_i^{(α)}} δ^0_{b_j^{(α)}} δ^0_{c_k^{(α)}} \right).    (145)

Note that for fixed α, the term

1 - \prod_{i,j,k} δ^0_{a_i^{(α)}} δ^0_{b_j^{(α)}} δ^0_{c_k^{(α)}}    (146)

vanishes if and only if all three vectors a^{(α)}, b^{(α)} and c^{(α)} vanish. This cost function thus penalizes higher rank in the sum decomposition (141) and therefore the ground state will represent a more effective multiplication algorithm.
d. Constraints We have to impose constraints such that the resulting tensor decomposition really coincides with the matrix multiplication tensor M. Thus there are constraints

M_{ijk} = \sum_{α=0}^{nmp-1} a_i^{(α)} b_j^{(α)} c_k^{(α)}    (147)

for any i = 0, ..., nm − 1, j = 0, ..., mp − 1 and k = 0, ..., np − 1.
Note that the number of constraints as well as the number of terms in the cost function grows fast in the dimension of the matrices to be multiplied.

VIII. SUMMARY OF BUILDING BLOCKS

Here we summarize parts of cost functions and techniques that are used as recurring building blocks for the problems in this library.

A. Auxiliary functions

The simplest class of building blocks are auxiliary scalar functions of multiple variables that can be used directly in cost functions or constraints via penalties.

1. Element counting

One of the most common building blocks is the function t_a that simply counts the number of variables v_i with a given value a. In terms of the value indicator functions we have

t_a = \sum_i δ^a_{v_i}.    (148)

It might be useful to introduce t_a as additional variables. In that case one has to bind it to its desired value with the constraint

c = \left( t_a - \sum_i δ^a_{v_i} \right)^2.    (149)

2. Step function

Step functions Θ(v − w) can be constructed as

Θ(v - w) = \sum_{β \geq α}^{K} δ^β_v δ^α_w = { 1 if v ≥ w;  0 if v < w },    (150)

with K being the maximum value that v, w can take on. Step functions can be used to penalize configurations where v ≥ w.

3. Minimizing the maximum element of a set

Step functions are particularly useful for minimizing the maximum value of a set {v_i}. Given an auxiliary variable l, we guarantee that l ≥ v_i for all i with the penalization

f = 1 - \prod_{i=1}^{n} Θ(l - v_i) = { 0 if l ≥ v_i ∀i;  1 if ∃ i : v_i > l },    (151)

which increases the energy if l is smaller than any v_i. The maximum value of {v_i} can be minimized by adding to the cost function of the problem the value of l. In that case we also have to multiply the term from Eq. (151) by the maximum value that l can take in order to avoid trading off the penalty.
25

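The auxiliary-function building blocks above, Eqs. (148)-(151), can be checked classically on small instances. The following Python sketch is illustrative only and not part of the original text; all function names are our own, and variables are assumed to take integer values in [1, K]:

```python
def delta(v, a):
    """Value indicator function: 1 if variable v takes the value a, else 0."""
    return 1 if v == a else 0

def count_value(variables, a):
    """Element counting t_a = sum_i delta_{v_i}^a, cf. Eq. (148)."""
    return sum(delta(v, a) for v in variables)

def step(v, w, K):
    """Step function Theta(v - w) of Eq. (150), built from value
    indicators: 1 if v >= w, 0 if v < w, for v, w in [1, K]."""
    return sum(delta(v, beta) * delta(w, alpha)
               for alpha in range(1, K + 1)
               for beta in range(alpha, K + 1))

def max_penalty(l, variables, K):
    """Penalty of Eq. (151): 0 iff the auxiliary variable l satisfies
    l >= v_i for all i, and 1 otherwise."""
    product = 1
    for v in variables:
        product *= step(l, v, K)
    return 1 - product
```

Adding l itself to the cost function, with the penalty scaled by the largest value l can take, reproduces the minimax construction described above.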
4. Compare variables

Given two variables v, w ∈ [1, K], the following term indicates if v and w are equal:

    δ(v − w) = Σ_{α=1}^{K} δ^α_v δ^α_w = { 1 if v = w;  0 if v ≠ w }.   (152)

If we want to check if v > w, then we use the step function of Eq. (150).


B. Constraints

Here we present the special case of functions of variables where the ground state fulfills useful constraints. These naturally serve as building blocks for enforcing constraints via penalties.


1. All (connected) variables are different

If we have a set of variables {v_i ∈ [1, K]}_{i=1}^{N} and two variables v_i, v_j are connected when the entry A_{ij} of the adjacency matrix A is 1, the following term has a minimum when v_i ≠ v_j for all connected i ≠ j:

    c = Σ_{i≠j} δ(v_i − v_j) A_{ij} = Σ_{i≠j} Σ_{α=1}^{K} δ^α_{v_i} δ^α_{v_j} A_{ij}.   (153)

The minimum value of c is zero, which is reached if and only if there is no connected pair i, j such that v_i = v_j. For all-to-all connectivity one can use this building block to enforce that all variables are different. If K = N, the condition v_i ≠ v_j for all i ≠ j is then equivalent to asking that each of the K possible values is reached by a variable.


2. Value α is used

Given a set of N variables v_i, we want to know if at least one variable is taking the value α. This is done by the term

    u_α = Π_{i=1}^{N} ( 1 − δ^α_{v_i} ) = { 0 if ∃ v_i = α;  1 if ∄ v_i = α }.   (154)


3. Inequalities

a. Inequalities of a single variable  If a discrete variable v_i which can take values in 1, ..., K is subject to an inequality v_i ≤ a_0, one can enforce this with an energy penalization for all values that do not satisfy the inequality,

    c = Σ_{K ≥ a > a_0} δ^a_{v_i}.   (155)

It is also possible to have a weighted penalty, e.g.

    c = Σ_{K ≥ a > a_0} a δ^a_{v_i},   (156)

which might be useful if the inequality is not a hard constraint and more severe violations should be penalized more. That option comes with the drawback of introducing in general higher energy scales in the system, especially if K is large, which might decrease the relative energy gap.

b. Inequality constraints  If the problem is restricted by an inequality constraint

    c(v_1, ..., v_N) < K,   (157)

it is convenient to define an auxiliary variable y < K and impose the constraint

    ( y − c(v_1, ..., v_N) )² = 0,   (158)

so c is bound to be equal to some value of y, and the only possible values for y are those that satisfy the inequality. If the auxiliary variable y is expressed in binary or Gray encoding, then Eq. (158) must be modified when 2^n < K < 2^{n+1} for some n ∈ N,

    ( y − c(v_1, ..., v_N) − 2^{n+1} + K )² = 0,   (159)

which ensures c(v_1, ..., v_N) < K if y = 0, ..., 2^{n+1} − 1.


4. Constraint preserving driver

When annealing-inspired algorithms like quantum annealing or QAOA are used, an alternative to adding penalty terms to the cost function that penalize states which do not satisfy the constraints is to start the algorithm in a state that fulfills the constraints and use driver Hamiltonians that do not lead out of the constraint-fulfilling subspace [6, 38]. As explained in Sec. V, we demand that the driver Hamiltonian commutes with the operator generating the constraints and that it reaches the entire valid search space. Arbitrary polynomial constraints c can for example be handled by using the Parity mapping. It brings constraints to the form Σ_{u∈c} g_u σ_z^{(u)}. Starting in a constraint-fulfilling state and employing constraint-dependent flip-flop terms constructed from σ₊, σ₋ operators as driver terms on the mapped spins [29] automatically enforces the constraints.


C. Problem specific representations

For selected problems that are widely applicable, we demonstrate useful techniques for mapping them to cost functions.

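The constraint building blocks from Sec. VIII B above can be evaluated classically on candidate assignments. The following Python sketch is illustrative only and not part of the original text (the helper names are our own); it covers the all-different penalty of Eq. (153) and the slack-variable treatment of inequalities in Eqs. (157)-(158):

```python
def delta(v, a):
    """Value indicator function: 1 if variable v takes the value a, else 0."""
    return 1 if v == a else 0

def all_different_penalty(values, A):
    """Penalty of Eq. (153): counts ordered pairs i != j that are
    connected (A[i][j] == 1) and share the same value; the penalty is
    zero iff all connected variables take different values."""
    n = len(values)
    return sum(delta(values[i], values[j]) * A[i][j]
               for i in range(n) for j in range(n) if i != j)

def inequality_feasible(c_value, K):
    """Eqs. (157)-(158): the constraint c < K holds exactly if some
    auxiliary value y in {0, ..., K-1} makes (y - c)^2 vanish."""
    return any((y - c_value) ** 2 == 0 for y in range(K))
```

For example, on the path graph with adjacency matrix A = [[0,1,0],[1,0,1],[0,1,0]], the assignment [1, 2, 1] incurs zero penalty because the two equal-valued variables are not connected.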
1. Modulo 2 linear programming

For the set of linear equations

    xA = y,   (160)

where A ∈ F_2^{l×n} and y ∈ F_2^n are given, we want to solve for x ∈ F_2^l. The cost function

    f = − Σ_{i=1}^{n} (1 − 2y_i) ( 1 − 2 Σ_{j=1}^{l} x_j A_{ji} )   (161)

minimizes the Hamming distance between xA and y, and thus the ground state represents a solution. If we consider the second factor for fixed i, we notice that it counts the number of 1s in x (mod 2) where A_{ji} does not vanish at the corresponding index. When acting with

    H_i = Π_{j=1}^{l} σ_{z,j}^{A_{ji}}   (162)

on |x⟩ we find the same result, and thus the cost function

    f = − Σ_{i=1}^{n} (1 − 2y_i) Π_{j=1}^{l} s_j^{A_{ji}},   (163)

with spin variables s_j, has the solution to Eq. (160) as its ground state.


2. Representation of boolean functions

Given a boolean function f : {0,1}^n → {0,1}, we want to express it as a cost function in terms of the n boolean variables. This is hard in general [58], but for (combinations of) local boolean functions there are simple expressions. In particular we have, e.g.,

    ¬x_1 = 1 − x_1,   (164)
    x_1 ∧ x_2 ∧ ... ∧ x_n = Π_{i=1}^{n} x_i,   (165)
    x_1 ∨ x_2 ∨ ... ∨ x_n = 1 − Π_{i=1}^{n} (1 − x_i),   (166)
    (x_1 → x_2) = 1 − x_1 + x_1 x_2,   (167)
    XOR(x_1, x_2) ≡ x_1 + x_2 mod 2 = x_1 + x_2 − 2 x_1 x_2.   (168)

It is always possible to convert a k-SAT instance to 3-SAT, and more generally any boolean formula to a 3-SAT in conjunctive normal form, with the Tseytin transformation [59] with only a linear overhead in the size of the formula.

Note that for the purpose of encoding optimization problems one might use different expressions that only need to coincide (up to a constant shift) with those shown here for the ground state/solution to the problem at hand. For example, if we are interested in a satisfying assignment for x_1 ∧ ... ∧ x_n, we might formulate the cost function in two different ways that both have x_1 = ... = x_n = 1 as their unique ground state; namely f = − Π_{i=1}^{n} x_i and f = Σ_{i=1}^{n} (1 − x_i). The two corresponding spin Hamiltonians will have vastly different properties: the first one will have, when expressed in spin variables, 2^n terms with up to n-th order interactions, whereas the second one has n linear terms. Additionally, configurations which are closer to a fulfilling assignment, i.e. have more of the x_i equal to one, have a lower cost in the second option, which is not the case for the first option. Note that for x_1 ∨ ... ∨ x_n there is no similar trick, as we want to penalize exactly one configuration.


D. Meta optimization

Finally, we present several options for choices that arise in the construction of cost functions. This includes meta parameters like coefficients of building blocks, but also recurring methods and techniques in dealing with optimization problems.


1. Auxiliary variables

It is often useful to combine information about a set of variables v_i into one auxiliary variable y that is then used in other parts of the cost function. Examples include the element counting building block or the maximal value of a set of variables, where we showed a way to bind the value of an auxiliary variable to this function of the set of variables. If the function f can be expressed algebraically in terms of the variable values/indicator functions (as a finite polynomial), the natural way to do this is via adding the constraint term

    c ( y − f(v_i) )²   (169)

with an appropriate coefficient c. To avoid the downsides of introducing constraints, there is an alternative route that might be preferred in certain cases. This alternative consists in simply using the variable indicator δ^α_{y(v_i)} (δ^β_{v_i}) and expressing the variable indicator function in terms of spin/binary variables according to a chosen encoding (one does not even have to use the same encoding for y and the v_i). As an example, let us consider the auxiliary variable τ_{m,t} in the machine scheduling problem VII C 3, for which we need to express δ^α_{τ_{m,t}}. For simplicity, we choose the one-hot encoding for v_{m,t} and τ_{m,t}, leading to

    δ^α_{τ_{m,t}} = x_{τ_{m,t},α} = { (1 − x_{v_{m,t},0}) if α = t ≠ 0;  x_{v_{m,t},0} if t = 0;  0 else },   (170)
where the x_{v_{m,t},β} are the binary variables of the one-hot encoding of v_{m,t}.


2. Prioritization of cost function terms

If a cost function is constructed out of multiple terms,

    f = a_1 f_1 + a_2 f_2,   (171)

one often wants to prioritize one term over another, e.g., if the first term encodes a hard constraint. One strategy to ensure that f_1 is not "traded off" against f_2 is to evaluate the minimal cost Δ_1 in f_1 and the maximal gain Δ_2 in f_2 from flipping one spin. It can be more efficient to do this independently for both terms and assume a state close to an optimum. Then the coefficients can be set according to a_1 Δ_1 ≳ a_2 Δ_2. In general, this will depend on the encoding for two reasons: first, for some encodings one has to add core terms to the cost function which have to be taken into account for the prioritization (they usually have the highest priority). Furthermore, in some encodings a single spin flip can cause the value of a variable to change by more than one.² Nonetheless, it can be an efficient heuristic to compare the "costs" and "gains" introduced above for pairs of cost function terms already in the encoding-independent formulation by evaluating them for single variable changes by one and thereby fix their relative coefficients. Then one only has to fix the coefficient for the core term after the encoding is chosen.


3. Problem conversion

Many decision problems associated with discrete optimization problems are NP-complete, meaning that one can always map them to any other NP-complete problem with only a polynomial overhead of classical runtime in the system size. Therefore, a new problem for which one does not have a cost function formulation could be classically mapped to another problem where such a formulation is at hand. However, since quantum algorithms are expected to deliver only a polynomial advantage for general NP-hard problems, it is advisable to carefully analyze the overhead. It is also possible that there are trade-offs, as in the example of k-SAT in Sec. VII E 2, where in the original formulation the order of terms in the cost function is k, while in the MIS formulation we naturally have a QUBO problem where the number of terms scales worse in the number of clauses.

² E.g. in the binary encoding a spin flip can cause a variable shift of up to K/2 if the variables take values 1, ..., K. This is the main motivation to use modifications like the Gray encoding.


IX. CONCLUSION AND OUTLOOK

In this paper we have collected and elaborated on a wide variety of optimization problems which can be formulated as a spin Hamiltonian, allowing them to be solved on a quantum computer. We first presented these optimization problems in an encoding-independent fashion. Only after choosing an encoding for the problem's variables do we obtain a spin Hamiltonian ready to be fed into algorithms such as quantum annealing or QAOA. Representing an optimization problem in terms of a spin Hamiltonian already comprises some choices, one of which is an encoding. Ultimately, this freedom arises from the fact that the solution to the problem is only encoded in the ground state of the Hamiltonian and the excited states are not fixed.

This is best illustrated with an example from the representation of the one-hot core term in Sec. III C, where two alternatives were given to enforce the one-hot condition. If we were restricted to converting problems to a QUBO form, the choice with lower order of interaction would be superior, but with the Parity mapping and architecture, higher-order interaction terms can be handled better, and thus the Hamiltonian with fewer terms might be preferred.

We have thus highlighted that obtaining the spin Hamiltonian for an optimization problem involves several choices. Rather than a burden, we consider this a feature. It leaves room to improve the performance of quantum algorithms by making different choices. This is the second main contribution of this paper; we identified common, recurring building blocks in the formulation of optimization problems. These building blocks, like encodings, parts of cost functions or different choices for imposing constraints, constitute a construction kit for optimization problems. This paves the way to an automated preprocessing algorithm picking the best choices among the construction kit to facilitate the search for optimally performing quantum algorithms.

Finding the optimal spin Hamiltonian for a given hardware platform for a quantum computer is in itself an important problem. Optimal spin Hamiltonians for different hardware platforms are very likely to differ, thus it is important to have a procedure capable of tailoring a problem to any given platform. Our hardware-agnostic approach is a first key step in this direction.

In Sec. VI we also demonstrated with two examples how to use the building blocks and make choices when constructing the spin Hamiltonian. These examples can be used as a template to solve customized optimization problems in a systematic way.

The next step in this program towards automation is explicit testing for selected problems. In particular, we plan to benchmark different encodings for variables. Ultimately, the testing and benchmark results will be incorporated in a pipeline to automate the search for Hamiltonian formulations leading to better quantum algorithm performance.

X. ACKNOWLEDGEMENTS

The authors thank Dr. Kaonan Micadei for fruitful discussions. Work was supported by the Austrian Science Fund (FWF) through a START grant under Project No. Y1067-N27, the SFB BeyondC Project No. F7108-N38, the QuantERA II Programme under Grant Agreement No. 101017733, the Federal Ministry for Economic Affairs and Climate Action through project QuaST, and the Federal Ministry of Education and Research on the basis of a decision by the German Bundestag.

[1] Boris Melnikov, "Discrete optimization problems - some new heuristic approaches," in Eighth International Conference on High-Performance Computing in Asia-Pacific Region (HPCASIA'05) (IEEE, 2005) pp. 73–82.
[2] Marco Dorigo and Gianni Di Caro, "Ant colony optimization: a new meta-heuristic," in Proceedings of the 1999 Congress on Evolutionary Computation - CEC99 (Cat. No. 99TH8406), Vol. 2 (IEEE, 1999) pp. 1470–1477.
[3] Nina Mazyavkina, Sergey Sviridov, Sergei Ivanov, and Evgeny Burnaev, "Reinforcement learning for combinatorial optimization: A survey," Comput. Oper. Res. 134, 105400 (2021).
[4] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, and Michael Sipser, "Quantum computation by adiabatic evolution," arXiv:quant-ph/0001106 (2000).
[5] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann, "A quantum approximate optimization algorithm," arXiv:1411.4028 (2014).
[6] Stuart Hadfield, Zhihui Wang, Bryan O'Gorman, Eleanor G. Rieffel, Davide Venturelli, and Rupak Biswas, "From the quantum approximate optimization algorithm to a quantum alternating operator ansatz," Algorithms 12, 34 (2019).
[7] Naeimeh Mohseni, Peter L. McMahon, and Tim Byrnes, "Ising machines as hardware solvers of combinatorial optimization problems," Nat. Rev. Phys. 4, 363–379 (2022).
[8] Gary Kochenberger, Jin-Kao Hao, Fred Glover, Mark Lewis, Zhipeng Lü, Haibo Wang, and Yang Wang, "The unconstrained binary quadratic programming problem: a survey," J. Comb. Optim. 28, 58–81 (2014).
[9] Itay Hen and Marcelo S. Sarandy, "Driver Hamiltonians for constrained optimization in quantum annealing," Phys. Rev. A 93, 062312 (2016).
[10] Itay Hen and Federico M. Spedalieri, "Quantum annealing for constrained optimization," Phys. Rev. Applied 5, 034007 (2016).
[11] Andrew Lucas, "Ising formulations of many NP problems," Front. Phys. 2, 5 (2014).
[12] Samuel A. Wilkinson and Michael J. Hartmann, "Superconducting quantum many-body circuits for quantum simulation and computing," Appl. Phys. Lett. 116, 230501 (2020).
[13] Clemens Dlaska, Kilian Ender, Glen Bigan Mbeng, Andreas Kruckenhauser, Wolfgang Lechner, and Rick van Bijnen, "Quantum optimization via four-body Rydberg gates," Phys. Rev. Lett. 128, 120503 (2022).
[14] Niklas J. Glaser, Federico Roy, and Stefan Filipp, "Controlled-controlled-phase gates for superconducting qubits mediated by a shared tunable coupler," arXiv:2206.12392 (2022).
[15] N. Chancellor, S. Zohren, and P. A. Warburton, "Circuit design for multi-body interactions in superconducting quantum annealing systems with applications to a scalable architecture," Npj Quantum Inf. 3, 21 (2017).
[16] M. Schöndorf and F. K. Wilhelm, "Nonpairwise interactions induced by virtual transitions in four coupled artificial atoms," Phys. Rev. Appl. 12, 064026 (2019).
[17] Tim Menke, Florian Häse, Simon Gustavsson, Andrew J. Kerman, William D. Oliver, and Alán Aspuru-Guzik, "Automated design of superconducting circuits and its application to 4-local couplers," Npj Quantum Inf. 7, 49 (2021).
[18] Tim Menke, William P. Banner, Thomas R. Bergamaschi, Agustin Di Paolo, Antti Vepsäläinen, Steven J. Weber, Roni Winik, Alexander Melville, Bethany M. Niedzielski, Danna Rosenberg, Kyle Serniak, Mollie E. Schwartz, Jonilyn L. Yoder, Alán Aspuru-Guzik, Simon Gustavsson, Jeffrey A. Grover, Cyrus F. Hirjibehedin, Andrew J. Kerman, and William D. Oliver, "Demonstration of tunable three-body interactions between superconducting qubits," Phys. Rev. Lett. 129, 220501 (2022).
[19] Yao Lu, Shuaining Zhang, Kuan Zhang, Wentao Chen, Yangchao Shen, Jialiang Zhang, Jing-Ning Zhang, and Kihwan Kim, "Global entangling gates on arbitrary ion qubits," Nature 572, 363–367 (2019).
[20] G. Pelegrí, A. J. Daley, and J. D. Pritchard, "High-fidelity multiqubit Rydberg gates via two-photon adiabatic rapid passage," Quantum Sci. Technol. 7, 045020 (2022).
[21] Kilian Ender, Roeland ter Hoeven, Benjamin E. Niehoff, Maike Drieb-Schön, and Wolfgang Lechner, "Parity quantum optimization: Compiler," arXiv:2105.06233 (2021).
[22] Wolfgang Lechner, "Quantum approximate optimization with parallelizable gates," IEEE Trans. Quantum Eng. 1, 1–6 (2020).
[23] Michael Fellner, Anette Messinger, Kilian Ender, and Wolfgang Lechner, "Universal parity quantum computing," Phys. Rev. Lett. 129, 180503 (2022).
[24] Wolfgang Lechner, Philipp Hauke, and Peter Zoller, "A quantum annealing architecture with all-to-all connectivity from local interactions," Sci. Adv. 1, e1500838 (2015).
[25] Stuart Hadfield, Zhihui Wang, Eleanor G. Rieffel, Bryan O'Gorman, Davide Venturelli, and Rupak Biswas, "Quantum approximate optimization with hard and soft constraints," Proceedings of the Second International Workshop on Post Moores Era Supercomputing, PMES'17, 15–21 (2017).
[26] Yingyue Zhu, Zewen Zhang, Bhuvanesh Sundar, Alaina M. Green, C. Huerta Alderete, Nhung H. Nguyen, Kaden R. A. Hazzard, and Norbert M. Linke, "Multi-round QAOA and advanced mixers on a trapped-ion quantum computer," arXiv:2201.12335 (2022).
[27] Franz Georg Fuchs, Kjetil Olsen Lye, Halvor Møll Nilsen, Alexander Johannes Stasik, and Giorgio Sartor, "Constraint preserving mixers for the quantum approximate optimization algorithm," Algorithms 15, 202 (2022).
[28] Nicolas P. D. Sawaya, Albert T. Schmitz, and Stuart Hadfield, "Encoding trade-offs and design toolkits in quantum algorithms for discrete optimization: coloring, routing, scheduling, and other problems," arXiv:2203.14432 (2022).
[29] Maike Drieb-Schön, Younes Javanmard, Kilian Ender, and Wolfgang Lechner, "Parity quantum optimization: Encoding constraints," arXiv:2105.06235 (2021).
[30] Nicholas Chancellor, "Domain wall encoding of discrete variables for quantum annealing and QAOA," Quantum Sci. Technol. 4, 045004 (2019).
[31] Jie Chen, Tobias Stollenwerk, and Nicholas Chancellor, "Performance of domain-wall encoding for quantum annealing," IEEE Trans. Quantum Eng. 2, 1–14 (2021).
[32] Kensuke Tamura, Tatsuhiko Shirai, Hosho Katsura, Shu Tanaka, and Nozomu Togawa, "Performance comparison of typical binary-integer encodings in an Ising machine," IEEE Access 9, 81032–81039 (2021).
[33] Nicolas P. D. Sawaya, Tim Menke, Thi Ha Kyaw, Sonika Johri, Alán Aspuru-Guzik, and Gian Giacomo Guerreschi, "Resource-efficient digital quantum simulation of d-level systems for photonic, vibrational, and spin-s Hamiltonians," Npj Quantum Inf. 6, 49 (2020).
[34] Alejandro Montanez-Barrera, Alberto Maldonado-Romo, Dennis Willsch, and Kristel Michielsen, "Unbalanced penalization: A new approach to encode inequality constraints of combinatorial problems for quantum optimization algorithms," arXiv:2211.13914 (2022).
[35] Jesse Berwald, Nicholas Chancellor, and Raouf Dridi, "Understanding domain-wall encoding theoretically and experimentally," arXiv:2108.12004 (2021).
[36] John Preskill, "Quantum computing in the NISQ era and beyond," Quantum 2, 79 (2018).
[37] Fernando Pastawski and John Preskill, "Error correction for encoded quantum annealing," Phys. Rev. A 93, 052325 (2016).
[38] Kilian Ender, Anette Messinger, Michael Fellner, Clemens Dlaska, and Wolfgang Lechner, "Modular parity quantum approximate optimization," PRX Quantum 3, 030304 (2022).
[39] Michael Fellner, Kilian Ender, Roeland ter Hoeven, and Wolfgang Lechner, "Parity quantum optimization: Benchmarks," arXiv:2105.06240 (2021).
[40] Josua Unger, Anette Messinger, Benjamin E. Niehoff, Michael Fellner, and Wolfgang Lechner, "Low-depth circuit implementation of parity constraints for quantum optimization," arXiv:2211.11287 (2022).
[41] James King, Sheir Yarkoni, Jack Raymond, Isil Ozfidan, Andrew D. King, Mayssam Mohammadi Nevisi, Jeremy P. Hilton, and Catherine C. McGeoch, "Quantum annealing amid local ruggedness and global frustration," J. Phys. Soc. Jpn. 88, 061007 (2019).
[42] Martin Lanthaler and Wolfgang Lechner, "Minimal constraints in the parity formulation of optimization problems," New J. Phys. 23, 083039 (2021).
[43] Sebastian Feld, Christoph Roch, Thomas Gabor, Christian Seidel, Florian Neukart, Isabella Galter, Wolfgang Mauerer, and Claudia Linnhoff-Popien, "A hybrid solution method for the capacitated vehicle routing problem using a quantum annealer," Front. ICT 6 (2019).
[44] Vicky Choi, "Adiabatic quantum algorithms for the NP-complete maximum-weight independent set, exact cover and 3-SAT problems," arXiv:1004.2226 (2010).
[45] Krzysztof Kurowski, Jan Weglarz, Marek Subocz, Rafal Różycki, and Grzegorz Waligóra, "Hybrid quantum annealing heuristic method for solving job shop scheduling problem," in Computational Science – ICCS 2020, edited by Valeria V. Krzhizhanovskaya, Gábor Závodszky, Michael H. Lees, Jack J. Dongarra, Peter M. A. Sloot, Sérgio Brissos, and João Teixeira (Springer International Publishing, 2020) pp. 502–515.
[46] Davide Venturelli, Dominic J. J. Marchand, and Galo Rojo, "Quantum annealing implementation of job-shop scheduling," arXiv:1506.08479 (2015).
[47] David Amaro, Matthias Rosenkranz, Nathan Fitzpatrick, Koji Hirano, and Mattia Fiorentini, "A case study of variational quantum algorithms for a job shop scheduling problem," EPJ Quantum Technol. 9, 5 (2022).
[48] Costantino Carugno, Maurizio Ferrari Dacrema, and Paolo Cremonesi, "Evaluating the job shop scheduling problem on a D-Wave quantum annealer," Sci. Rep. 12, 6539 (2022).
[49] Kazuki Ikeda, Yuma Nakamura, and Travis S. Humble, "Application of quantum annealing to nurse scheduling problem," Sci. Rep. 9, 12837 (2019).
[50] D. A. Rachkovskii, S. V. Slipchenko, E. M. Kussul, and T. N. Baidyk, "Properties of numeric codes for the scheme of random subspaces RSC," Cybern. Syst. Anal. 41, 509–520 (2005).
[51] Luc Devroye, Peter Epstein, and Jörg-Rüdiger Sack, "On generating random intervals and hyperrectangles," J. Comput. Graph. Stat. 2, 291–307 (1993).
[52] Matthew Elliott, Benjamin Golub, and Matthew O. Jackson, "Financial networks and contagion," Am. Econ. Rev. 104, 3115–53 (2014).
[53] Román Orús, Samuel Mugel, and Enrique Lizaso, "Forecasting financial crashes with quantum computing," Phys. Rev. A 99, 060301 (2019).
[54] Syun Izawa, Koki Kitai, Shu Tanaka, Ryo Tamura, and Koji Tsuda, "Continuous black-box optimization with quantum annealing and random subspace coding," arXiv:2104.14778 (2021).
[55] Ching-Yi Lai, Kao-Yueh Kuo, and Bo-Jyun Liao, "Syndrome decoding by quantum approximate optimization," arXiv:2207.05942 (2022).
[56] Volker Strassen, "Gaussian elimination is not optimal," Numer. Math. 13, 354–356 (1969).
[57] Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, et al., "Discovering faster matrix multiplication algorithms with reinforcement learning," Nature 610, 47–53 (2022).
[58] Stuart Hadfield, "On the representation of boolean and real functions as Hamiltonians for quantum computing," ACM Trans. Quant. Comput. 2, 1–21 (2021).
[59] Grigori S. Tseitin, "On the complexity of derivation in propositional calculus," in Automation of Reasoning (Springer, 1983) pp. 466–483.
