TensorFlow Quantum: A Software Framework for Quantum Machine Learning
Institute for Quantum Computing, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
5 School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
6 Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
7 Department of Mechanical & Mechatronics Engineering, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
8 Department of Physics & Astronomy, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
9 Division of Physics, Mathematics and Astronomy, Caltech, Pasadena, CA 91125
10 Fermi National Accelerator Laboratory, P.O. Box 500, Batavia, IL 60510
11 Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
12 Data:Lab, Volkswagen Group, Ungererstr. 69, 80805 München, Germany
13 Leiden University, Niels Bohrweg 1, 2333 CA Leiden, Netherlands
14 Quantum Artificial Intelligence Laboratory, NASA Ames Research Center (QuAIL)
15 USRA Research Institute for Advanced Computer Science (RIACS)
16 Institute for Theoretical Physics, University of Innsbruck, Technikerstr. 21a, A-6020 Innsbruck, Austria
17 University Erlangen-Nürnberg (FAU), Institute of Theoretical Physics, Staudtstr. 7, 91058 Erlangen, Germany
18 Volkswagen Group Advanced Technologies, San Francisco, CA 94108
19 Rahko Ltd., Finsbury Park, London, N4 3JP, United Kingdom
20 Department of Computer Science, University College London, WC1E 6BT, United Kingdom
21 Clarendon Laboratory, Department of Physics, University of Oxford, Oxford, OX1 3PU, United Kingdom
22 Vector Institute, MaRS Centre, Toronto, Ontario, M5G 1M1, Canada
23 Department of Physics and Astronomy, University of Waterloo, Ontario, N2L 3G1, Canada
24 Institute for Quantum Information and Matter, Caltech, Pasadena, CA, USA
25 Department of Computing and Mathematical Sciences, Caltech, Pasadena, CA, USA
(Dated: August 30, 2021)
We introduce TensorFlow Quantum (TFQ), an open source library for the rapid prototyping of
hybrid quantum-classical models for classical or quantum data. This framework offers high-level ab-
stractions for the design and training of both discriminative and generative quantum models under
TensorFlow and supports high-performance quantum circuit simulators. We provide an overview
of the software architecture and building blocks through several examples and review the theory of
hybrid quantum-classical neural networks. We illustrate TFQ functionalities via several basic appli-
cations including supervised learning for quantum classification, quantum control, simulating noisy
quantum circuits, and quantum approximate optimization. Moreover, we demonstrate how one can
apply TFQ to tackle advanced quantum learning tasks including meta-learning, layerwise learning,
Hamiltonian learning, sampling thermal states, variational quantum eigensolvers, classification of
quantum phase transitions, generative adversarial networks, and reinforcement learning. We hope
this framework provides the necessary tools for the quantum computing and machine learning re-
search communities to explore models of both natural and artificial quantum systems, and ultimately
discover new quantum algorithms which could potentially yield a quantum advantage.
   TensorFlow
D. TFQ architecture
   1. Design Principles and Overview
   2. The Abstract TFQ Pipeline for a specific hybrid discriminator model
   3. Hello Many-Worlds: Binary Classifier for Quantum Data
E. TFQ Building Blocks
   1. Quantum Computations as Tensors
   2. Composing Quantum Models
   3. Sampling and Expectation Values
   4. Differentiating Quantum Circuits
   5. Simplified Layers
   6. Basic Quantum Datasets
F. High Performance Quantum Circuit Simulation with qsim
   1. Comment on the Simulation of Quantum Circuits
   2. Gate Fusion with qsim
   3. Hardware-Acceleration
   4. Benchmarks
   5. Large-scale simulation for quantum machine learning
   6. Noise in qsim

III. Theory of Hybrid Quantum-Classical Machine Learning
A. Quantum Neural Networks
B. Sampling and Expectations
C. Gradients of Quantum Neural Networks
   1. Finite difference methods
   2. Parameter shift methods
   3. Stochastic Parameter Shift Gradient Estimation
   4. Adjoint Gradient Backpropagation in Simulations
D. Hybrid Quantum-Classical Computational Graphs
   1. Hybrid Quantum-Classical Neural Networks
E. Autodifferentiation through hybrid quantum-classical backpropagation

IV. Basic Quantum Applications
A. Hybrid Quantum-Classical Convolutional Neural Network Classifier
   1. Background
   2. Implementations
B. Hybrid Machine Learning for Quantum Control
   1. Time-Constant Hamiltonian Control
   2. Time-dependent Hamiltonian Control
C. Simulating Noisy Circuits
   1. Background
   2. Noise in Cirq and TFQ
   3. Training Models with Noise
D. Quantum Approximate Optimization
   1. Background
   2. Implementation

V. Advanced Quantum Applications
A. Meta-learning for Variational Quantum Optimization
B. Vanishing Gradients and Adaptive Layerwise Training Strategies
   1. Random Quantum Circuits and Barren Plateaus
   2. Layerwise quantum circuit learning
C. Hamiltonian Learning with Quantum Graph Recurrent Neural Networks
   1. Motivation: Learning Quantum Dynamics with a Quantum Representation
   2. Implementation
D. Generative Modelling of Quantum Mixed States with Hybrid Quantum-Probabilistic Models
   1. Background
   2. Variational Quantum Thermalizer
   3. Quantum Generative Learning from Quantum Data
E. Subspace-Search Variational Quantum Eigensolver for Excited States: Integration with OpenFermion
   1. Background
   2. Implementation
F. Classification of Quantum Many-Body Phase Transitions
   1. Background
   2. Implementation
G. Quantum Generative Adversarial Networks
   1. Background
   2. Noise Suppression with Adversarial Learning
   3. Approximate Quantum Random Access Memory
H. Reinforcement Learning with Parameterized Quantum Circuits
   1. Background
   2. Policy-Gradient Reinforcement Learning with PQCs
   3. Implementation
   4. Value-Based Reinforcement Learning with PQCs
   5. Quantum Environments

VI. Closing Remarks

VII. Acknowledgements

References
near-term quantum devices and quantum processors. Of course, this is not a comprehensive list of quantum data. We hope that, with proper software tooling, researchers will be able to find applications of QML in all of the above areas and other categories of applications beyond what we can currently envision.

D. TensorFlow Quantum

Today, exploring new hybrid quantum-classical models is a difficult and error-prone task. The engineering effort required to manually construct such models, develop quantum datasets, and set up training and validation stages decreases a researcher's ability to iterate and discover. TensorFlow has accelerated the research and understanding of deep learning in part by automating common model building tasks. Development of software tooling for hybrid quantum-classical models should similarly accelerate research and understanding for quantum machine learning.

To develop such tooling, the requirement of accommodating a heterogeneous computational environment involving both classical and quantum processors is key. This computational heterogeneity suggested the need to expand TensorFlow, which is designed to distribute computations across CPUs, GPUs, and TPUs [67], to also encompass quantum processing units (QPUs). This project has evolved into TensorFlow Quantum. TFQ is an integration of Cirq with TensorFlow that allows researchers and students to simulate QPUs while designing, training, and testing hybrid quantum-classical models, and eventually run the quantum portions of these models on actual quantum processors as they come online. A core contribution of TFQ is seamless backpropagation through combinations of classical and quantum layers in hybrid quantum-classical models. This allows QML researchers to directly harness the rich set of tools already available in TF and Keras.

The remainder of this document describes TFQ and a selection of applications demonstrating some of the challenges TFQ can help tackle. In section II, we introduce the software architecture of TFQ. We highlight its main features including batched circuit execution, automated expectation estimation, estimation of quantum gradients, hybrid quantum-classical automatic differentiation, and rapid model construction, all from within TensorFlow. We also present a simple "Hello, World" example for binary quantum data classification on a single qubit. By the end of section II, we expect most readers to have sufficient knowledge to begin development with TFQ. For readers who are interested in a more theoretical understanding of QNNs, we provide in section III an overview of QNN models and hybrid quantum-classical backpropagation. For researchers interested in applying TFQ to their own projects, we provide various applications in sections IV and V. In section IV, we describe hybrid quantum-classical CNNs for binary classification of quantum phases, hybrid quantum-classical ML for quantum control, and MaxCut QAOA. In the advanced applications section V, we describe meta-learning for quantum approximate optimization, discuss issues with vanishing gradients and how we can overcome them by adaptive layer-wise learning schemes, Hamiltonian learning with quantum graph networks, quantum mixed state generation via classical energy-based models, subspace-search variational quantum eigensolver for excited states to illustrate an integration with OpenFermion, quantum classification of quantum phase transitions, entangling quantum generative adversarial networks, and quantum reinforcement learning.

We hope that TFQ enables the machine learning and quantum computing communities to work together more closely on important challenges and opportunities in the near term and beyond.

II. SOFTWARE ARCHITECTURE & BUILDING BLOCKS

As stated in the introduction, the goal of TFQ is to bridge the quantum computing and machine learning communities. Google already has well-established products for these communities: Cirq, an open source library for invoking quantum circuits [68], and TensorFlow, an end-to-end open source machine learning platform [67]. However, the emerging community of quantum machine learning researchers requires the capabilities of both products. The prospect of combining Cirq and TensorFlow then naturally arises.

First, we review the capabilities of Cirq and TensorFlow. We confront the challenges that arise when one attempts to combine both products. These challenges inform the design goals relevant when building software specific to quantum machine learning. We provide an overview of the architecture of TFQ and describe a particular abstract pipeline for building a hybrid model for classification of quantum data. Then we illustrate this pipeline via the exposition of a minimal hybrid model which makes use of the core features of TFQ. We conclude with a description of our performant C++ simulator for quantum circuits and provide benchmarks of performance on two complementary classes of dense and sparse quantum circuits.

A. Cirq

Cirq is an open-source framework for invoking quantum circuits on near-term devices [68]. It contains the basic structures, such as qubits, gates, circuits, and measurement operators, that are required for specifying quantum computations. User-specified quantum computations can then be executed in simulation or on real hardware. Cirq also contains substantial machinery that helps users design efficient algorithms for NISQ machines,
using the tfq expectation value calculation op. We feed the output into the tf.math.abs op to show that tfq ops integrate natively with tf ops.

qubit = cirq.GridQubit(0, 0)
theta = sympy.Symbol('theta')
c = cirq.Circuit(cirq.X(qubit)**theta)
c_tensor = tfq.convert_to_tensor([c] * 3)
theta_values = tf.constant([[0],[1],[2]])
m = cirq.Z(qubit)
paulis = tfq.convert_to_tensor([m] * 3)
expectation_op = tfq.get_expectation_op()
output = expectation_op(
    c_tensor, ['theta'], theta_values, paulis)
abs_output = tf.math.abs(output)

We supply the expectation op with a tensor of parameterized circuits, a list of symbols contained in the circuits, a tensor of values to use for those symbols, and tensor operators to measure with respect to. Given this, it outputs a tensor of expectation values. The graph this code generates is given by Fig. 3.

Figure 4. The software stack of TFQ, showing its interactions with TensorFlow, Cirq, and computational hardware. At the top of the stack is the data to be processed. Classical data is natively processed by TensorFlow; TFQ adds the ability to process quantum data, consisting of both quantum circuits and quantum operators. The next level down the stack is the Keras API in TensorFlow. Since a core principle of TFQ is native integration with core TensorFlow, in particular with Keras models and optimizers, this level spans the full width of the stack. Underneath the Keras model abstractions are our quantum layers and differentiators, which enable hybrid quantum-classical automatic differentiation when connected with classical TensorFlow layers. Underneath the layers and differentiators, we have TensorFlow ops, which instantiate the dataflow graph. Our custom ops control quantum circuit execution. The circuits can be run in simulation mode, by invoking qsim or Cirq, or eventually will be executed on QPU hardware.

The expectation op is capable of running circuits on a simulated backend, which can be a Cirq simulator or our native TFQ simulator qsim (described in detail in section II F), or on a real device. This is configured on instantiation.

The expectation op is fully differentiable. Given that there are many ways to calculate the gradient of a quantum circuit with respect to its input parameters, TFQ allows expectation ops to be configured with one of many built-in differentiation methods using the tfq.Differentiator interface, such as finite differencing, parameter shift rules, and various stochastic methods. The tfq.Differentiator interface also allows users to define their own gradient calculation methods for their specific problem if they desire.

The tensor representation of circuits and Paulis, along with the execution ops, are all that are required to solve any problem in QML. However, as a convenience, TFQ provides an additional op for in-graph circuit construction. This was found to be convenient when solving problems where most of the circuit being run is static and only a small part of it is being changed during training or inference. This functionality is provided by the tfq.tfq_append_circuit op. It is expected that all but the most dedicated users will never touch these low-level ops, and instead will interface with TFQ using our tf.keras.layers that provide a simplified interface.

The tools provided by TFQ can interact with both core TensorFlow and, via Cirq, real quantum hardware. The functionality of all three software products and the interfaces between them can be visualized with the help of a "software-stack" diagram, shown in Fig. 4. Importantly, these interfaces allow users to write a single TensorFlow Quantum program which could then easily be run locally on a workstation, in a highly parallel and distributed setting at the PetaFLOP/s or higher throughput scale [22], or on real QPU devices [76].

In the next section, we describe an example of an abstract quantum machine learning pipeline for a hybrid discriminator model that TFQ was designed to support. Then we illustrate the TFQ pipeline via a Hello Many-Worlds example, which involves building the simplest possible hybrid quantum-classical model for a binary classification task on a single qubit. More detailed information on the building blocks of TFQ features will be given in section II E.

2. The Abstract TFQ Pipeline for a specific hybrid discriminator model

Here, we provide a high-level abstract overview of the computational steps involved in the end-to-end pipeline for inference and training of a hybrid quantum-classical discriminative model for quantum data in TFQ.

(1) Prepare Quantum Dataset: In general, this might come from a given black-box source. However, as current quantum computers cannot import quantum data from external sources, the user has to specify quantum circuits which generate the data. Quantum datasets are prepared using unparameterized cirq.Circuit objects and are injected into the computational graph using tfq.convert_to_tensor.

(2) Evaluate Quantum Model: Parameterized quantum models can be selected from several categories
Figure 5. Abstract pipeline for inference and training of a hybrid discriminative model in TFQ. Here, Φ represents the quantum model parameters and θ represents the classical model parameters. (Pipeline stages: Prepare Quantum Dataset, Evaluate Quantum Model, Sample or Average, Evaluate Classical Model, Evaluate Cost Function, Evaluate Gradients & Update Parameters.)

based on knowledge of the quantum data's structure. The goal of the model is to perform a quantum computation in order to extract information hidden in a quantum subspace or subsystem. In the case of discriminative learning, this information is the hidden label parameters. To extract a quantum non-local subsystem, the quantum model disentangles the input data, leaving the hidden information encoded in classical correlations, thus making it accessible to local measurements and classical post-processing. Quantum models are constructed using cirq.Circuit objects containing SymPy symbols, and can be attached to quantum data sources using the tfq.AddCircuit layer.

(3) Sample or Average: Measurement of quantum states extracts classical information, in the form of samples from a classical random variable. The distribution of values from this random variable generally depends on both the quantum state itself and the measured observable. As many variational algorithms depend on mean values of measurements, TFQ provides methods for averaging over several runs involving steps (1) and (2). Sampling or averaging are performed by feeding quantum data and quantum models to the tfq.Sample or tfq.Expectation layers.

(4) Evaluate Classical Model: Once classical information has been extracted, it is in a format amenable to further classical post-processing. As the extracted information may still be encoded in classical correlations between measured expectations, classical deep neural networks can be applied to distill such correlations. Since TFQ is fully compatible with core TensorFlow, quantum models can be attached directly to classical tf.keras.layers.Layer objects such as tf.keras.layers.Dense.

(5) Evaluate Cost Function: Given the results of classical post-processing, a cost function is calculated. This may be based on the accuracy of classification if the quantum data was labeled, or other criteria if the task is unsupervised. Wrapping the model built in stages (1) through (4) inside a tf.keras.Model gives the user access to all the losses in the tf.keras.losses module.

(6) Evaluate Gradients & Update Parameters: After evaluating the cost function, the free parameters in the pipeline are updated in a direction expected to decrease the cost. This is most commonly performed via gradient descent. To support gradient descent, TFQ exposes derivatives of quantum operations to the TensorFlow backpropagation machinery via the tfq.differentiators.Differentiator interface. This allows both the quantum and classical models' parameters to be optimized against quantum data via hybrid quantum-classical backpropagation. See section III for details on the theory.

In the next section, we illustrate this abstract pipeline by applying it to a specific example. While simple, the example is the minimum instance of a hybrid quantum-classical model operating on quantum data.

3. Hello Many-Worlds: Binary Classifier for Quantum Data

Binary classification is a basic task in machine learning that can be applied to quantum data as well. As a minimal example of a hybrid quantum-classical model, we present here a binary classifier for regions on a single qubit. In this task, two random vectors in the X-Z plane of the Bloch sphere are chosen. Around these two vectors, we randomly sample two sets of quantum data points; the task is to learn to distinguish the two sets. An example quantum dataset of this type is shown in Fig. 6. The following can all be run in-browser by navigating to the Colab example notebook at

research/binary_classifier/binary_classifier.ipynb

Additionally, the code in this example can be copy-pasted into a Python script after installing TFQ.

To solve this problem, we use the pipeline shown in Fig. 5, specialized to one-qubit binary classification. This specialization is shown in Fig. 7.

The first step is to generate the quantum data. We can use Cirq for this task. The common imports required for working with TFQ are shown below:

import cirq, random, sympy
import numpy as np
import tensorflow as tf
import tensorflow_quantum as tfq

The function below generates the quantum dataset; labels use a one-hot encoding:

def generate_dataset(
    qubit, theta_a, theta_b, num_samples):
  q_data = []
  labels = []
  blob_size = abs(theta_a - theta_b) / 5
  for _ in range(num_samples):
    coin = random.random()
    spread_x, spread_y = np.random.uniform(
We can generate a dataset and the associated labels after picking some parameter values:

qubit = cirq.GridQubit(0, 0)
theta_a = 1
theta_b = 4
num_samples = 200
q_data, labels = generate_dataset(
    qubit, theta_a, theta_b, num_samples)

As our quantum parametric model, we use the simplest case of a universal quantum discriminator [43, 66], a single parameterized rotation (linear) and measurement along the Z axis (non-linear); a sketch of such a model follows below.

E. TFQ Building Blocks

Having provided a minimum working example in the previous section, we now seek to provide more details about the components of the TFQ framework. First, we describe how quantum computations specified in Cirq are converted to tensors for use inside the TensorFlow graph. Then, we describe how these tensors can be combined in-graph to yield larger models. Next, we show how circuits are simulated and measured in TFQ. The core functionality of the framework, differentiation of quantum circuits, is then explored. Finally, we describe our more abstract layers, which can be used to simplify many QML workflows.
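Returning to the binary classifier: the model-definition snippet lost at the page break might look like the following sketch, assuming the tfq.layers.PQC API; the dense softmax readout matching the one-hot labels is our own choice:

theta = sympy.Symbol('theta')
q_model = cirq.Circuit(cirq.ry(theta)(qubit))        # single parameterized rotation
q_data_input = tf.keras.Input(shape=(), dtype=tf.string)
expectation = tfq.layers.PQC(q_model, cirq.Z(qubit))(q_data_input)  # <Z> readout
# Assumed classical readout producing two-class probabilities:
probs = tf.keras.layers.Dense(2, activation='softmax')(expectation)
classifier = tf.keras.Model(inputs=q_data_input, outputs=probs)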
1. Quantum Computations as Tensors

As pointed out in section II A, Cirq already contains the language necessary to express quantum computations, parameterized circuits, and measurements. Guided by principle 4, TFQ should allow direct injection of Cirq expressions into the computational graph of TensorFlow. This is enabled by the tfq.convert_to_tensor function. We saw the use of this function in the quantum binary classifier, where a list of data generation circuits specified in Cirq was wrapped in this function to promote them to tensors. Below we show how a quantum data point, a quantum model, and a quantum measurement can be converted into tensors:

q0 = cirq.GridQubit(0, 0)
q_data_raw = cirq.Circuit(cirq.H(q0))
q_data = tfq.convert_to_tensor([q_data_raw])

theta = sympy.Symbol('theta')
q_model_raw = cirq.Circuit(cirq.Ry(theta).on(q0))
q_model = tfq.convert_to_tensor([q_model_raw])

q_measure_raw = 0.5 * cirq.Z(q0)
q_measure = tfq.convert_to_tensor([q_measure_raw])

3. Sampling and Expectation Values

Sampling from quantum circuits is an important use case in quantum computing. The recently achieved milestone of quantum supremacy [13] is one such application, in which the difficulty of sampling from a quantum model was used to gain a computational edge over classical machines.

TFQ implements tfq.layers.Sample, a Keras layer which enables sampling from batches of circuits in support of design objective 2. The user supplies a tensor of parameterized circuits, a list of symbols contained in the circuits, and a tensor of values to substitute for the symbols in the circuit. Given these, the Sample layer produces a tf.RaggedTensor of shape [batch_size, num_samples, n_qubits], where the n_qubits dimension is ragged to account for the possibly varying circuit size over the input batch of quantum data. For example, the following code takes the combined data and model from section II E 2 and produces a tensor of size [1, 4, 1] containing four single-bit samples:

sample_layer = tfq.layers.Sample()
samples = sample_layer(
    data_and_model, symbol_names=['theta'],
    symbol_values=[[0.5]], repetitions=4)
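The companion tfq.layers.Expectation layer is differentiable end to end. The following is a minimal sketch of our own (not code from the paper) showing how one might pair it with an explicit parameter-shift differentiator and TensorFlow's gradient tape; it reuses the data_and_model and q_measure objects defined above:

expectation_layer = tfq.layers.Expectation(
    differentiator=tfq.differentiators.ParameterShift())
theta_values = tf.Variable([[0.5]])
with tf.GradientTape() as tape:
    exp_out = expectation_layer(
        data_and_model,
        symbol_names=['theta'],
        symbol_values=theta_values,
        operators=q_measure)
grads = tape.gradient(exp_out, theta_values)  # d<H>/d theta via parameter shift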
[Figure 9: simulation-time benchmark curves; legend: TFQ qsim - GPU, TFQ qsim - CPU.]
3. Hardware-Acceleration

With the given quantum circuit fused into the minimal number of up to F-qubit gates, we need simulators optimized for applying up to $2^F \times 2^F$ matrices to state vectors. TFQ will adapt to the user's available hardware. For CPU based simulations, the SSE2 instruction set [82] and AVX2 + AVX512 instruction sets [83] will be detected and used to increase performance. If a compatible CUDA GPU is detected, TFQ will be able to switch to GPU based simulation as well. The next section illustrates this power with benchmarks comparing the performance of TFQ to the performance of parallelized Cirq running in simulation mode. In the future, we hope to expand the range of custom simulation hardware supported to include TPU integration.

4. Benchmarks

Here, we demonstrate the performance of TFQ, backed by qsim, relative to Cirq on two benchmark simulation tasks. As detailed above, the performance difference is due to circuit pre-processing via gate fusion combined with performant C++ simulators. The benchmarks were performed on a desktop equipped with an Intel(R) Xeon(R) Gold 6154 CPU (18 cores and 36 threads) and an NVidia V100 GPU.

The first benchmark task is simulation of 500 random (early variant of beyond-classical/supremacy-style) circuits batched 50 at a time. These circuits were generated using the Cirq function cirq.generate_boixo_2018_supremacy_circuits_v2. These circuits are only tractable for benchmarking due to the small numbers of qubits involved here. These circuits involve dense (subject to a constraint [84]) interleaved two-qubit gates to generate entanglement as quickly as possible. In summary, at the largest benchmarked problem size of 30 qubits, qsim achieves an approximately 10-fold improvement in simulation time over Cirq. The performance curves are shown in Fig. 9.

When the simulated circuits have more sparse structure, the Fusion algorithm allows us to achieve a larger performance boost by reducing the number of gates that ultimately need to be simulated. The circuits for this task are a factorized version of the supremacy-style circuits which generate entanglement only on small subsets of the qubits. In summary, for these circuits, we find a roughly 100 times improvement in simulation time in TFQ versus Cirq. The performance curves are shown in Fig. 9.

Thus we see that in addition to our core functionality of implementing native TensorFlow gradients for quantum circuits, TFQ also provides a significant boost in performance over Cirq when running in simulation mode. Additionally, as noted before, this performance boost is despite the additional overhead of serialization between the TensorFlow frontend and qsim proper.
5. Large-scale simulation for quantum machine learning

Recently, we have provided a series of learning-theoretic tests for evaluating whether a quantum machine learning model can predict more accurately than classical machine learning models [22]. One of the key findings is that various existing quantum machine learning models perform slightly better than standard classical machine learning models in small system sizes. However, these quantum machine learning models perform substantially worse than classical machine learning models in larger system sizes (more than 10 to 15 qubits). Note that performing better in small system sizes is not useful, since a classical algorithm can efficiently simulate the quantum machine learning model. The above observation shows that understanding the performance of quantum machine learning models in large system sizes is crucial for assessing whether the quantum machine learning model will eventually provide an advantage over classical models. Ref. [22] also provides improvements to existing quantum machine learning models to improve the prediction performance in large system sizes.

In the numerical experiments conducted in [22], we consider the simulation of these quantum machine learning models up to 30 qubits. The large-scale simulation allows us to gauge the potential and limitations of different quantum machine learning models better. We utilize the qsim software package in TFQ to perform large-scale quantum simulations using Google Cloud Platform. The simulation reaches a peak throughput of up to 1.1 quadrillion floating-point operations per second (petaflop/s). Trends of approximately 300 teraflop/s for quantum simulation and 800 teraflop/s for classical analysis were observed up to the maximum experiment size of 30 qubits.

6. Noise in qsim

In each noisy trajectory, properties of interest are measured against the resulting pure state. The process must be run many times and the results averaged, but the memory required at any one time is only $2^N$. When a circuit is not too noisy, it is often more efficient to average over trajectories than it is to simulate the full density matrix, so we choose this method for TFQ. For information on how to use this feature, see section IV C.

III. THEORY OF HYBRID QUANTUM-CLASSICAL MACHINE LEARNING

In the previous section, we reviewed the building blocks required for use of TFQ. In this section, we consider the theory behind the software. We define quantum neural networks as products of parameterized unitary matrices. Samples and expectation values are defined by expressing the loss function as an inner product. With quantum neural networks and expectation values defined, we can then define gradients. We finally combine quantum and classical neural networks and formalize hybrid quantum-classical backpropagation, one of the core components of TFQ.

A. Quantum Neural Networks

A Quantum Neural Network ansatz can generally be written as a product of layers of unitaries in the form

$$\hat{U}(\boldsymbol{\theta}) = \prod_{\ell=1}^{L} \hat{V}^{\ell}\, \hat{U}^{\ell}(\boldsymbol{\theta}^{\ell}). \qquad (1)$$
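For concreteness, here is a minimal instance of the ansatz (1); this is our own illustrative example, not one from the paper, using a single layer $L=1$ on one qubit with a Pauli-generated parametric part and a fixed Hadamard:

% Depth-one example of Eq. (1): L = 1, one qubit.
% Parametric part: \hat{U}^1(\theta_1) = e^{-i\theta_1 \hat{X}/2}; fixed part: \hat{V}^1 = Hadamard.
\hat{U}(\boldsymbol{\theta})
  = \hat{V}^{1}\,\hat{U}^{1}(\theta_{1})
  = \hat{H}_{\mathrm{ad}}\, e^{-i \theta_{1} \hat{X}/2}
  = \hat{H}_{\mathrm{ad}}\left[\cos\!\big(\tfrac{\theta_{1}}{2}\big)\,\hat{I}
        - i \sin\!\big(\tfrac{\theta_{1}}{2}\big)\,\hat{X}\right].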
$$f(\boldsymbol{\theta}) = \langle \hat{H} \rangle_{\boldsymbol{\theta}} = \sum_{k=1}^{N} \alpha_k \langle \hat{h}_k \rangle_{\boldsymbol{\theta}} \equiv \boldsymbol{\alpha} \cdot \boldsymbol{h}_{\boldsymbol{\theta}}. \qquad (8)$$

This can greatly reduce the number of measurements needed per gradient update; we will cover this in subsection III C 3.

C. Gradients of Quantum Neural Networks

Now that we have established how to evaluate the loss function, let us describe how to obtain gradients of the cost function with respect to the parameters. Why should we care about gradients of quantum neural networks? In classical deep learning, the most common family of optimization heuristics for the minimization of cost functions are gradient-based techniques [90-92], which include stochastic gradient descent and its variants. To leverage gradient-based techniques for the learning of multilayered models, the ability to rapidly differentiate error functionals is key. For this, the backwards propagation of errors [93] (colloquially known as backprop) is a now canonical method to progressively calculate gradients of parameters in deep networks. In its most general form, this technique is known as automatic differentiation [94], and has become so central to deep learning that this feature of differentiability is at the core of several frameworks for deep learning, including of course TensorFlow (TF) [95], JAX [96], and several others.

To be able to train hybrid quantum-classical models (section III D), the ability to take gradients of quantum neural networks is key. Now that we understand the greater context, let us describe a few techniques below for the estimation of these gradients.

1. Finite difference methods

A simple approach is to use finite-difference methods, for example, the central difference method,

$$\partial_k f(\boldsymbol{\theta}) = \frac{f(\boldsymbol{\theta} + \varepsilon\boldsymbol{\Delta}_k) - f(\boldsymbol{\theta} - \varepsilon\boldsymbol{\Delta}_k)}{2\varepsilon} + \mathcal{O}(\varepsilon^2), \qquad (10)$$

which, in the case where there are M continuous parameters, involves 2M evaluations of the objective function, each evaluation varying the parameters by ε in some direction, thereby giving us an estimate of the gradient of the function with a precision $\mathcal{O}(\varepsilon^2)$. Here $\boldsymbol{\Delta}_k$ is a unit-norm perturbation vector in the $k$th direction of parameter space, $(\boldsymbol{\Delta}_k)_j = \delta_{jk}$. In general, one may use lower-order methods, such as forward difference with $\mathcal{O}(\varepsilon)$ error from $M+1$ objective queries [24], or higher-order methods, such as a five-point stencil method, with $\mathcal{O}(\varepsilon^4)$ error from $4M$ queries [97].

2. Parameter shift methods

As recently pointed out in various works [89, 98], given knowledge of the form of the ansatz (e.g. as in (3)), one can measure the analytic gradients of the expectation value of the circuit for Hamiltonians which have a single term in their Pauli decomposition (3) (or, alternatively, if the Hamiltonian has a spectrum $\{\pm\lambda\}$ for some positive λ). For multi-term Hamiltonians, a method to obtain the analytic gradients using a linear combination of unitaries is proposed in [98]. Here, instead, we will simply use a change of variables and the chain rule to obtain analytic gradients of parametric unitaries of the form (5) without the need for ancilla qubits or additional unitaries.

For a parameter of interest $\theta_j^{\ell}$ appearing in a layer ℓ in a parametric circuit in the form (5), consider the change of variables $\eta_k^{j\ell} \equiv \theta_j^{\ell}\beta_k^{j\ell}$; then from the chain rule of calculus [99], we have

$$\frac{\partial f}{\partial \theta_j^{\ell}} = \sum_k \frac{\partial f}{\partial \eta_k^{j\ell}} \frac{\partial \eta_k^{j\ell}}{\partial \theta_j^{\ell}} = \sum_k \beta_k^{j\ell} \frac{\partial f}{\partial \eta_k^{j\ell}}. \qquad (11)$$

Thus, all we need to compute are the derivatives of the cost function with respect to $\eta_k^{j\ell}$. Due to this change of variables, we need to reparameterize our unitary $\hat{U}(\boldsymbol{\theta})$ from (1) as

$$\hat{U}^{\ell}(\boldsymbol{\theta}^{\ell}) \mapsto \hat{U}^{\ell}(\boldsymbol{\eta}^{\ell}) \equiv \bigotimes_{j \in \mathcal{I}_\ell} \prod_k e^{-i\eta_k^{j\ell} \hat{P}_k}, \qquad (12)$$

where $\mathcal{I}_\ell \equiv \{1, \ldots, M_\ell\}$ is an index set for the QNN layers. One can then expand each exponential in the above in a similar fashion to (5):

$$e^{-i\eta_k^{j\ell}\hat{P}_k} = \cos(\eta_k^{j\ell})\,\hat{I} - i\sin(\eta_k^{j\ell})\,\hat{P}_k. \qquad (13)$$

As can be shown from this form, the analytic derivative of the expectation value $f(\boldsymbol{\eta}) \equiv \langle \Psi_0 | \hat{U}^{\dagger}(\boldsymbol{\eta})\hat{H}\hat{U}(\boldsymbol{\eta}) |\Psi_0\rangle$ with respect to a component $\eta_k^{j\ell}$ can be reduced to the following parameter shift rule [47, 89, 100]:

$$\frac{\partial}{\partial \eta_k^{j\ell}} f(\boldsymbol{\eta}) = f\big(\boldsymbol{\eta} + \tfrac{\pi}{4}\boldsymbol{\Delta}_k^{j\ell}\big) - f\big(\boldsymbol{\eta} - \tfrac{\pi}{4}\boldsymbol{\Delta}_k^{j\ell}\big), \qquad (14)$$

where $\boldsymbol{\Delta}_k^{j\ell}$ is a vector representing a unit-norm perturbation of the variable $\eta_k^{j\ell}$ in the positive direction. We thus see that this shift in parameters can generally be much larger than the numerical-differentiation parameter shifts in equation (10). In some cases this is useful, as one does not have to resolve as fine-grained a difference in the cost function as with an infinitesimal shift, hence requiring fewer runs to achieve a sufficiently precise estimate of the value of the gradient.

Note that in order to compute through the chain rule in (11) for a parametric unitary as in (3), we need to evaluate the expectation function $2K_\ell$ times to obtain the gradient of the parameter $\theta_j^{\ell}$. Thus, in some cases where each parameter generates an exponential of many terms, although the gradient estimate is of higher precision, obtaining an analytic gradient can be too costly in terms of required queries to the objective function.
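As a quick numerical sanity check of rule (14), consider a single exponential $e^{-i\eta\hat{X}}$ acting on $|0\rangle$ with observable $\hat{Z}$, for which $f(\eta) = \cos(2\eta)$. The following is our own illustration, not code from the paper:

import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
psi0 = np.array([1, 0], dtype=complex)

def f(eta):
    # e^{-i eta X} = cos(eta) I - i sin(eta) X, from (13)
    U = np.cos(eta) * np.eye(2) - 1j * np.sin(eta) * X
    psi = U @ psi0
    return np.real(psi.conj() @ Z @ psi)

eta = 0.37
shift_grad = f(eta + np.pi / 4) - f(eta - np.pi / 4)  # rule (14)
exact_grad = -2 * np.sin(2 * eta)                     # d/d eta of cos(2 eta)
assert np.isclose(shift_grad, exact_grad)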
To remedy this additional overhead, Harrow et al. [89] proposed to stochastically select terms according to a distribution weighted by the coefficients of each term in the generator, and to perform gradient descent from these stochastic estimates of the gradient. Let us review this stochastic gradient estimation technique as it is implemented in TFQ.

3. Stochastic Parameter Shift Gradient Estimation

Consider the full analytic gradient from (14) and (11): if we have $M_\ell$ parameters and L layers, there are $\sum_{\ell=1}^{L} M_\ell$ terms of the following form to estimate:

$$\frac{\partial f}{\partial \theta_j^{\ell}} = \sum_{k=1}^{K_{j\ell}} \beta_k^{j\ell} \frac{\partial f}{\partial \eta_k^{j\ell}} = \sum_{k=1}^{K_{j\ell}} \Big[ \sum_{\pm} \pm\beta_k^{j\ell}\, f\big(\boldsymbol{\eta} \pm \tfrac{\pi}{4}\boldsymbol{\Delta}_k^{j\ell}\big) \Big]. \qquad (15)$$

These terms come from the components of the gradient vector itself, which has dimension equal to the total number of free parameters in the QNN, $\dim(\boldsymbol{\theta})$. For each of these components, i.e. for the $j$th component of the $\ell$th layer, there are $2K_{j\ell}$ parameter-shifted expectation values to evaluate; thus in total there are $\sum_{\ell=1}^{L} 2K_{j\ell} M_\ell$ parameterized expectation values of the cost Hamiltonian to evaluate.

For practical implementation of this estimation procedure, we must expand this sum further. Recall that, as the cost Hamiltonian generally will have many terms, for each quantum expectation estimation of the cost function for some value of the parameters, we have

$$f(\boldsymbol{\theta}) = \langle \hat{H} \rangle_{\boldsymbol{\theta}} = \sum_{m=1}^{N} \alpha_m \langle \hat{h}_m \rangle_{\boldsymbol{\theta}} = \sum_{m=1}^{N}\sum_{q=1}^{J_m} \alpha_m \gamma_q^m \langle \hat{P}_q^m \rangle_{\boldsymbol{\theta}}, \qquad (16)$$

which has $\sum_{j=1}^{N} J_j$ terms. Thus, if we consider that all the terms in (15) are of the form of (16), we see that we have a total number of $\sum_{m=1}^{N}\sum_{\ell=1}^{L} 2J_m K_{j\ell} M_\ell$ expectation values to estimate the gradient. Note that one of these sums comes from the total number of appearances of parameters in front of Paulis in the generators of the parameterized quantum circuit; the second sum comes from the various terms in the cost Hamiltonian in the Pauli expansion.

As the cost of accurately estimating all these terms one by one, and subsequently linearly combining the values so as to yield an estimate of the total gradient, may be prohibitively expensive in terms of numbers of runs, one can instead stochastically estimate this sum by randomly picking terms according to their weighting [47, 89].

One can sample a distribution over the appearances of a parameter in the QNN, $k \sim \Pr(k|j,\ell) = |\beta_k^{j\ell}| / (\sum_{o=1}^{K_{j\ell}} |\beta_o^{j\ell}|)$; one then estimates the two parameter-shifted terms corresponding to this index in (15) and averages over samples. We consider this case to be simply stochastic gradient estimation for the gradient component corresponding to the parameter $\theta_j^{\ell}$. One can go even further in this spirit, for each of these sampled expectation values, by also sampling terms from (16) according to a similar distribution determined by the magnitude of the Pauli expansion coefficients: sampling the indices $\{q, m\} \sim \Pr(q,m) = |\alpha_m\gamma_q^m| / (\sum_{d=1}^{N}\sum_{r=1}^{J_d} |\alpha_d\gamma_r^d|)$ and estimating the expectation $\langle \hat{P}_q^m \rangle_{\boldsymbol{\theta}}$ for the appropriate parameter-shifted values sampled from the terms of (15) according to the procedure outlined above. This is considered doubly stochastic gradient estimation. In principle, one could go one step further and, per iteration of gradient descent, randomly sample indices representing subsets of parameters for which we will estimate the gradient component, setting the gradient components of the non-sampled indices to 0 for the given iteration. The distribution we sample in this case is given by $\theta_j^{\ell} \sim \Pr(j,\ell) = \sum_{k=1}^{K_{j\ell}} |\beta_k^{j\ell}| / (\sum_{u=1}^{L}\sum_{i=1}^{M_u}\sum_{o=1}^{K_{iu}} |\beta_o^{iu}|)$. This is, in a sense, akin to the SPSA algorithm [101], in the sense that it is a gradient-based method with a stochastic mask. The above component sampling, combined with doubly stochastic gradient descent, yields what we consider to be triply stochastic gradient descent. This is equivalent to simultaneously sampling $\{j,\ell,k,q,m\} \sim \Pr(j,\ell,k,q,m) = \Pr(k|j,\ell)\Pr(j,\ell)\Pr(q,m)$ using the probabilities outlined in the paragraphs above, where j and ℓ index the parameter and layer, k is the index from the sum in equation (15), and q and m are the indices of the sum in equation (16).

In TFQ, all three of the stochastic averaging methods above can be turned on or off independently for stochastic parameter-shift gradients. See the details in the Differentiator module of TFQ on GitHub.
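The importance-sampling idea underlying this scheme is easy to verify numerically. The following toy demonstration is our own (not from the paper): sampling one term $k \sim \Pr(k|j,\ell)$ and dividing by its probability yields an unbiased estimate of the full sum in (15):

import numpy as np

rng = np.random.default_rng(0)
beta = np.array([0.7, -0.2, 0.1])    # generator coefficients (example values)
dfdeta = np.array([0.3, 1.5, -0.8])  # parameter-shift derivatives (example values)

probs = np.abs(beta) / np.abs(beta).sum()
estimates = []
for _ in range(20000):
    k = rng.choice(len(beta), p=probs)
    # Unbiased estimator: divide the sampled term by its sampling probability.
    estimates.append(beta[k] * dfdeta[k] / probs[k])
print(np.mean(estimates), beta @ dfdeta)  # both approximately equal the exact sum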
4. Adjoint Gradient Backpropagation in Simulations

For experiments with tractably classically simulatable system sizes, the derivatives can be computed entirely in simulation, using an analogue of backpropagation called Adjoint Differentiation [102, 103]. This is a high-performance way to obtain gradients of deep circuits with many parameters. Although it is not possible to perform this technique on quantum hardware, let us outline here how it is implemented in TFQ for numerical simulator backends such as qsim.

Suppose we are given a parameterized quantum circuit in the form of (1),

$$\hat{U}(\boldsymbol{\theta}) = \prod_{\ell=1}^{L} \hat{V}^{\ell}\,\hat{U}^{\ell}(\boldsymbol{\theta}^{\ell}) = \prod_{\ell=1}^{L} \hat{W}^{\ell}_{\theta^{\ell}} = \hat{W}^{n}_{\theta^{n}} \hat{W}^{n-1}_{\theta^{n-1}} \cdots \hat{W}^{1}_{\theta^{1}},$$

where for compactness of notation, we denoted the parameterized layers in the more succinct form

$$\hat{W}^{\ell}_{\theta^{\ell}} = \hat{V}^{\ell}\,\hat{U}^{\ell}(\boldsymbol{\theta}^{\ell}).$$

Figure 11. Tensor network contraction diagram [71, 72] depicting the adjoint backpropagation process for the computation of the gradient of an expectation value at the output of a multilayer QNN with respect to a parameter in the ℓth layer. The yellow channel depicts the forward pass, which is then duplicated into two copies of the final state. The blue and orange channels depict the backwards pass, performed via layer-wise recursive uncomputation. Both the orange and blue backwards pass outputs are then contracted with the gradient of the ℓth layer's unitary. The arrows with dotted lines indicate how to modify the contraction to compute a gradient with respect to a parameter in the (ℓ - 1)th layer.

Let us label the initial state $|\psi^{0}\rangle$ and the final quantum state $|\psi^{n}_{\theta^{n}}\rangle$, with $|\psi^{\ell}_{\theta^{\ell}}\rangle = \hat{W}^{\ell}_{\theta^{\ell}} |\psi^{\ell-1}_{\theta^{\ell-1}}\rangle$ being the recursive definition of the Schrödinger-evolved state vector at layer ℓ. The derivative of the state with respect to $\theta_j^{\ell}$, the jth parameter of the ℓth layer, is given by

$$\partial_{\theta_j^{\ell}} |\psi^{n}_{\theta^{n}}\rangle = \hat{W}^{n}_{\theta^{n}} \cdots \hat{W}^{\ell+1}_{\theta^{\ell+1}} \big(\partial_{\theta_j^{\ell}} \hat{W}^{\ell}_{\theta^{\ell}}\big) \hat{W}^{\ell-1}_{\theta^{\ell-1}} \cdots \hat{W}^{1}_{\theta^{1}} |\psi^{0}\rangle.$$

We can leverage this trick in order to evaluate gradients of the quantum state with respect to parameters of intermediate layers of the QNN; this allows us to compute gradients layer-wise, starting from the final layer, via a backwards pass that follows a forward pass computing the final state at the output layer.

In most contexts, we typically want to take gradients of expectation values at the output of several QNN layers. Given a Hermitian observable Ĥ, the expectation value of this observable with respect to our final layer's parameterized state is $\langle \hat{H} \rangle_{\theta^{n}} = \langle \psi^{n}_{\theta^{n}} | \hat{H} | \psi^{n}_{\theta^{n}} \rangle$, and the derivative of this expectation value is given by

$$\partial_{\theta_j^{\ell}} \langle \hat{H} \rangle_{\theta^{n}} = \big(\partial_{\theta_j^{\ell}} \langle \psi^{n}_{\theta^{n}}|\big) \hat{H} |\psi^{n}_{\theta^{n}}\rangle + \langle \psi^{n}_{\theta^{n}}| \hat{H} \big(\partial_{\theta_j^{\ell}} |\psi^{n}_{\theta^{n}}\rangle\big) = 2\,\mathrm{Re}\big[\langle \psi^{n}_{\theta^{n}}| \hat{H} \big(\partial_{\theta_j^{\ell}} |\psi^{n}_{\theta^{n}}\rangle\big)\big].$$

This consists of recursively computing the backwards propagated state,

$$|\psi^{\ell-1}_{\theta^{\ell-1}}\rangle = \hat{W}^{\ell\dagger}_{\theta^{\ell}} |\psi^{\ell}_{\theta^{\ell}}\rangle,$$

along with the contraction of the final state with the observable:

$$|\tilde{\psi}^{\ell-1}_{\theta^{\ell-1}}\rangle \equiv \hat{W}^{\ell\dagger}_{\theta^{\ell}} |\tilde{\psi}^{\ell}_{\theta^{\ell}}\rangle, \qquad |\tilde{\psi}^{n}_{\theta^{n}}\rangle \equiv \hat{H} |\psi^{n}_{\theta^{n}}\rangle.$$

By only storing $|\psi^{\ell-1}_{\theta^{\ell-1}}\rangle$ and $|\tilde{\psi}^{\ell}_{\theta^{\ell}}\rangle$ in memory for gradient evaluations of the ℓth layer during the backwards pass, we save the need for far more forward passes.
One can flatten this array into a vector in $\mathbb{R}^{R}$, where $R = \sum_{k=1}^{M} r_k$. Thus, considering vectors of expectation values is a relatively general way of representing the output of a quantum neural network. In the limit where the set of observables considered forms an informationally complete set of observables [104], the array of measurement outcomes would fully characterize the wavefunction, albeit at an overhead exponential in the number of qubits.

We should mention that in some cases, instead of expectation values or histograms, single samples from the output of a measurement can be used for direct feedback-control on the quantum system, e.g. in quantum error correction [85]. At least in the current implementation of TFQuantum, since quantum circuits are built in Cirq and this feature is not supported in the latter, such scenarios are currently out of scope. Mathematically, in such a scenario, one could then consider the QNN and measurement as a map from quantum circuit parameters θ to the conditional random variable $\Lambda_{\boldsymbol{\theta}}$ valued over $\mathbb{R}^{N_k}$ corresponding to the measured eigenvalues $\lambda_{jk}$, with a probability density $\Pr[(\Lambda_{\boldsymbol{\theta}})_k \equiv \lambda_{jk}] = p(\lambda_{jk}|\boldsymbol{\theta})$ which corresponds to the measurement statistics distribution induced by the Born rule, $p(\lambda_{jk}|\boldsymbol{\theta}) = \langle |\lambda_{jk}\rangle\langle\lambda_{jk}| \rangle_{\boldsymbol{\theta}}$. This QNN and single-measurement map from the parameters to the conditional random variable, $f : \boldsymbol{\theta} \mapsto \Lambda_{\boldsymbol{\theta}}$, can be considered as a classical stochastic map (a classical conditional probability distribution over output variables given the parameters). In the case where only expectation values are used, this stochastic map reduces to a deterministic node through which we may backpropagate gradients, as we will see in the next subsection. In the case where this map is used dynamically per-sample, it remains a stochastic map, and though there exist some algorithms for backpropagation through stochastic nodes [105], these are not currently supported natively in TFQ.

E. Autodifferentiation through hybrid quantum-classical backpropagation

As described above, hybrid quantum-classical neural network blocks take as input a set of real parameters $\boldsymbol{\theta} \in \mathbb{R}^{M}$, apply a circuit $\hat{U}(\boldsymbol{\theta})$, and take a set of expectation values of various observables,

$$(\boldsymbol{h}_{\boldsymbol{\theta}})_k = \langle \hat{h}_k \rangle_{\boldsymbol{\theta}}.$$

To train hybrid networks containing such blocks, we simply have to figure out how to backpropagate gradients through a quantum parameterized block function when it is composed with other parameterized block functions. Due to the partial ordering of the quantum-classical computational graph, we can focus on how to backpropagate gradients through a QNN in the scenario where we consider a simplified quantum-classical network in which we combine all function blocks that precede and postcede the QNN block into monolithic function blocks. Namely, we can consider a scenario where we have $f_{\text{pre}} : \boldsymbol{x}_{\text{in}} \mapsto \boldsymbol{\theta}$ ($f_{\text{pre}} : \mathbb{R}^{N_{\text{in}}} \rightarrow \mathbb{R}^{M}$) as the block preceding the QNN, the QNN block as $f_{\text{qnn}} : \boldsymbol{\theta} \mapsto \boldsymbol{h}_{\boldsymbol{\theta}}$ ($f_{\text{qnn}} : \mathbb{R}^{M} \rightarrow \mathbb{R}^{N}$), the post-QNN block as $f_{\text{post}} : \boldsymbol{h}_{\boldsymbol{\theta}} \mapsto \boldsymbol{y}_{\text{out}}$ ($f_{\text{post}} : \mathbb{R}^{N} \rightarrow \mathbb{R}^{N_{\text{out}}}$), and finally the loss function for training the entire network being computed from this output, $L : \mathbb{R}^{N_{\text{out}}} \rightarrow \mathbb{R}$. The composition of functions from the input data to the output loss is then the sequentially composed function $(L \circ f_{\text{post}} \circ f_{\text{qnn}} \circ f_{\text{pre}})$. This scenario is depicted in Fig. 13.

Now, let us describe the process of backpropagation through this composition of functions. As is standard in backpropagation of gradients through feedforward networks, we begin with the loss function evaluated at the output units and work our way back through the several layers of functional composition of the network to get the gradients. The first step is to obtain the gradient of the loss function $\partial L / \partial \boldsymbol{y}_{\text{out}}$ and to use classical (regular) backpropagation of gradients to obtain the gradient of the loss with respect to the output of the QNN, i.e. we get $\partial(L \circ f_{\text{post}})/\partial \boldsymbol{h}$ via the usual use of the chain rule for backpropagation, i.e., the contraction of the Jacobian with the gradient of the subsequent layer: $\partial(L \circ f_{\text{post}})/\partial \boldsymbol{h} = \frac{\partial L}{\partial \boldsymbol{y}} \cdot \frac{\partial f_{\text{post}}}{\partial \boldsymbol{h}}$.

Now, let us label the evaluated gradient of the loss function with respect to the QNN's expectation values as

$$\boldsymbol{g} \equiv \left.\frac{\partial(L \circ f_{\text{post}})}{\partial \boldsymbol{h}}\right|_{\boldsymbol{h} = \boldsymbol{h}_{\boldsymbol{\theta}}}. \qquad (20)$$

We can now consider this value as effectively a constant. Let us then define an effective backpropagated Hamiltonian with respect to this gradient as

$$\hat{H}_{\boldsymbol{g}} \equiv \sum_k g_k \hat{h}_k,$$

where $g_k$ are the components of (20). Notice that expectations of this Hamiltonian are given by

$$\langle \hat{H}_{\boldsymbol{g}} \rangle_{\boldsymbol{\theta}} = \sum_k g_k \langle \hat{h}_k \rangle_{\boldsymbol{\theta}} = \boldsymbol{g} \cdot \boldsymbol{h}_{\boldsymbol{\theta}}.$$

Thus, by taking gradients of the expectation value of the backpropagated effective Hamiltonian, we can get the gradients of the loss function with respect to QNN parameters, thereby successfully backpropagating gradients through the QNN. Further backpropagation through the preceding function block $f_{\text{pre}}$ can be done using standard classical backpropagation by using this evaluated QNN gradient.

Note that for a given value of $\boldsymbol{g}$, the effective backpropagated Hamiltonian is simply a fixed Hamiltonian operator; as such, taking gradients of the expectation of a single multi-term operator can be achieved by any choice in a multitude of the methods for taking gradients of QNNs described earlier in this section.

Backpropagation through parameterized quantum circuits is enabled by our Differentiator interface. We offer finite difference, regular parameter shift, and stochastic parameter shift gradients, while the general interface allows users to define custom gradient methods.

Figure 13. Example of inference and hybrid backpropagation at the interface of the quantum and classical parts of a hybrid computational graph. Here we have classical deep neural networks (DNN) both preceding and postceding the quantum neural network (QNN). In this example, the preceding DNN outputs a set of parameters θ which are then used by the QNN as parameters for inference. The QNN outputs a vector of expectation values (estimated through several runs) whose components are $(\boldsymbol{h}_{\boldsymbol{\theta}})_k = \langle \hat{h}_k \rangle_{\boldsymbol{\theta}}$. This vector is then fed as input to another (postceding) DNN, and the loss function L is computed from this output. For backpropagation through this hybrid graph, one first backpropagates the gradient of the loss through the postceding DNN to obtain $g_k = \partial L/\partial h_k$. Then, one takes the gradient of the following functional of the output of the QNN: $f_{\boldsymbol{\theta}} = \boldsymbol{g} \cdot \boldsymbol{h}_{\boldsymbol{\theta}} = \sum_k g_k h_{\boldsymbol{\theta},k} = \sum_k g_k \langle \hat{h}_k \rangle_{\boldsymbol{\theta}}$ with respect to the QNN parameters θ (which can be achieved with any of the methods for taking gradients of QNNs described in previous subsections of this section). This completes the backpropagation of gradients of the loss function through the QNN; the preceding DNN can use the now computed $\partial L/\partial \boldsymbol{\theta}$ to further backpropagate gradients to preceding nodes of the hybrid computational graph.

IV. BASIC QUANTUM APPLICATIONS

The following examples show how one may use the various features of TFQ to reproduce and extend existing results in second generation quantum machine learning. Each application has an associated Colab notebook which can be run in-browser to reproduce any results shown. Here, we use snippets of code from those example notebooks for illustration; please see the example notebooks for full code details.

A. Hybrid Quantum-Classical Convolutional Neural Network Classifier

To run this example in the browser through Colab, follow the link:

docs/tutorials/qcnn.ipynb

1. Background

Supervised classification is a canonical task in classical machine learning. Similarly, it is also one of the most well-studied applications for QNNs [15, 31, 43, 106, 107]. As such, it is a natural starting point for our exploration of applications for quantum machine learning. Discriminative machine learning with hierarchical models can be understood as a form of compression to isolate the information containing the label [108]. In the case of quantum data, the hidden classical parameter (a real scalar in the case of regression, a discrete label in the case of classification) can be embedded in a non-local subsystem or subspace of the quantum system. One then has to perform some disentangling quantum transformation to extract information from this non-local subspace.

To choose an architecture for a neural network model, one can draw inspiration from the symmetries in the training data. For example, in computer vision, one often needs to detect corners and edges regardless of their position in an image; we thus postulate that a neural network to detect these features should be invariant under translations. In classical deep learning, an example of such translationally invariant neural networks is the convolutional neural network. These networks tie parameters across space, learning a shared set of filters which are applied equally to all portions of the data.

To the best of the authors' knowledge, there is no strong indication that we should expect a quantum advantage for the classification of classical data using QNNs in the near term. For this reason, we focus on classifying quantum data as defined in section I C. There are many kinds of quantum data with translational symmetry. One example of such quantum data is the cluster state. These states are important because they are the initial states for measurement-based quantum computation [109, 110]. In this example we will tackle the problem of detecting errors in the preparation of a simple cluster state.
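For reference, a ring cluster state can be prepared with a Hadamard on every qubit followed by CZ gates between neighbors. The following sketch is our own minimal illustration; the tutorial's exact preparation code may differ:

import cirq

def cluster_state_circuit(qubits):
    """Hadamard on every qubit, then CZ around the ring."""
    circuit = cirq.Circuit(cirq.H.on_each(*qubits))
    for a, b in zip(qubits, qubits[1:] + qubits[:1]):
        circuit += cirq.CZ(a, b)
    return circuit

print(cluster_state_circuit(cirq.GridQubit.rect(1, 4)))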
Figure 16. A simple hybrid architecture in which the outputs of a truncated QCNN are fed into a classical neural network.

    multi_readout_model_circuit(
        cluster_state_bits),
    readouts)(cluster_state)
QCNN_3 = tfq.layers.PQC(
    multi_readout_model_circuit(
        cluster_state_bits),
    readouts)(cluster_state)

# Feed all QCNNs into a classical NN
concat_out = tf.keras.layers.concatenate(
    [QCNN_1, QCNN_2, QCNN_3])
dense_1 = tf.keras.layers.Dense(8)(concat_out)
dense_2 = tf.keras.layers.Dense(1)(dense_1)
multi_qconv_model = tf.keras.Model(
    inputs=[excitation_input],
    outputs=[dense_2])

[Figure: multi-filter QCNN architecture; diagram labels: Prepare Cluster State, three parallel Qconv + QPool filters, MSE loss.]

We find that for the same optimization settings, the purely quantum model trains the slowest, while the three-quantum-filter hybrid model trains the fastest. This data is shown in Fig. 18. This demonstrates the advantage of exploring hybrid quantum-classical architectures for classifying quantum data.
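Such a model is compiled and trained like any other Keras model. A sketch with illustrative settings follows; the dataset tensors train_excitations, train_labels, and their test counterparts are hypothetical names, not the notebook's:

multi_qconv_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
    loss=tf.keras.losses.MeanSquaredError())
history = multi_qconv_model.fit(
    x=train_excitations, y=train_labels,
    validation_data=(test_excitations, test_labels),
    epochs=25)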
We finish off by training on the prepared supervised data in the standard way:

history_two_axis = model.fit(...

The training converges after around 100 epochs, as seen in Fig. 20, which also shows excellent generalization to validation data.

We can attempt to learn the noise parameters α and ε by training a recurrent neural network to perform time-series prediction on the noise. In other words, given a record of expectation values $\{\langle \psi(t_i)|\hat{X}|\psi(t_i)\rangle\}$ for $t_i \in \{0, \delta t, 2\delta t, \ldots, T\}$ obtained on state $|\psi(t)\rangle$, we want the RNN to predict the future observables $\{\langle \psi(t_i)|\hat{X}|\psi(t_i)\rangle\}$ for $t_i \in \{T, T+\delta t, T+2\delta t, \ldots, 2T\}$.
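A recurrent model for this kind of time-series prediction could be as simple as the following sketch; the architecture and step count are our own assumptions, not the notebook's exact code:

import tensorflow as tf

steps = 50  # illustrative number of time steps per half-series
rnn_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(steps, 1)),
    tf.keras.layers.Dense(steps),  # predict the next `steps` expectation values
])
rnn_model.compile(optimizer='adam', loss='mse')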
1. Background

Each weight in such an ensemble is the probability that the actual state of the quantum computer after your noisy circuit is $|\psi_j\rangle$. Revisiting Fig. 22, if we knew beforehand that 90% of the time our system executed perfectly, or errored 10% of the time with just this one mode of failure, then our ensemble would be:

$$\rho = 0.9\, |\psi_{\text{desired}}\rangle\langle\psi_{\text{desired}}| + 0.1\, |\psi_{\text{noisy}}\rangle\langle\psi_{\text{noisy}}|.$$

If there were more than just one way that our circuit could error, then the ensemble ρ would contain more than just two terms (one for each new noisy realization that could happen). ρ is referred to as the density operator describing your noisy system.

Unfortunately, in practice, it's nearly impossible to know all the ways your circuit might error and their exact probabilities. A simplifying assumption you can make is that after each operation in your circuit a quantum channel is applied. Just like a quantum circuit produces pure states, a quantum channel produces density operators. Cirq has a selection of built-in quantum channels to model the most common ways a quantum circuit can go wrong in the real world. For example, here is how you would write down a quantum circuit with depolarizing noise:

def x_circuit(qubits):
    """Produces an X wall circuit on `qubits`."""
    return cirq.Circuit(cirq.X.on_each(*qubits))

3. Training Models with Noise

Now that you have implemented some noisy circuit simulations in TFQ, you can experiment with how noise impacts hybrid quantum-classical models. Since quantum devices are currently in the NISQ era, it is important to compare model performance in the noisy and noiseless cases, to ensure performance does not degrade excessively under noise.

A good first check to see if a model or algorithm is robust to noise is to test it under a circuit-wide depolarizing model. An example circuit with depolarizing noise applied is shown in Fig. 23. Applying a channel to a circuit using the circuit.with_noise(channel) method creates a new circuit with the specified channel applied after each operation in the circuit. In the case of a depolarizing channel, one of {X, Y, Z} is applied with probability p and an identity (keeping the original operation) with probability 1 - p.
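A minimal sketch of attaching such a channel circuit-wide to the x_circuit above (the qubit count and probability value are illustrative choices of ours):

qubits = cirq.GridQubit.rect(1, 3)
# Apply a depolarizing channel after every operation in the circuit:
noisy_circuit = x_circuit(qubits).with_noise(cirq.depolarize(p=0.01))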
Model performance remains close to that obtained on noiseless data. This shows that QML models can still succeed at learning despite the presence of mild noise.

In this section, we introduce the basics of the Quantum Approximate Optimization Algorithm (QAOA) and show how to implement a basic case of this class of algorithms in TFQ. In the advanced applications section V, we explore how to apply meta-learning techniques [117] to the optimization of the parameters of the algorithm.

The Quantum Approximate Optimization Algorithm was originally proposed to solve instances of the MaxCut problem [12]. The QAOA framework has since been extended to encompass multiple problem classes related to finding the low-energy states of Ising Hamiltonians, prepared via alternating applications of mixer and cost unitaries,

$$\hat{U}(\eta, \gamma) = \prod_{j=1}^{P} e^{-i\eta_j \hat{H}_M}\, e^{-i\gamma_j \hat{H}_C}. \qquad (30)$$

Target problems:

1. Train a parameterized quantum circuit for a discrete optimization problem (MaxCut)
2. Minimize a cost function of a parameterized quantum circuit

Required TFQ functionalities:

1. Conversion of simple circuits to TFQ tensors
2. Evaluation of gradients for quantum circuits
3. Use of gradient-based optimizers from TF
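As a concrete illustration of these functionalities, here is a minimal sketch of a depth-one QAOA on a single MaxCut edge, optimized with a TF optimizer (the two-qubit problem instance and the hyperparameters are assumptions for illustration only):

import cirq
import numpy as np
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

qubits = cirq.GridQubit.rect(1, 2)
gamma, eta = sympy.symbols('gamma eta')

# Cost Hamiltonian for one MaxCut edge; mixer applied as X rotations.
h_c = cirq.Z(qubits[0]) * cirq.Z(qubits[1])
circuit = cirq.Circuit(
    cirq.H.on_each(*qubits),                 # uniform superposition
    cirq.ZZ(qubits[0], qubits[1]) ** gamma,  # cost layer
    [cirq.X(q) ** eta for q in qubits],      # mixer layer
)
circuit_tensor = tfq.convert_to_tensor([circuit])  # circuit -> TFQ tensor

expectation = tfq.layers.Expectation()
params = tf.Variable(
    np.random.uniform(0, 2 * np.pi, (1, 2)), dtype=tf.float32)
opt = tf.keras.optimizers.Adam(learning_rate=0.05)

for _ in range(50):
    with tf.GradientTape() as tape:
        cost = expectation(
            circuit_tensor,
            symbol_names=[str(gamma), str(eta)],
            symbol_values=params,
            operators=[h_c])
    grads = tape.gradient(cost, [params])      # circuit gradients
    opt.apply_gradients(zip(grads, [params]))  # TF gradient-based optimizer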
For more details on the phenomenon of barren plateaus, see the notebook at the following link:

docs/tutorials/barren_plateaus.ipynb

2. Layerwise quantum circuit learning

Training proceeds until the loss converges or we reach a predefined number of iterations. In the following, we will implement phase one of the algorithm; a complete implementation of both phases can be found in the accompanying notebook.
In this section, we use a Quantum Graph Recurrent Neural Network-based approach to learn the Hamiltonian of a quantum dynamical process, given access to quantum states at various time steps.

As was pointed out in the barren plateaus section V B, attempting to do QML with no prior on the physics of the system and no imposed structure on the ansatz hits the quantum version of the no free lunch theorem: the network has too high a capacity for the problem at hand and is thus hard to train, as evidenced by its vanishing gradients. Here, instead, we use a highly structured ansatz, from work featured in [136]. First of all, given that we know we are trying to replicate quantum dynamics, we can structure our ansatz to be based on Trotter-Suzuki evolution [86] of a learnable parameterized Hamiltonian. This effectively performs a form of parameter-tying between the several layers of our ansatz representing time evolution. In a previous example on quantum convolutional networks (section IV A 1), we performed parameter tying for spatial translation invariance; here, we assume the dynamics remain constant through time and perform parameter tying across time, making the ansatz akin to a quantum form of recurrent neural networks (RNNs). More precisely, as it is a parameterization of a Hamiltonian evolution, it is akin to a quantum form of the recently proposed classical machine learning models called Hamiltonian neural networks [137].

Beyond the quantum RNN form, we can impose further structure. Consider a scenario where we know we have a one-dimensional quantum many-body system. As Hamiltonians of physical systems have local couplings, we can use our prior assumption of locality and encode it as a graph-based parameterization of the Hamiltonian. As we will see below, by using a Quantum Graph Recurrent Neural Network (QGRNN) [136] implemented in TFQ, we will be able to learn the effective Hamiltonian topology and coupling strengths quite accurately, simply from having access to quantum states at different times and employing mini-batched gradient-based training.

Before we proceed, it is worth mentioning that the approach featured in this section is quite different from the learning of quantum dynamics using a classical RNN featured in the previous example (section IV B). As sampling the output of a quantum simulation at different times can become exponentially hard, we can imagine that for large systems the quantum RNN dynamics learning approach could have primacy over the classical RNN approach, thus potentially demonstrating a quantum advantage of QML over classical ML for this problem.

Target problems:

1. Preparing low-energy states of a quantum system
2. Learning quantum dynamics using a quantum neural network model

Required TFQ functionalities:

1. Training multi-layered quantum neural networks with shared parameters
2. Batching QNN training data (input-output pairs and time steps) for supervised learning of a quantum unitary map

2. Implementation

Please see the tutorial notebook for full code details:

research/qgrnn_ising/qgrnn_ising.ipynb

Here we provide an overview of our implementation. We can define a general Quantum Graph Neural Network (QGNN) as a repeating sequence of exponentials of Hamiltonians defined on a graph,

$$\hat{U}_{\mathrm{qgnn}}(\eta, \theta) = \prod_{p=1}^{P}\prod_{q=1}^{Q} e^{-i\eta_{pq}\hat{H}_q(\theta)},$$

where the $\hat{H}_q(\theta)$ are generally 2-local Hamiltonians whose coupling topology is that of an assumed graph structure.

In our Hamiltonian learning problem, we aim to learn a target $\hat{H}_{\mathrm{target}}$, which will be an Ising model Hamiltonian with couplings $J_{jk}$ and a site bias term $B_v$ for each spin,

$$\hat{H}_{\mathrm{target}} = \sum_{j,k} J_{jk}\hat{Z}_j\hat{Z}_k + \sum_v B_v\hat{Z}_v + \sum_v \hat{X}_v,$$

given access to pairs of states at different times that were subjected to the target time evolution operator $\hat{U}(T) = e^{-i\hat{H}_{\mathrm{target}}T}$.

We will use a recurrent form of QGNN, with Hamiltonian generators $\hat{H}_1(\theta) = \sum_{v\in V}\alpha_v\hat{X}_v$ and $\hat{H}_2(\theta) = \sum_{\{j,k\}\in E}\theta_{jk}\hat{Z}_j\hat{Z}_k + \sum_{v\in V}\phi_v\hat{Z}_v$, with trainable parameters $\{\theta_{jk}, \phi_v, \alpha_v\}$, for our choice of graph structure prior $G = \{V, E\}$. The QGRNN then resembles applying a Trotterized time evolution of a parameterized Ising Hamiltonian $\hat{H}(\theta) = \hat{H}_1(\theta) + \hat{H}_2(\theta)$, where $P$ is the number of Trotter steps. This is a good parameterization for learning the effective Hamiltonian of the black-box dynamics, as we know from quantum simulation theory that Trotterized time evolution operators can closely approximate true dynamics in the limit of $|\eta_{jk}| \to 0$ while $P \to \infty$.

For our TFQ software implementation, we can initialize the Ising model and QGRNN model parameters as random values on a graph. It is very easy to construct this kind of graph-structured Hamiltonian using the Python NetworkX library.

N = 6
dt = 0.01
# Target Ising model parameters
G_ising = nx.cycle_graph(N)
ising_w = [dt * np.random.random() for _ in G_ising.edges]
ising_b = [dt * np.random.random() for _ in G_ising.nodes]

Because the target Hamiltonian and its nearest-neighbor graph structure are unknown to the QGRNN, we need to initialize a random graph prior for the parameters of the QGRNN.

# QGRNN model parameters
G_qgrnn = nx.random_regular_graph(n=N, d=4)
qgrnn_w = [dt] * len(G_qgrnn.edges)
qgrnn_b = [dt] * len(G_qgrnn.nodes)
theta = ['theta{}'.format(e) for e in G_qgrnn.edges]
phi = ['phi{}'.format(v) for v in G_qgrnn.nodes]
params = theta + phi
Now that we have the graph structure and the weights of edges and nodes, we can construct the Cirq-based Hamiltonian operators, which can be evaluated directly in Cirq and TFQ. To create a Hamiltonian using cirq.PauliSum's or cirq.PauliString's, we need to assign appropriate qubits to them. Let us assume Hamiltonian() is a Hamiltonian preparation function that generates the cost Hamiltonian from interaction weights and the mixer Hamiltonian from bias terms. We can lay out the qubits of the Ising and QGRNN models using cirq.GridQubit:

qubits_ising = cirq.GridQubit.rect(1, N)
qubits_qgrnn = cirq.GridQubit.rect(1, N, 0, N)
ising_cost, ising_mixer = Hamiltonian(
    G_ising, ising_w, ising_b, qubits_ising)
qgrnn_cost, qgrnn_mixer = Hamiltonian(
    G_qgrnn, qgrnn_w, qgrnn_b, qubits_qgrnn)
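The Hamiltonian() helper itself is left to the notebook; a minimal sketch of what such a function could look like, assuming the NetworkX graphs and weight/bias lists defined above (a hypothetical helper, not part of the TFQ API):

def Hamiltonian(graph, weights, biases, qubits):
    """Builds a cost (ZZ + Z) and a mixer (X) PauliSum from a graph."""
    cost = cirq.PauliSum()
    for w, (i, j) in zip(weights, graph.edges):
        cost += w * cirq.Z(qubits[i]) * cirq.Z(qubits[j])
    for b, v in zip(biases, graph.nodes):
        cost += b * cirq.Z(qubits[v])
    mixer = cirq.PauliSum()
    for v in graph.nodes:
        mixer += cirq.X(qubits[v])
    return cost, mixer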
The QGRNN is fed the same input data as the
To train the QGRNN, we need to create an ensemble true process. We will use gradient-based training over
of states which are to be subjected to the unknown dy- minibatches of randomized timesteps chosen for our
namics. We chose to prepare a low-energy states by first QGRNN and the target quantum evolution. We will
performing a Variational Quantum Eigensolver (VQE) thus need to aggregate the results among the different
[29] optimization to obtain an approximate ground state. timestep evolutions to train the QGRNN model. To cre-
Following this, we can apply different amounts of simu- ate these time evolution exponentials, we can use the
lated time evolutions onto to this state to obtain a varied tfq.util.exponential function to exponentiate our target
dataset. This emulates having a physical system in a low-
and QGRNN Hamiltonians3
energy state and randomly picking the state at different
times. First things first, let us build a VQE model exp_ ising_ cost = tfq . util . exponential (
operators = ising_cost )
def VQE ( H_target , q ) exp_ising_mix = tfq . util . exponential (
# Parameters operators = ising_mixer )
x = [ ’x {} ’. format ( i ) for i , _ in enumerate ( q ) ] exp_ qgrnn_ cost = tfq . util . exponential (
z = [ ’z {} ’. format ( i ) for i , _ in enumerate ( q ) ] operators = qgrnn_cost , coefficients = params )
symbols = x + z exp_qgrnn_mix = tfq . util . exponential (
circuit = cirq . Circuit () operators = qgrnn_mixer )
circuit . append ( cirq . X ( q_ ) ** sympy . Symbol ( x_ )
for q_ , x_ in zip (q , x ) ) Here we randomly pick the 15 timesteps and apply the
circuit . append ( cirq . Z ( q_ ) ** sympy . Symbol ( z_ ) Trotterized time evolution operators using our above con-
for q_ , z_ in zip (q , z ) ) structed exponentials. We can have a quantum dataset
Now that we have a parameterized quantum circuit, {(|ψTj i, |φTj i)|j = 1..M } where M is the number of
we can minimize the expectation value of given Hamil-
tonian. Again, we can construct a Keras model with
Expectation . Because the output expectation values are 2 Here is some tip for training. Setting the output true value to
calculated respectively, we need to sum them up at the theoretical lower bound, we can minimize our expectation value
last. in the Keras model fit P framework. That
P is, we can Puse the in-
equality hĤtarget i = jk Jjk hZj Zk i + v Bv hZv i + v hXv i ≥
circuit_input = tf . keras . Input ( P P
shape =() , dtype = tf . string ) jk (−)|Jjk | − v |Bv | − N .
3 Here, we use the terminology cost and mixer Hamiltonians as the
output = tfq . layers . Expectation () (
circuit_input , Trotterization of an Ising model time evolution is very similar to
symbol_names = symbols , a QAOA, and thus we borrow nomenclature from this analogous
operators = tfq . c o n v e r t _ t o _ t e n s o r ( QNN.
data points, or batch size (in our case we chose $M = 15$), with $|\psi_{T_j}\rangle = \hat{U}^{j}_{\mathrm{target}}|\psi_0\rangle$ and $|\phi_{T_j}\rangle = \hat{U}^{j}_{\mathrm{qgrnn}}|\psi_0\rangle$. The average infidelity over a batch of size $B$,

$$\mathcal{L}(\theta, \phi) = 1 - \frac{1}{B}\sum_{j=1}^{B}\big|\langle\psi_{T_j}|\phi_{T_j}\rangle\big|^2 = 1 - \frac{1}{B}\sum_{j=1}^{B}\langle\hat{Z}_{\mathrm{test}}\rangle_j,$$

serves as the loss function, where $\langle\hat{Z}_{\mathrm{test}}\rangle_j$ is the swap-test readout comparing the target and model states at the $j$-th sampled timestep.

² A tip for training: by setting the output true value to the theoretical lower bound, we can minimize our expectation value within the Keras model.fit framework. That is, we use the inequality $\langle\hat{H}_{\mathrm{target}}\rangle = \sum_{jk}J_{jk}\langle Z_j Z_k\rangle + \sum_v B_v\langle Z_v\rangle + \sum_v\langle X_v\rangle \geq -\sum_{jk}|J_{jk}| - \sum_v|B_v| - N$.

³ Here we use the terminology of cost and mixer Hamiltonians, as the Trotterization of an Ising model time evolution is very similar to a QAOA, and we thus borrow nomenclature from this analogous QNN.
statistics can exhibit both classical and quantum forms of correlations (e.g., entanglement). As such, if we wish to learn a representation of such a mixed state which can generatively model its statistics, one can expect that a hybrid representation combining classical probabilistic models and quantum neural networks is ideal. Such a decomposition is well suited to near-term noisy devices, as it reduces the overhead of representation on the quantum device, leading to lower-depth quantum neural networks. Furthermore, the quantum layers provide a valuable addition in representation power to the classical probabilistic model, as they allow the addition of quantum correlations to the model.

Thus, in this section, we cover some examples where one learns to generatively model mixed states using a hybrid quantum-probabilistic model [49]. Such models use a parameterized ansatz of the form

$$\hat{\rho}_{\theta\phi} = \hat{U}(\phi)\hat{\rho}_\theta\hat{U}^\dagger(\phi), \qquad \hat{\rho}_\theta = \sum_x p_\theta(x)\,|x\rangle\langle x|, \qquad (34)$$

where $\hat{U}(\phi)$ is a unitary quantum neural network with parameters $\phi$ and $p_\theta(x)$ is a classical probabilistic model with parameters $\theta$. We call $\hat{\rho}_{\theta\phi}$ the visible state and $\hat{\rho}_\theta$ the latent state. Note that the latent state is effectively a classical distribution over the standard basis states, and its only parameters are those of the classical probabilistic model.

As we shall see below, there are methods to train both networks simultaneously. In terms of software implementation, as we have to combine probabilistic models and quantum neural networks, we will use a combination of TensorFlow Probability [140] along with TFQ. A first class of applications we will consider is the task of generating a thermal state of a quantum system given its Hamiltonian. A second class is, given several copies of a mixed state, to learn a generative model which replicates the statistics of the state.

Target problems:

1. Incorporating probabilistic and quantum models
2. Variational quantum simulation of quantum thermal states
3. Learning to generatively model mixed states from data

Required TFQ functionalities:

1. Integration with TF Probability [140]
2. Sample-based simulation of quantum circuits
3. Parameter shift differentiator for gradient computation

2. Variational Quantum Thermalizer

The full notebook of the implementations below is available at:

research/vqt_qmhl/vqt_qmhl.ipynb

Consider the task of preparing a thermal state: given a Hamiltonian $\hat{H}$ and a target inverse temperature $\beta = 1/T$, we want to variationally approximate the state

$$\hat{\sigma}_\beta = \frac{1}{Z_\beta}e^{-\beta\hat{H}}, \qquad Z_\beta = \mathrm{tr}(e^{-\beta\hat{H}}), \qquad (35)$$

using a state of the form presented in equation (34). That is, we aim to find a value of the hybrid model parameters $\{\theta^*, \phi^*\}$ such that $\hat{\rho}_{\theta^*\phi^*} \approx \hat{\sigma}_\beta$. In order to converge to this approximation via optimization of the parameters, we need a loss function which quantifies the statistical distance between these quantum mixed states. If we aim to minimize the discrepancy between states in terms of the quantum relative entropy $D(\hat{\rho}_{\theta\phi}\|\hat{\sigma}_\beta) = -S(\hat{\rho}_{\theta\phi}) - \mathrm{tr}(\hat{\rho}_{\theta\phi}\log\hat{\sigma}_\beta)$ (where $S(\hat{\rho}_{\theta\phi}) = -\mathrm{tr}(\hat{\rho}_{\theta\phi}\log\hat{\rho}_{\theta\phi})$ is the entropy), then, as described in the full paper [59], we can equivalently minimize the free energy⁴ and hence use it as our loss function:

$$\mathcal{L}_{\mathrm{fe}}(\theta, \phi) = \beta\,\mathrm{tr}(\hat{\rho}_{\theta\phi}\hat{H}) - S(\hat{\rho}_{\theta\phi}). \qquad (36)$$

The first term is simply the expectation value of the energy of our model, while the second term is the entropy. Due to the structure of our quantum-probabilistic model, the entropy of the visible state is equal to the entropy of the latent state, which is simply the classical entropy of the distribution, $S(\hat{\rho}_{\theta\phi}) = S(\hat{\rho}_\theta) = -\sum_x p_\theta(x)\log p_\theta(x)$. This comes in quite useful during the optimization of our model.

Let us implement a simple example of the VQT model which minimizes free energy to achieve an approximation of the thermal state of a physical system. Consider a two-dimensional Heisenberg model,

$$\hat{H}_{\mathrm{heis}} = \sum_{\langle ij\rangle_h} J_h\,\hat{S}_i\cdot\hat{S}_j + \sum_{\langle ij\rangle_v} J_v\,\hat{S}_i\cdot\hat{S}_j, \qquad (37)$$

where $h$ ($v$) denotes horizontal (vertical) bonds and $\langle\cdot\rangle$ represents nearest-neighbor pairings. First, we define this Hamiltonian on a grid of qubits:

def get_bond(q0, q1):
    return cirq.PauliSum.from_pauli_strings([
        cirq.PauliString(cirq.X(q0), cirq.X(q1)),
        cirq.PauliString(cirq.Y(q0), cirq.Y(q1)),
        cirq.PauliString(cirq.Z(q0), cirq.Z(q1))])

def get_heisenberg_hamiltonian(qubits, jh, jv):
    heisenberg = cirq.PauliSum()
    # Apply horizontal bonds
    for r in qubits:
        for q0, q1 in zip(r, r[1::]):
            heisenberg += jh * get_bond(q0, q1)
    # Apply vertical bonds
    for r0, r1 in zip(qubits, qubits[1::]):
        for q0, q1 in zip(r0, r1):
            heisenberg += jv * get_bond(q0, q1)
    return heisenberg

⁴ More precisely, the loss function here is in fact the inverse temperature multiplied by the free energy, but this detail is of little import to our optimization.
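Before diving into the implementation, it may help to see how the two terms of equation (36) combine in code. The following is a small sketch (not the notebook's code), assuming the factorized Bernoulli latent model introduced below, with the energy expectation supplied by a TFQ Expectation op as in the gradient-tape snippet further on:

import tensorflow as tf
import tensorflow_probability as tfp

def latent_entropy(probs):
    # Closed form: S = -sum_j [p_j log p_j + (1 - p_j) log(1 - p_j)].
    dist = tfp.distributions.Bernoulli(probs=probs)
    return tf.reduce_sum(dist.entropy())

def vqt_free_energy(beta, energy_expectation, probs):
    # Eq. (36): beta * <H> minus the entropy of the latent distribution.
    return beta * energy_expectation - latent_entropy(probs)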
For our QNN, we consider a unitary consisting of general single-qubit rotations and powers of controlled-NOT gates. Our code returns the associated symbols so that these can be fed into the Expectation op:

def get_rotation_1q(q, a, b, c):
    return cirq.Circuit(
        cirq.X(q) ** a, cirq.Y(q) ** b, cirq.Z(q) ** c)

def get_rotation_2q(q0, q1, a):
    return cirq.Circuit(
        cirq.CNotPowGate(exponent=a)(q0, q1))

def get_layer_1q(qubits, layer_num, L_name):
    layer_symbols = []
    circuit = cirq.Circuit()
    for n, q in enumerate(qubits):
        a, b, c = sympy.symbols(
            "a{2}_{0}_{1} b{2}_{0}_{1} c{2}_{0}_{1}"
            .format(layer_num, n, L_name))
        layer_symbols += [a, b, c]
        circuit += get_rotation_1q(q, a, b, c)
    return circuit, layer_symbols

def get_layer_2q(qubits, layer_num, L_name):
    layer_symbols = []
    circuit = cirq.Circuit()
    for n, (q0, q1) in enumerate(zip(qubits[::2], qubits[1::2])):
        a = sympy.symbols("a{2}_{0}_{1}".format(layer_num, n, L_name))
        layer_symbols += [a]
        circuit += get_rotation_2q(q0, q1, a)
    return circuit, layer_symbols

It will be convenient to consider a particular class of probabilistic models for which estimating the gradient with respect to the model parameters is straightforward. This class of models is called exponential families or energy-based models (EBMs). If our parameterized probabilistic model is an EBM, then it is of the form

$$p_\theta(x) = \frac{1}{Z_\theta}e^{-E_\theta(x)}, \qquad Z_\theta \equiv \sum_{x\in\Omega}e^{-E_\theta(x)}. \qquad (38)$$

For gradients of the VQT free energy loss function with respect to the QNN parameters, $\partial_\phi\mathcal{L}_{\mathrm{fe}}(\theta,\phi) = \beta\,\partial_\phi\,\mathrm{tr}(\hat{\rho}_{\theta\phi}\hat{H})$, this is simply the gradient of an expectation value; hence we can use TFQ parameter shift gradients or any other method for estimating the gradients of QNNs outlined in previous sections. As for the gradients of the classical probabilistic model, one can readily derive that they are given by the following covariance:

$$\partial_\theta\mathcal{L}_{\mathrm{fe}} = \mathbb{E}_{x\sim p_\theta(x)}\big[(E_\theta(x) - \beta H_\phi(x))\nabla_\theta E_\theta(x)\big] - \big(\mathbb{E}_{x\sim p_\theta(x)}[E_\theta(x) - \beta H_\phi(x)]\big)\big(\mathbb{E}_{y\sim p_\theta(y)}[\nabla_\theta E_\theta(y)]\big), \qquad (39)$$

where $H_\phi(x) \equiv \langle x|\hat{U}^\dagger(\phi)\hat{H}\hat{U}(\phi)|x\rangle$ is the expectation value of the Hamiltonian at the output of the QNN with the standard basis element $|x\rangle$ as input. Since the energy function and its gradients can easily be evaluated classically, the remaining quantities in equation (39) are expectation values we can estimate via sampling of the classical probabilistic model and the output of the QPU.

For our classical latent probability distribution $p_\theta(x)$, as a first simple case, we can use a product of independent Bernoulli distributions, $p_\theta(x) = \prod_j p_{\theta_j}(x_j) = \prod_j \theta_j^{x_j}(1-\theta_j)^{1-x_j}$, where $x_j \in \{0,1\}$ are binary values. We can re-phrase this distribution as an energy-based model to take advantage of equation (39): moving the parameters into an exponential, the probability of a bitstring becomes $p_\theta(x) = \prod_j e^{\theta_j x_j}/(e^{\theta_j} + e^{-\theta_j})$. Since this distribution is a product of independent variables, it is easy to sample from. We can use the TensorFlow Probability library [140] to produce samples from this distribution, using the tfp.distributions.Bernoulli object:

def bernoulli_bit_probability(b):
    return np.exp(b) / (np.exp(b) + np.exp(-b))

def sample_bernoulli(num_samples, biases):
    prob_list = []
    for bias in biases.numpy():
        prob_list.append(bernoulli_bit_probability(bias))
    latent_dist = tfp.distributions.Bernoulli(
        probs=prob_list, dtype=tf.float32)
    return latent_dist.sample(num_samples)

After getting samples from our classical probabilistic model, we take gradients of our QNN parameters. Because TFQ implements gradients for its expectation ops, we can use tf.GradientTape to obtain these derivatives. Note that below we use tf.tile to give our Hamiltonian operator and visible state circuit the correct dimensions:

bitstring_tensor = sample_bernoulli(num_samples, vqt_biases)
with tf.GradientTape() as tape:
    tiled_vqt_model_params = tf.tile(
        [vqt_model_params], [num_samples, 1])
    sampled_expectations = expectation(
        tiled_visible_state,
        vqt_symbol_names,
        tf.concat([bitstring_tensor,
                   tiled_vqt_model_params], 1),
        tiled_H)
    energy_losses = beta * sampled_expectations
    energy_losses_avg = tf.reduce_mean(energy_losses)
vqt_model_gradients = tape.gradient(
    energy_losses_avg, [vqt_model_params])

Putting these pieces together, we train our model to output thermal states of the 2D Heisenberg model on a 2x2 grid. The result after 100 epochs is shown in Fig. 28. A great advantage of this approach to optimizing the probabilistic model is that the partition function $Z_\theta$ does not need to be estimated. As such, more general and more expressive models beyond factorized distributions can be used for the probabilistic modeling of the latent classical distribution. In the advanced section of the notebook, we show how to use a Boltzmann machine as our energy-based model. Boltzmann machines are EBMs where, for a bitstring $x \in \{0,1\}^n$, the energy is defined as $E(x) = -\sum_{i,j} w_{ij}x_i x_j - \sum_i b_i x_i$.
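The covariance form of equation (39) can be estimated from samples alone. A minimal sketch, assuming a differentiable energy function energy_fn with variables theta_vars and per-sample QNN energy expectations h_phi (all three names are placeholders, not notebook identifiers):

import tensorflow as tf

def ebm_gradients(energy_fn, theta_vars, samples, beta, h_phi):
    # samples: bitstrings x ~ p_theta(x); h_phi: H_phi(x) per sample.
    with tf.GradientTape() as tape:
        energies = energy_fn(samples)
        # Centered weight (E_theta(x) - beta H_phi(x)); stop_gradient so only
        # the grad E_theta factor is differentiated, reproducing eq. (39).
        adv = tf.stop_gradient(
            energies - beta * h_phi
            - tf.reduce_mean(energies - beta * h_phi))
        surrogate = tf.reduce_mean(adv * energies)
    return tape.gradient(surrogate, theta_vars)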
It is worth noting that our factorized Bernoulli distribution is in fact a special case of the Boltzmann machine, one where only the so-called bias terms in the energy function are present: $E(x) = -\sum_i b_i x_i$. In the notebook, we start with this simpler Bernoulli example of the VQT; the resulting density matrix converges to the known exact result for this system, as shown in Fig. 28. We also provide a more advanced example with a general Boltzmann machine. In the latter example, we picked a fully visible, fully-connected classical Ising model energy function, and used MCMC with Metropolis-Hastings [141] to sample from the energy function.

In the related task of generatively modeling a given data mixed state $\hat{\sigma}_D$ (the second application above), the gradients involve $\sigma_\phi(x) \equiv \langle x|\hat{U}^\dagger(\phi)\hat{\sigma}_D\hat{U}(\phi)|x\rangle$, the distribution obtained by feeding the data state $\hat{\sigma}_D$ through the inverse QNN circuit $\hat{U}^\dagger(\phi)$ and measuring in the standard basis. As this is simply an expectation value of a state propagated through a QNN, for gradients of the loss with respect to the QNN parameters we can use standard TFQ differentiators, such as the parameter shift rule presented in section III. As for the gradient with respect to the EBM parameters, it is given by

$$\partial_\theta\mathcal{L}_{\mathrm{xe}}(\theta, \phi) = \mathbb{E}_{x\sim\sigma_\phi(x)}[\nabla_\theta E_\theta(x)] - \mathbb{E}_{y\sim p_\theta(y)}[\nabla_\theta E_\theta(y)].$$
The ability to combine custom layers and custom objectives enables straightforward implementation of, and experimentation with, any VQE-based algorithm. Additionally, OpenFermion [151] has native methods for working with Cirq, enabling the quantum chemistry methods OpenFermion provides to be easily combined with the quantum machine learning methods TensorFlow Quantum provides.

Target problems:

1. Implement SSVQE [145] in TensorFlow Quantum
2. Find the ground state and first excited state energies of H₂ at increasing bond lengths

The molecular Hamiltonian is expressed as a weighted sum of Pauli strings, $\hat{H} = \sum_{\ell=1}^{L} a_\ell \hat{P}_\ell$, for real scalars $a_\ell$ and Pauli strings $\hat{P}_\ell$. The Hamiltonian is sparse in this representation; typically $L = O(N^4)$ in the number of spin-orbitals (or, equivalently, qubits) $N$. We create custom VQE layers with different readout operators but shared parameters. The SSVQE is then implemented as a collection of these VQE layers, with each input being orthogonal. This is implemented by applying a Pauli X gate prior to creating $U(\theta)$.

class VQE(tf.keras.layers.Layer):
    def __init__(self, circuits, ops):
        super(VQE, self).__init__()
        self.layers = [tfq.layers.ControlledPQC(
            circuits[i], ops[i],
            differentiator=tfq.differentiators.Adjoint())
            for i in range(len(circuits))]

    def call(self, inputs):
        return sum([self.layers[i]([inputs[0], inputs[1]])
                    for i in range(len(self.layers))])

class SSVQE(tf.keras.layers.Layer):
    def __init__(self, num_weights, circuits, ops, k, const):
        super(SSVQE, self).__init__()
        self.theta = tf.Variable(
            np.random.uniform(0, 2 * np.pi, (1, num_weights)),
            dtype=tf.float32)
        self.hamiltonians = []
        self.k = k
        self.const = const
        for i in range(k):
            self.hamiltonians.append(VQE(circuits[i], ops[i]))

        # ... (the SSVQE forward pass is elided in this excerpt; only
        # its final energy readout survives:) ...
        # theta = ssvqe_model.trainable_variables[0]
        # energies = [i.numpy()[0][0] for i in energies(inputs)[1]]
        # return energies[0], energies[1]

With the SSVQE set up, we now need to generate the inputs. In the code below, we load the data that has already been generated. These files were generated using OpenFermion [151] and PySCF [152]. Using the OpenFermion-PySCF function generate_molecular_hamiltonian to generate the Hamiltonians for each bond length, we then converted these to a form compatible with quantum circuits using the Jordan-Wigner transform, which maps fermionic annihilation operators to qubits via

$$a_p \mapsto \tfrac{1}{2}(X_p + iY_p)\,Z_1\cdots Z_{p-1}, \qquad a_p^\dagger \mapsto \tfrac{1}{2}(X_p - iY_p)\,Z_1\cdots Z_{p-1}.$$

This yields a Hamiltonian of the form

$$\hat{H} = g_0\mathbb{1} + g_1 X_0X_1Y_2Y_3 + g_2 X_0Y_1Y_2X_3 + g_3 Y_0X_1X_2Y_3 + g_4 Y_0Y_1X_2X_3 + g_5 Z_0 + g_6 Z_0Z_1 + g_7 Z_0Z_2 + g_8 Z_0Z_3 + g_9 Z_1 + g_{10} Z_1Z_2 + g_{11} Z_1Z_3 + g_{12} Z_2 + g_{13} Z_2Z_3 + g_{14} Z_3,$$

where the coefficients $g_n$ are determined by the bond length. It is this PauliSum object which is then saved and loaded. With the Hamiltonians created, we iterate over the bond lengths and compute the predicted energies of the states.

This implementation takes several minutes to run, even with the Adjoint differentiator. This is because there are 4 layers × 3 parameters per qubit × 4 qubits = 48 parameters to optimize, and because we must optimize these parameters for 20 different Hamiltonians. We can then plot and compare the actual energies with the VQE-predicted energies, as done in Figure 29.
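As a rough sketch of how such Hamiltonian files can be produced with OpenFermion and PySCF (assuming the openfermion and openfermionpyscf packages; exact call signatures may vary between versions):

from openfermion.transforms import get_fermion_operator, jordan_wigner
from openfermionpyscf import generate_molecular_hamiltonian

bond_length = 0.74  # angstroms; swept over a range in the example
geometry = [('H', (0., 0., 0.)), ('H', (0., 0., bond_length))]

# Molecular Hamiltonian in a minimal basis, then Jordan-Wigner mapped.
molecular_hamiltonian = generate_molecular_hamiltonian(
    geometry, basis='sto-3g', multiplicity=1, charge=0)
qubit_hamiltonian = jordan_wigner(
    get_fermion_operator(molecular_hamiltonian))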
Figure 29. Comparison of the SSVQE-predicted energies for the ground state and first excited state with the true values.
F. Classification of Quantum Many-Body Phase Transitions

To run this example, see the Colab notebook at:

research/phase_classifier/phase_classification.ipynb

1. Background

In quantum many-body systems, we are interested in studying the different phases of matter and the transitions between them. This requires a detailed characterization of the observables that change as a function of the order parameters of the system. To estimate these observables, we have to rely on quantum Monte Carlo techniques or tensor network approaches, each of which comes with its own advantages and drawbacks [153, 154].

Recently, classifying phases with supervised machine learning has been proposed as a method for understanding phase transitions [155, 156]. For such a supervised approach, we label the different phases of the system, use classical data obtained from the system of interest, and train a machine learning model to predict the labels. A natural extension of these ideas is a quantum machine learning model for phase classification [157], where the information that characterizes the state is encoded in a variational quantum circuit. This would require preparing high-fidelity quantum many-body states on NISQ devices, an area of active research in the field of quantum computing [158–161]. In this section, we show how a quantum classifier can be trained end-to-end in TensorFlow Quantum using a data set of quantum many-body states. The quantum data are ground states of the transverse field Ising model (TFIM) on a lattice,

$$H_{\mathrm{TFIM}} = -\sum_{\langle i,j\rangle}\sigma^z_i\sigma^z_j - g\sum_{i=1}^{N}\sigma^x_i = H_{zz} + gH_x,$$

where $\langle i,j\rangle$ indicates the set of nearest-neighbor indices for each point in the lattice. On the torus, this system has a phase transition at $g \approx 3.04$ from an ordered to a disordered phase [162, 163].

After the state preparation circuit, we use the hardware-efficient ansatz to train the quantum classifier. This ansatz consists of two layers of parameterized single-qubit gates and a single layer of parameterized two-qubit gates [146]. Note that the parameters in the state preparation circuit stay fixed during training. Finally, we measure the observables $\langle Z_i\rangle$ on each qubit and apply a rescaled sigmoid to the linear predictor of the outcomes,

$$\bar{y} = \tanh\Big(\sum_i \langle Z_i\rangle\,W_i\Big),$$

where $W \in \mathbb{R}^n$ is a weight vector. Hence, given an input ground state $|\psi_0(\theta^*_k)\rangle$, our classifier will output a single scalar $\bar{y} \in (-1, 1)$.

We use the hinge loss function to train the model to output the correct labels. This loss function is given by

$$\ell = \max(0,\, 1 - y\cdot\bar{y}), \qquad (40)$$

where $y \in \{-1, 1\}$ are the ground truth labels and $\bar{y} \in (-1, 1)$ are the model predictions. The entire end-to-end model contains several non-trivial components, such as loading quantum states as data and backpropagating classical gradients through quantum gates.

Target problems:

1. Training a quantum circuit classifier to detect a phase transition in a condensed matter physics system

Required TFQ functionalities:

1. Hybrid quantum-classical optimization of a variational quantum circuit and logistic function
2. Batching training over quantum data (multiple ground states and labels)
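A minimal sketch of wiring up such a classifier in TFQ follows; the hardware-efficient ansatz here is a simplified stand-in (rotation layers plus a CZ ladder), not the notebook's exact circuit:

import cirq
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

qubits = cirq.GridQubit.rect(1, 4)
symbols = sympy.symbols('w0:8')

# Simplified hardware-efficient ansatz: two single-qubit layers, one CZ layer.
ansatz = cirq.Circuit(
    [cirq.ry(s)(q) for s, q in zip(symbols[:4], qubits)],
    [cirq.CZ(q0, q1) for q0, q1 in zip(qubits, qubits[1:])],
    [cirq.ry(s)(q) for s, q in zip(symbols[4:], qubits)],
)

inputs = tf.keras.Input(shape=(), dtype=tf.string)  # ground states as circuits
z_outs = tfq.layers.PQC(ansatz, [cirq.Z(q) for q in qubits])(inputs)
y_bar = tf.keras.layers.Dense(1, activation='tanh', use_bias=False)(z_outs)

model = tf.keras.Model(inputs=inputs, outputs=y_bar)
model.compile(optimizer=tf.keras.optimizers.Adam(0.01),
              loss=tf.keras.losses.Hinge())  # eq. (40), labels in {-1, 1}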
The classifier is trained on ground states away from the critical point; we then see how well it generalizes to states outside of the seen data.

The EQ-GAN has been shown to converge on problem instances that the QuGAN failed on [76].

A pervasive issue in near-term quantum computing is noise: when performing an entangling operation, the required two-qubit gate introduces phase errors that are difficult to fully calibrate against due to a time-dependent noise model. However, such operations are required to measure the overlap between two quantum states, including the assessment of how close the fake data is to the true data.
The entangling quantum generative adversarial network (EQ-GAN) uses a minimax cost function

$$\min_{\theta_g}\max_{\theta_d} V(\theta_g, \theta_d) = \min_{\theta_g}\max_{\theta_d}\big[1 - D_\sigma(\theta_d, \rho(\theta_g))\big] \qquad (41)$$

to learn the true data density matrix $\sigma$. The generator produces a fake quantum state $\rho(\theta_g)$, while the discriminator performs a parameterized swap test (fidelity measurement) $D_\sigma(\theta_d, \rho(\theta_g))$ between the true data $\sigma$ and the fake data $\rho$. The swap test is parameterized such that there exist parameters $\theta_d^{\mathrm{opt}}$ that realize a perfect swap test, i.e. $D_\sigma(\theta_d^{\mathrm{opt}}, \rho(\theta_g)) = \frac{1}{2} + \frac{1}{2}D_\sigma^{\mathrm{fid}}(\rho(\theta_g))$, where $D_\sigma^{\mathrm{fid}}(\rho(\theta_g))$ denotes the fidelity between $\sigma$ and the generated state.

In the presence of noise, it is generally difficult to determine a circuit ansatz for $D_\sigma(\theta_d, \rho)$ such that parameters $\theta_d^{\mathrm{opt}}$ exist. When implementing a CZ gate, gate parameters such as the conditional Z phase, the single-qubit Z phase, and the swap angles of the two-qubit entangling gate can drift and oscillate over a time scale of O(10) minutes [167, 168]. Such unknown systematic and time-varying coherent errors pose significant challenges for applications in quantum machine learning, where gradient computation and updates require many measurements, especially because of the use of entangling gates in a swap test circuit. However, the large deviations in single-qubit and two-qubit Z rotation angles can largely be mitigated by including additional single-qubit Z phase compensations. In learning the discriminator circuit that is closest to a true swap test, the adversarial learning of the EQ-GAN provides a useful paradigm that may be broadly applicable to improving the fidelity of other near-term quantum algorithms.

As an example, we consider the task of learning the superposition $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ on a quantum device with noise (Fig. 32). The discriminator is defined by a swap test with a CZ gate providing the necessary two-qubit operation. To learn to correct gate errors, however, the discriminator adversarially learns the angles of single-qubit Z rotations inserted directly after the CZ gate. Hence, the EQ-GAN obtains a state overlap significantly better than that of the perfect swap test.

Figure 32. EQ-GAN experiment for learning a single-qubit state. The discriminator $U(\theta_d)$ is constructed with free Z rotation angles to suppress CZ gate errors, allowing the generator $\rho(\theta_g)$ to converge closer to the true data state $\sigma$ by varying X and Z rotation angles.

To operate under this noise model, we can define a cirq.NoiseModel that is compatible with TFQ, implementing the noisy_operation method to add CZ and Z phase errors on all CZ gates.

def noisy_operation(self, op):
    if isinstance(op.gate, cirq.ops.CZPowGate):
        error_2q = cirq.ops.CZPowGate(
            exponent=np.random.normal(
                self.mean[0], self.stdev[0]))(*op.qubits)
        error_1q_0 = cirq.ops.ZPowGate(
            exponent=self.single_errors[op.qubits[0]])(op.qubits[0])
        error_1q_1 = cirq.ops.ZPowGate(
            exponent=self.single_errors[op.qubits[1]])(op.qubits[1])
        return [op, error_2q, error_1q_0, error_1q_1]
    return op

To add the noise to TFQ, we can simply convert the circuit with noise into a tensor. For instance, to generate swap tests over n_data datapoints:

swap_test_circuit = swap_test_circuit.with_noise(noise_model)
swap_test_circuit = tf.tile(
    tfq.convert_to_tensor([swap_test_circuit]),
    tf.constant([n_data]))

While the noise suppression experiment can be evaluated with a simulated noise model as described above, the full model can also be run by straightforwardly changing the backend via the Google Quantum Engine API.

engine = cirq.google.Engine(project_id=project_id)
backend = engine.sampler(
    processor_id=['rainbow'], gate_set=cirq.google.XMON)

The hardware backend can then be directly applied in a model.

expectation_output = tfq.layers.Expectation(
    backend=backend)(
        full_swaptest,
        symbol_names=generator_symbols,
        operators=circuits.swap_readout_op(
            generator_qubits, data_qubits),
        initializer=tf.constant_initializer(
            generator_initialization))

When evaluating the adversarial swap test on a calibrated quantum device, we found that the error was reduced by a factor of around four compared to a direct implementation of the standard swap test [76].

3. Approximate Quantum Random Access Memory

To run this example, see the Colab notebook at:

research/eq_gan/variational_qram.ipynb

When applying quantum machine learning to classical data, most algorithms require access to a quantum
random access memory (QRAM) that stores the classical dataset in superposition. More particularly, a set of classical data can be described by the empirical distribution $\{P_i\}$ over all possible input data $i$. Most quantum machine learning algorithms require the conversion of $\{P_i\}$ into a quantum state $\sum_i\sqrt{P_i}\,|\psi_i\rangle$, i.e. a superposition of orthogonal basis states $|\psi_i\rangle$ representing each single classical data entry, with an amplitude proportional to the square root of the classical probability $P_i$. Preparing such a superposition of an arbitrary set of $n$ states takes $O(n)$ operations at best, which ruins the exponential speedup. Given a suitable ansatz, we may use an EQ-GAN to learn a state approximately equivalent to the superposition of data.

To demonstrate a variational QRAM, we consider a dataset of two peaks sampled from different Gaussian distributions. Exactly encoding the empirical probability density function requires a very deep circuit and multiple-control rotations; similarly, preparing a Gaussian distribution on a device with planar connectivity requires deep circuits. Hence, we select shallow circuit ansatzes that generate concatenated exponential functions to approximate a symmetric peak [169]. Once trained to approximate the empirical data distribution, the variational QRAM closely reproduces the original dataset (Fig. 33).

H. Reinforcement Learning with Parameterized Quantum Circuits

1. Background

Reinforcement learning (RL) [170] is one of the three main paradigms of machine learning. It pertains to a learning setting where an agent interacts with an environment to solve a sequential decision task set by this environment, e.g., playing Atari games [171] or navigating cities without a map [172]. This interaction is commonly divided into episodes $(s_0, a_0, r_0, s_1, a_1, r_1, \ldots)$ of states, actions, and rewards exchanged between agent and environment. For each state $s$ the agent perceives, it performs an action $a$ sampled from its policy $\pi(a|s)$, i.e., a probability distribution over actions given states. The environment then rewards the agent's action with a real-valued reward $r$ and updates the state of the agent. This interaction repeats cyclically until episode termination. The goal of the agent is to optimize its policy $\pi(a|s)$ so as to maximize its expected rewards in an episode, and hence solve the task at hand. More precisely, the agent's figure of merit is defined by a so-called value function,

$$V_\pi(s_0) = \mathbb{E}_\pi\Big[\sum_{t=0}^{H}\gamma^t r_t\Big], \qquad (43)$$

where $\gamma \in [0, 1]$ is a discount factor and $H$ is the episode horizon.
The most crucial of these choices is arguably the data encoding strategy. As supported by theoretical investigations [182, 183], data re-uploading (see Fig. 34), where encoding layers and variational layers of parametrized gates are sequentially alternated, stands out as a way to get highly expressive models.

Target problems:

1. Implement a parametrized quantum circuit with data re-uploading
2. Train a reinforcement learning agent based on this quantum circuit

Required TFQ functionalities:

1. Parametrized circuit layers
2. Custom Keras layers
3. Automatic differentiation w.r.t. a custom loss using tf.GradientTape

3. Implementation

def generate_circuit(qubits, n_layers):
    # ... (circuit construction elided in this excerpt; see the sketch below) ...
    return circuit, list(params.flat), list(inputs.flat)
We use this quantum circuit to define a ControlledPQC layer. To sort between variational and encoding angles in the data re-uploading scheme, we include the ControlledPQC in a custom Keras layer. This ReUploadingPQC layer will manage the trainable parameters (variational angles θ and input-scaling parameters λ) and resolve the input values (the input state s) into the appropriate symbols of the circuit.

class ReUploadingPQC(tf.keras.layers.Layer):
    def __init__(self, qubits, n_layers, observables,
                 activation='linear', name='re-uploading_PQC'):
        super(ReUploadingPQC, self).__init__(name=name)
        self.n_layers = n_layers

        circuit, theta_symbols, input_symbols = generate_circuit(
            qubits, n_layers)
        self.computation_layer = tfq.layers.ControlledPQC(
            circuit, observables)

        theta_init = tf.random_uniform_initializer(
            minval=0., maxval=np.pi)
        self.theta = tf.Variable(
            initial_value=theta_init(
                shape=(1, len(theta_symbols)), dtype='float32'),
            trainable=True, name='thetas')
        lmbd_init = tf.ones(shape=(len(qubits) * n_layers,))
        self.lmbd = tf.Variable(
            initial_value=lmbd_init, dtype='float32',
            trainable=True, name='lambdas')
        symbols = [str(x) for x in theta_symbols + input_symbols]
        self.indices = tf.constant(
            [sorted(symbols).index(a) for a in symbols])
        self.activation = activation
        self.empty_circuit = tfq.convert_to_tensor([cirq.Circuit()])

    def call(self, inputs):
        batch_dim = tf.gather(tf.shape(inputs[0]), 0)
        tiled_up_circuits = tf.repeat(
            self.empty_circuit, repeats=batch_dim)
        tiled_up_thetas = tf.tile(
            self.theta, multiples=[batch_dim, 1])
        tiled_up_inputs = tf.tile(
            inputs[0], multiples=[1, self.n_layers])
        scaled_inputs = tf.einsum(
            'i,ji->ji', self.lmbd, tiled_up_inputs)
        squashed_inputs = tf.keras.layers.Activation(
            self.activation)(scaled_inputs)
        joined_vars = tf.concat(
            [tiled_up_thetas, squashed_inputs], axis=1)
        joined_vars = tf.gather(joined_vars, self.indices, axis=1)
        return self.computation_layer(
            [tiled_up_circuits, joined_vars])

We also implement a custom Keras layer to post-process the expectation values $\langle O_a\rangle_{s,\theta,\lambda}$ at the output of the PQC. Here, the Alternating layer multiplies a single $\langle Z_0 Z_1 Z_2 Z_3\rangle_{s,\theta,\lambda}$ expectation value by weights $(w_0, w_1)$ initialized to $(1, -1)$.

class Alternating(tf.keras.layers.Layer):
    def __init__(self, output_dim):
        super(Alternating, self).__init__()
        self.w = tf.Variable(
            initial_value=tf.constant(
                [[(-1.) ** i for i in range(output_dim)]]),
            dtype='float32', trainable=True, name='obs-weights')

    def call(self, inputs):
        return tf.matmul(inputs, self.w)

ops = [cirq.Z(q) for q in qubits]
observables = [reduce((lambda x, y: x * y), ops)]  # reduce from functools

We now put together all these layers to define a Keras model of the policy in equation (44).

def generate_model_policy(qubits, n_layers, n_actions, beta,
                          observables):
    input_tensor = tf.keras.Input(
        shape=(len(qubits),), dtype=tf.dtypes.float32, name='input')
    re_uploading = ReUploadingPQC(
        qubits, n_layers, observables)([input_tensor])
    process = tf.keras.Sequential([
        Alternating(n_actions),
        tf.keras.layers.Lambda(lambda x: x * beta),
        tf.keras.layers.Softmax()
    ], name='observables-policy')
    policy = process(re_uploading)
    model = tf.keras.Model(inputs=[input_tensor], outputs=policy)
    return model

We now move on to the implementation of the learning algorithm. We start by defining two helper functions: a gather_episodes function that gathers a batch of episodes of interaction with the environment, and a compute_returns function that computes the discounted sums of rewards appearing in the agent's loss function.

def gather_episodes(state_bounds, n_actions, model, n_episodes,
                    env_name):
    trajectories = [defaultdict(list) for _ in range(n_episodes)]
    envs = [gym.make(env_name) for _ in range(n_episodes)]
    done = [False for _ in range(n_episodes)]
    states = [e.reset() for e in envs]

    while not all(done):
        unfinished_ids = [i for i in range(n_episodes) if not done[i]]
        normalized_states = [s / state_bounds
                             for i, s in enumerate(states)
                             if not done[i]]
        for i, state in zip(unfinished_ids, normalized_states):
            trajectories[i]['states'].append(state)

        # Compute policy for all unfinished envs in parallel
        states = tf.convert_to_tensor(normalized_states)
        action_probs = model([states])

        # Store action and transition all environments to the next state
        states = [None for i in range(n_episodes)]
        for i, policy in zip(unfinished_ids, action_probs.numpy()):
            action = np.random.choice(n_actions, p=policy)
            # ... (recording the action/reward and stepping the
            # environment continue beyond this excerpt) ...
To train the policy, we need a function that updates the model parameters. This is done via gradient descent on the loss of the agent. Since the loss function in policy-gradient approaches is more involved than, for instance, a supervised learning loss, we use a tf.GradientTape to store the contributions of our model evaluations to the loss. When all contributions have been added, this tape can then be used to backpropagate the loss through all the model evaluations and hence compute the required gradients.

@tf.function
def reinforce_update(states, actions, returns, model):
    states = tf.convert_to_tensor(states)
    actions = tf.convert_to_tensor(actions)
    returns = tf.convert_to_tensor(returns)

    with tf.GradientTape() as tape:
        tape.watch(model.trainable_variables)
        logits = model(states)
        p_actions = tf.gather_nd(logits, actions)
        log_probs = tf.math.log(p_actions)
        loss = tf.math.reduce_sum(-log_probs * returns) / batch_size
    grads = tape.gradient(loss, model.trainable_variables)
    # optimizer_in/_var/_out and the index sets w_in, w_var, w_out
    # are defined earlier in the notebook.
    for optimizer, w in zip(
            [optimizer_in, optimizer_var, optimizer_out],
            [w_in, w_var, w_out]):
        optimizer.apply_gradients(
            [(grads[w], model.trainable_variables[w])])

With this, we can implement the main training loop of the agent.

env_name = 'CartPole-v1'
for batch in range(n_episodes // batch_size):
    # Gather episodes
    episodes = gather_episodes(state_bounds, n_actions, model,
                               batch_size, env_name)

    # Group states, actions and returns in arrays
    states = np.concatenate([ep['states'] for ep in episodes])
    actions = np.concatenate([ep['actions'] for ep in episodes])
    rewards = [ep['rewards'] for ep in episodes]
    returns = np.concatenate([compute_returns(ep_rwds, gamma)
                              for ep_rwds in rewards])
    returns = np.array(returns, dtype=np.float32)
    id_action_pairs = np.array(
        [[i, a] for i, a in enumerate(actions)])

4. Value-Based Reinforcement Learning with PQCs

The example above provides the code for a policy-based approach to RL with PQCs. The tutorial under this link:

docs/tutorials/quantum_reinforcement_learning.ipynb

also implements the value-based algorithm introduced in [179]. Aside from the learning mechanisms specific to deep value-based RL (e.g., a replay memory to re-use past experience and a target model to stabilize learning), these two methods essentially differ in the role played by the expectation values $\langle O_a\rangle_{s,\theta,\lambda}$ of the PQC. In our quantum approach to value-based RL, these correspond to approximations of the values $Q(s, a)$ (i.e., the value function $V(s)$ taken as a function of a state and an action), trained using a loss function of the form

$$\mathcal{L}(\theta) = \frac{1}{|B|}\sum_{s,a,r,s'\in B}\Big(Q_\theta(s,a) - \big[r + \max_{a'} Q_{\theta'}(s',a')\big]\Big)^2,$$

derived from Q-learning [170]. We refer to [178, 179] and the tutorial for more details.
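A minimal sketch of the Q-learning loss above, assuming Keras models model and target_model that output per-action Q-values and a replay batch of transitions (all names here are placeholders):

import tensorflow as tf

def q_learning_loss(model, target_model, batch, gamma, n_actions):
    states, actions, rewards, next_states, done = batch

    # Bootstrapped targets r + gamma * max_a' Q_theta'(s', a'); the target
    # network is held fixed, so the targets are treated as constants.
    next_q = tf.reduce_max(target_model(next_states), axis=1)
    targets = rewards + gamma * next_q * (1.0 - done)

    # Q_theta(s, a) for the actions actually taken in the batch.
    q_taken = tf.reduce_sum(
        model(states) * tf.one_hot(actions, n_actions), axis=1)

    return tf.reduce_mean(tf.square(q_taken - tf.stop_gradient(targets)))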
5. Quantum Environments

In the initial works, it was proven that there exist learning environments that can only be efficiently tackled by quantum agents (barring well-established computational assumptions) [176]. However, these problems are artificial and contrived, and it is a key question in the field of quantum RL whether there exist natural environments for which quantum agents can have a large learning advantage over their classical counterparts. Intuitive candidates are RL environments that are themselves quantum in nature. In this setting, early results have already demonstrated that variational quantum methods can be advantageous when the data perceived by a learning agent stems from measurements on a quantum system [176]. Going a step further, one could also consider a setting where the states perceived by the agent are genuinely quantum and can therefore be processed directly by a PQC, as was recently explored in a quantum control environment [185].
VI. CLOSING REMARKS

The rapid development of quantum hardware represents an impetus for the equally rapid development of quantum applications. In October 2017, the Google AI Quantum team and collaborators released its first software library, OpenFermion, to accelerate the development of quantum simulation algorithms for chemistry and materials sciences. Likewise, TensorFlow Quantum is intended to accelerate the development of quantum machine learning algorithms for a wide array of applications. Quantum machine learning is a very new and exciting field, so we expect the framework to change with the needs of the research community and the availability of new quantum hardware. We have open-sourced the framework under the commercially friendly Apache2 license, allowing future commercial products to embed TFQ royalty-free. If you would like to participate in our community, visit us at:

github.com/tensorflow/quantum/

VII. ACKNOWLEDGEMENTS

The authors would like to thank Google Research for supporting this project. In particular, M.B., G.V., T.M., and A.J.M. would like to thank the Google Quantum AI team for their support during their respective internships, and for several useful discussions with Matt Harrigan, John Platt, and Nicholas Rubin. The authors would also like to thank Achim Kempf from the University of Waterloo for sponsoring this project. M.B. and J.Y. would like to thank the Google Brain and Core team for supporting this project, in particular Christian Howard, Billy Lamberta, Tom O'Malley, Francois Chollet, Yifei Feng, David Rim, Justin Hong, and Megan Kacholia. G.V., A.J.M. and J.Y. would like to thank Stefan Leichenauer, Jack Hidary and the rest of the Sandbox@Alphabet team for their support during their respective Quantum Residencies. G.V. acknowledges support from NSERC. D.B. is an Associate Fellow in the CIFAR program on Quantum Information Science. A.S. and M.S. were supported by the USRA Feynman Quantum Academy funded by the NAMS R&D Student Program at NASA Ames Research Center and by the Air Force Research Laboratory (AFRL), NYSTEC-USRA Contract (FA8750-19-3-6101). A.Z. acknowledges support from Caltech's Intelligent Quantum Networks and Technologies (INQNET) research program and from the DOE/HEP QuantISED program grant, Quantum Machine Learning and Quantum Computation Frameworks (QMLQCF) for HEP, award number DE-SC0019227. S.J. acknowledges support from the Austrian Science Fund (FWF) through the projects DK-ALM:W1259-N27 and SFB BeyondC F7102, and from the Austrian Academy of Sciences as a recipient of the DOC Fellowship. V.D. acknowledges support from the Dutch Research Council (NWO/OCW), as part of the Quantum Software Consortium programme (project number 024.003.037).
…ing qubits,” (2021), arXiv:2104.05219 [quant-ph].
[117] Y. Chen, M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, M. Botvinick, and N. de Freitas, arXiv preprint arXiv:1611.03824 (2016).
[118] E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint arXiv:1412.6062 (2014).
[119] G. S. Hartnett and M. Mohseni, “A probability density theory for spin-glass systems,” (2020), arXiv:2001.00927.
[120] G. S. Hartnett and M. Mohseni, “Self-supervised learning of generative spin-glasses with normalizing flows,” (2020), arXiv:2001.00585.
[121] A. Hagberg, P. Swart, and D. S. Chult, Exploring Network Structure, Dynamics, and Function using NetworkX, Tech. Rep. (Los Alamos National Laboratory, Los Alamos, NM, United States, 2008).
[122] M. Streif and M. Leib, Quantum Sci. Technol. 5, 034008 (2020).
[123] H. Pichler, S.-T. Wang, L. Zhou, S. Choi, and M. D. Lukin, “Quantum optimization for maximum independent set using Rydberg atom arrays,” (2018), arXiv:1808.10816.
[124] M. Wilson, S. Stromswold, F. Wudarski, S. Hadfield, N. M. Tubman, and E. Rieffel, “Optimizing quantum heuristics with meta-learning,” (2019), arXiv:1908.03185 [quant-ph].
[125] E. Farhi, J. Goldstone, and S. Gutmann, arXiv e-prints (2014), arXiv:1411.4028 [quant-ph].
[126] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9 (2018), 10.1038/s41467-018-07090-4.
[127] X. Glorot and Y. Bengio, in AISTATS (2010) pp. 249–256.
[128] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, arXiv preprint arXiv:2001.00550 (2020).
[129] Y. Bengio, in Neural Networks: Tricks of the Trade (Springer, 2012) pp. 437–478.
[130] E. Knill, G. Ortiz, and R. D. Somma, Physical Review A 75, 012328 (2007).
[131] E. Grant, L. Wossnig, M. Ostaszewski, and M. Benedetti, Quantum 3, 214 (2019).
[132] A. Skolik, J. R. McClean, M. Mohseni, P. van der Smagt, and M. Leib, Quantum Machine Intelligence 3, 1 (2021).
[133] E. Campos, A. Nasrallah, and J. Biamonte, Physical Review A 103, 032607 (2021).
[134] C. Cirstoiu, Z. Holmes, J. Iosue, L. Cincio, P. J. Coles, and A. Sornborger, “Variational fast forwarding for quantum simulation beyond the coherence time,” (2019), arXiv:1910.04292 [quant-ph].
[135] N. Wiebe, C. Granade, C. Ferrie, and D. Cory, Physical Review Letters 112 (2014), 10.1103/physrevlett.112.190501.
[136] G. Verdon, T. McCourt, E. Luzhnica, V. Singh, S. Leichenauer, and J. Hidary, “Quantum graph neural networks,” (2019), arXiv:1909.12264 [quant-ph].
[137] S. Greydanus, M. Dzamba, and J. Yosinski, “Hamiltonian neural networks,” (2019), arXiv:1906.01563 [cs.NE].
[138] L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles, New Journal of Physics 20, 113022 (2018).
[139] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition, 10th ed. (Cambridge University Press, USA, 2011).
[140] J. V. Dillon, I. Langmore, D. Tran, E. Brevdo, S. Vasudevan, D. Moore, B. Patton, A. Alemi, M. Hoffman, and R. A. Saurous, “TensorFlow Distributions,” (2017), arXiv:1711.10604 [cs.LG].
[141] C. P. Robert, “The Metropolis-Hastings algorithm,” (2015), arXiv:1504.01896 [stat.CO].
[142] S. McArdle, S. Endo, A. Aspuru-Guzik, S. C. Benjamin, and X. Yuan, Reviews of Modern Physics 92, 015003 (2020).
[143] J. R. McClean, M. E. Kimchi-Schwartz, J. Carter, and W. A. De Jong, Physical Review A 95, 042308 (2017).
[144] O. Higgott, D. Wang, and S. Brierley, Quantum 3, 156 (2019).
[145] K. M. Nakanishi, K. Mitarai, and K. Fujii, Physical Review Research 1, 033062 (2019).
[146] A. Kandala et al., Nature 549, 242 (2017).
[147] H. R. Grimsley, S. E. Economou, E. Barnes, and N. J. Mayhall, Nature Communications 10, 1 (2019).
[148] M. Ostaszewski, E. Grant, and M. Benedetti, Quantum 5, 391 (2021).
[149] P. K. Barkoutsos, G. Nannicini, A. Robert, I. Tavernelli, and S. Woerner, Quantum 4, 256 (2020).
[150] N. Yamamoto, arXiv preprint arXiv:1909.05074 (2019).
[151] J. R. McClean, N. C. Rubin, K. J. Sung, I. D. Kivlichan, X. Bonet-Monroig, Y. Cao, C. Dai, E. S. Fried, C. Gidney, B. Gimby, et al., Quantum Science and Technology 5, 034014 (2020).
[152] Q. Sun, T. C. Berkelbach, N. S. Blunt, G. H. Booth, S. Guo, Z. Li, J. Liu, J. D. McClain, E. R. Sayfutyarova, S. Sharma, et al., Wiley Interdisciplinary Reviews: Computational Molecular Science 8, e1340 (2018).
[153] A. W. Sandvik, AIP Conference Proceedings 1297, 135 (2010).
[154] F. Verstraete, V. Murg, and J. Cirac, Advances in Physics 57, 143 (2008).
[155] J. Carrasquilla and R. G. Melko, Nature Physics 13, 431 (2017).
[156] E. P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Nature Physics 13, 435 (2017).
[157] A. V. Uvarov, A. S. Kardashin, and J. D. Biamonte, Phys. Rev. A 102, 012415 (2020).
[158] A. Smith, M. S. Kim, F. Pollmann, and J. Knolle, npj Quantum Information 5, 106 (2019).
[159] W. W. Ho and T. H. Hsieh, SciPost Phys. 6, 29 (2019).
[160] D. Wierichs, C. Gogolin, and M. Kastoryano, Phys. Rev. Research 2, 043246 (2020).
[161] R. Wiersema, C. Zhou, Y. de Sereville, J. F. Carrasquilla, Y. B. Kim, and H. Yuen, PRX Quantum 1, 020319 (2020).
[162] M. S. L. du Croo de Jongh and J. M. J. van Leeuwen, Phys. Rev. B 57, 8494 (1998).
[163] H. W. J. Blöte and Y. Deng, Phys. Rev. E 66, 066110 (2002).
[164] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, in Advances in Neural Information Processing Systems (2014) pp. 2672–2680.
[165] P.-L. Dallaire-Demers and N. Killoran, Phys. Rev. A 98, 012324 (2018).
[166] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature (London) 549, 195 (2017), arXiv:1611.09347 [quant-ph].
[167] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., arXiv:1910.11333 (2019).
[168] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, A. Bengtsson, S. Boixo, M. Broughton, B. B. Buckley, et al., arXiv preprint arXiv:2010.07965 (2020).
[169] N. Klco and M. J. Savage, Phys. Rev. A 102, 012612 (2020).
[170] R. S. Sutton, A. G. Barto, et al., Introduction to Reinforcement Learning, Vol. 135 (MIT Press, Cambridge, 1998).
[171] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, et al., Nature 518, 529 (2015).
[172] P. Mirowski, M. K. Grimes, M. Malinowski, K. M. Hermann, K. Anderson, D. Teplyashin, K. Simonyan, K. Kavukcuoglu, A. Zisserman, and R. Hadsell, arXiv preprint arXiv:1804.00168 (2018).
[173] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, et al., Nature 550, 354 (2017).
[174] O. Vinyals, I. Babuschkin, W. M. Czarnecki, M. Mathieu, A. Dudzik, J. Chung, D. H. Choi, R. Powell, T. Ewalds, P. Georgiev, et al., Nature 575, 350 (2019).
[175] C. Berner, G. Brockman, B. Chan, V. Cheung, P. Dębiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, et al., “Dota 2 with large scale deep reinforcement learning,” (2019), arXiv:1912.06680 [cs.LG].
[176] S. Jerbi, C. Gyurik, S. Marshall, H. J. Briegel, and V. Dunjko, “Variational quantum policies for reinforcement learning,” (2021), arXiv:2103.05577 [quant-ph].
[177] S. Y.-C. Chen, C.-H. H. Yang, J. Qi, P.-Y. Chen, X. Ma, and H.-S. Goan, IEEE Access 8, 141007 (2020).
[178] O. Lockwood and M. Si, in Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol. 16 (2020) pp. 245–251.
[179] A. Skolik, S. Jerbi, and V. Dunjko, “Quantum agents in the gym: a variational quantum algorithm for deep Q-learning,” (2021), arXiv:2103.15084 [quant-ph].
[180] O. Lockwood and M. Si, in NeurIPS 2020 Workshop on Pre-registration in Machine Learning (PMLR, 2021) pp. 285–301.
[181] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Gym,” (2016), arXiv:1606.01540 [cs.LG].
[182] M. Schuld, R. Sweke, and J. J. Meyer, Physical Review A 103, 032430 (2021).
[183] A. Pérez-Salinas, A. Cervera-Lierta, E. Gil-Fuster, and J. I. Latorre, Quantum 4, 226 (2020).
[184] R. S. Sutton, D. A. McAllester, S. P. Singh, Y. Mansour, et al., in NIPS, Vol. 99 (1999) pp. 1057–1063.
[185] S. Wu, S. Jin, D. Wen, and X. Wang, arXiv preprint arXiv:2012.10711 (2020).