Machine Learning For Practical Quantum Error Mitigation
Haoran Liao,1,∗ Derek S. Wang,1,∗ Iskandar Sitdikov,1 Ciro Salcedo,1 Alireza Seif,1 and Zlatko K. Minev1
1 IBM Quantum, IBM T.J. Watson Research Center, Yorktown Heights, NY 10598, USA
(Dated: October 2, 2023; arXiv:2309.17368v1 [quant-ph], 29 Sep 2023)
Quantum computers are actively competing to surpass classical supercomputers, but quantum
errors remain their chief obstacle. The key to overcoming these errors on near-term devices has emerged through the field of quantum error mitigation, which enables improved accuracy at the cost of additional runtime. In practice, however, the success of mitigation is limited by a generally exponential
overhead. Can classical machine learning address this challenge on today’s quantum computers?
Here, through both simulations and experiments on state-of-the-art quantum computers using up
to 100 qubits, we demonstrate that machine learning for quantum error mitigation (ML-QEM) can
drastically reduce overheads, maintain or even surpass the accuracy of conventional methods, and
yield near noise-free results for quantum algorithms. We benchmark a variety of machine learning
models—linear regression, random forests, multi-layer perceptrons, and graph neural networks—on
diverse classes of quantum circuits, over increasingly complex device-noise profiles, under interpola-
tion and extrapolation, and for small and large quantum circuits. These tests employ the popular
digital zero-noise extrapolation method as an added reference. We further show how to scale ML-QEM to classically intractable quantum circuits by mimicking the results of traditional mitigation methods, while significantly reducing overhead. Our results highlight the potential of classical machine
learning for practical quantum computation.
[Fig. 1 schematic: a quantum circuit is passed through an encoder producing circuit features and backend properties for the machine-learning model; executing the circuit on the QPU yields noisy expectation values; a noiseless simulator or an error-mitigated QPU provides target values for the loss function driving the model update.]
Fig. 1. Machine-learning quantum error mitigation (ML-QEM): execution and training for tractable and
intractable circuits. A quantum circuit (left) is passed to an encoder (top) that creates a feature set for the ML model
(right) based on the circuit and the quantum processing unit (QPU) targeted for execution. The model and features are readily
replaceable. The executed noisy expectation values ⟨Ô⟩noisy (middle) serve as the input to the model whose aim is to predict
their noise-free value ⟨Ô⟩mit . To achieve this, the model is trained beforehand (bottom, blue highlighted path) against target
values ⟨Ô⟩target of example circuits. These are obtained either using noiseless simulations in the case of small-scale, tractable
circuits or using the noisy QPU in conjunction with a conventional error mitigation strategy in the case of large-scale, intractable
circuits. The training minimizes the loss function, typically the mean square error. The trained model operates without the
need for additional mitigation circuits, thus reducing runtime overheads.
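The execution-and-training loop of Fig. 1 can be sketched in a few lines. The sketch below is illustrative only: the feature encoding, the synthetic data, and the toy noise model are placeholders standing in for the paper's actual encoder and datasets, and the random forest is used here simply because it is the paper's best-performing model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Illustrative sketch of the ML-QEM loop in Fig. 1 (not the paper's actual encoder).
# Each training example pairs circuit features and a noisy expectation value with a
# target value, obtained from a noiseless simulator (small circuits) or from a
# conventionally mitigated QPU run (large circuits).

rng = np.random.default_rng(0)

def encode(circuit_features, noisy_expval):
    """Feature vector: hypothetical circuit counts plus the noisy expectation value."""
    return np.append(circuit_features, noisy_expval)

# Synthetic stand-in data: 200 circuits, 3 hypothetical circuit features each
# (e.g., CNOT count, SX count, depth), plus a noisy <O> in [-1, 1].
circuit_feats = rng.integers(1, 50, size=(200, 3)).astype(float)
ideal = rng.uniform(-1, 1, size=200)                  # <O>_target
noisy = 0.8 * ideal + rng.normal(0, 0.05, size=200)   # <O>_noisy (toy noise model)

X = np.array([encode(f, n) for f, n in zip(circuit_feats, noisy)])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, ideal)  # minimize squared loss between predicted and target values

# At runtime: mitigate a new noisy expectation value, with no extra circuits run.
x_new = encode(np.array([10.0, 20.0, 5.0]), 0.4)
mitigated = model.predict(x_new.reshape(1, -1))[0]
```

The key point of the workflow survives even in this toy form: once trained, the model maps noisy to mitigated values purely classically, so no additional quantum circuits are executed at mitigation time.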
large quantum circuit volumes beyond the limits of classical simulation. To date, there has not been a systematic study comparing different traditional methods and statistical models for QEM on an equal footing under practical scenarios across a variety of relevant quantum computational tasks.

In this article, we present a general framework for performing ML-QEM with higher runtime efficiency than other mitigation methods. Our study encompasses a broad spectrum of simple to complex machine learning models, including the previously proposed linear regression and multi-layer perceptron models. We further propose two new models, random forests and graph neural networks. We find that random forests seem to consistently perform the best. We evaluate the performance of all four models in diverse realistic scenarios. We consider a range of circuit classes (random circuits and Trotterized Ising dynamics) and increasingly complex noise models in simulations (including incoherent and coherent gate errors, readout errors, and qubit errors). Additionally, we explore the advantages of ML-QEM methods over traditional approaches in common use cases, such as generalization to unseen Pauli observables and enhancement of variational quantum-classical tasks. Our analysis reveals that ML-QEM methods, particularly the random forest, exhibit competitive performance compared to a state-of-the-art method—digital zero-noise extrapolation (ZNE)—while requiring lower overhead by a factor of at least 2 in runtime. Finally, with experiments on IBM quantum computers for quantum circuits with up to 100 qubits and two-qubit gate depth of 40, we propose a path toward scalable mitigation by mimicking traditional mitigation methods with superior runtime efficiency, which also serves as a further example of using classical machine learning on quantum data [20].

II. RESULTS

The ML-QEM workflow (see Fig. 1) operates on a given class of quantum circuits for which we train an ML model to predict near noise-free expectation values based on noisy expectation values obtained from a quantum processing unit (QPU). This is required since, in general, the output of a quantum circuit is considered intractable and cannot be learned in isolation by the ML model. Details of the training set, the encoded circuit and QPU features, and the ML models can be found in the Methods section. A key feature of the ML-QEM model is that, at runtime, the model produces mitigated expectation values from the noisy ones without the need for additional mitigation circuits, thus dramatically reducing overheads.

As reported in the following, we find at-par or even
Specific types of noise include incoherent gate errors, qubit decoherence, and readout errors. We train the ML-QEM models to mitigate the noisy expectation values of the four single-qubit Ẑi observables. As a benchmark, we also compare mitigated expectation values from digital ZNE. In Fig. 2, we show the distribution of the error (between the mitigated expectation values and the ideal ones) for digital ZNE and for ML-QEM with each of the four machine learning models in the top plot, and the bootstrap mean errors in the bottom plot. We observe that the random forest consistently outperforms the other ML-QEM models, with the MLP model closely following. Notably, all ML-QEM models, including OLS and GNN, exhibit competitive performance in comparison to the ZNE method, even though the runtime overhead for ZNE is twice as large. Finally, we emphasize that rigorous hyperparameter optimization may affect the relative performance of these methods, and we leave this analysis to future work.

2. Trotterized 1D Transverse-field Ising Model

To benchmark the performance of the protocol on structured circuits, we consider Trotterized brickwork circuits. Here, we consider first-order Trotterized dynamics of the 1D transverse-field Ising model (TFIM) subject to different noise models based on the incoherent noise of the FakeLima simulator in Fig. 3, before moving to experiments on IBM hardware with actual device noise in Fig. 4. We note that these circuits are not only broadly representative but also bear similarities to those used for Floquet dynamics [25]. The dynamics of the spin chain is described by the Hamiltonian

$$\hat{H} = -J \sum_j \hat{Z}_j \hat{Z}_{j+1} + h \sum_j \hat{X}_j = -J \hat{H}_{ZZ} + h \hat{H}_{X},$$

where J denotes the exchange coupling between neighboring spins and h represents the transverse magnetic field; its first-order Trotterized circuit is shown in the inset of Fig. 3. We generate multiple instances of the problem with varying numbers of Trotter steps and coupling strengths, such that the coupling strengths of each circuit are uniformly sampled from the paramagnetic phase (J < h) by choice. There are 300 training circuits and 300 testing circuits per Trotter step, and the training circuits cover Trotter steps up to 14. Each circuit is measured in a randomly chosen Pauli basis for all the weight-one observables. We then train the ML-QEM models on the ideal and noisy expectation values obtained from these circuits and compare their performance with digital ZNE. During the testing phase, we consider both interpolation and extrapolation. In interpolation, we test on circuits with sampled coupling strength J not included in the training but with Trotter steps included in the training. In extrapolation, we test on circuits with sampled coupling strength J not included in the training as well as with Trotter steps exceeding the maximal steps present in the training circuits.

On the noisy simulator in Fig. 3, for this problem with incoherent gate noise in the absence (left) or presence (right) of readout error, the ML-QEM model (using the random forest) performs better than the ZNE method. We present a comparison across all ML-QEM models in App. C, which shows that the RF model demonstrates the best performance among the ML-QEM models in both interpolation and extrapolation, closely followed by the MLP, OLS, and GNN. We envision that ML-QEM can be used to improve the accuracy of noisy quantum computations for circuits with gate depths exceeding those included in the training set.

On the right of Fig. 3, we consider the same problem as in the second study but simulate the sampled circuits on the FakeLima backend with additional coherent errors. The added coherent errors are CNOT gate over-rotations with an average over-rotation angle of 0.02π. We again generate multiple instances of the problem with varying numbers of Trotter steps and coupling strengths uniformly sampled from the paramagnetic phase. We then train the ML-QEM model (using the random forest) on the ideal and noisy expectation values obtained from these circuits executed on the modified FakeLima backend and compare its performance with ZNE.

During the testing phase, we also perform extrapolation, where some testing circuits have Trotter steps exceeding the maximal steps present in the training circuits. Specifically, the testing circuits cover 14 more steps, up to Trotter step 29. Under the influence of the added coherent noise, the performance of both the ML-QEM model and digital ZNE deteriorated compared to the previous study. However, in the extrapolation scenario, none of the models demonstrated effective mitigation of the noisy expectation values. In practical applications, a combination of, e.g., dynamical decoupling [22] and randomized compiling [7, 24], which can suppress all coherent errors, could be applied to the test circuits prior to utilizing ML-QEM models. This approach effectively converts the noise into incoherent noise, enabling the ML-QEM methods to perform optimally in extrapolation. We remark that coherent gate errors induce quadratic changes in the expectation values, which are stronger than incoherent errors, which induce only linear changes—it is plausible that the machine learning approach performs better with weak noise.

We benchmark the performance of the ML-QEM model against digital ZNE on real quantum hardware, IBM's ibm_algiers. In this experiment, we do not apply any additional error suppression or error mitigation such as dynamical decoupling, randomized compiling, or readout error mitigation; thus, the experiment involves incoherent noise, coherent noise, and readout error, with the results shown in Fig. 4. We train the ML-QEM with random forest on 50 circuits and test it on 250 circuits at each Trotter step. We observe that 50 training circuits per step, totaling 500 training circuits, suffice to train the model well. With this low train-test split ratio, the ML-QEM requires 500 + 2,500 = 3,000
Fig. 3. Mitigation accuracy under i) complexity of quantum noise and ii) ML-QEM interpolation and extrapolation for Trotter circuits. Top row: Average error performance on Trotter circuits (top-left inset) representing the quantum time dynamics of a four-site, 1D, transverse-field Ising model in numerical simulations. A Trotter step comprises four layers of CNOT gates (inset). The vertical dashed line separates experiments in the ML-QEM interpolation regime (left) from the extrapolation regime (right). The 3 curves represent the performance of the highest-performing ML-QEM method, the QEM ZNE method, and the unmitigated simulations. They are averaged over 300 circuits, each with a randomly chosen Pauli measurement basis. The data is for all four weight-one expectations ⟨P̂i⟩. The error is defined as the L2 distance from the ideal expectations, ∥⟨P̂⟩ − ⟨P̂⟩ideal∥2, as in the remainder of the figures. From left to right, the complexity of the device noise model increases to include additional realistic noise types. Coherent errors are introduced on CNOT gates. Bottom row: Corresponding typical data of the error-mitigated expectation values ⟨Z0⟩ under the Trotter evolution; here, for Ising parameter ratio J/h = 0.15.
total circuits, while running ZNE with 2 noise factors on the testing circuits requires 2 × 2,500 = 5,000 total circuits. The ML-QEM thus achieves a reduction of quantum resource overhead compared to ZNE both overall and at runtime—the reduction is as large as 30% overall and 50% at runtime. Additionally, we observe that the ML-QEM method RF significantly outperforms ZNE for all Trotter steps, demonstrating the efficacy of this approach under a realistic scenario. We report approximately 0.7 QPU hours (based on a conservative sampling rate of 2 kHz [9]) to generate all the training data and seconds to train the model on a single-core laptop for this experiment.

B. Mitigating Unseen Pauli Observables

There are algorithms in which we care about the expectation values of multiple non-commuting Pauli observables on the same circuit, effectively creating multiple target circuits with the same gate sequences but with different measurement bases, such as in quantum state tomography and in the variational quantum eigensolver. Additional error mitigation methods incur a large overhead on top of these target circuits by requiring additional mitigation circuits for each target circuit. Here, we show that it is possible to achieve better mitigation performance with lower overhead using an ML-QEM method.

In particular, we evaluate the performance of the ML-QEM in mitigating unseen Pauli observables on a state |ψ⟩ produced by the Trotterized Ising circuit depicted at the top of Fig. 5(a), which contains 6 qubits and 5 Trotter steps. We train the random forest model on increasing fractions of the 4⁶ − 1 = 4,095 Pauli observables of a Trotterized Ising circuit with J/h = 0.15, and then we apply the model to mitigate noisy expectation values sampled from the rest of all possible Pauli observables. The results of this study are plotted at the bottom of Fig. 5(a). We observe that training the random forest on just a small fraction (≲ 2%) of the Pauli observables results in mitigated expectation values with errors lower than when using ZNE. The ML-QEM method additionally has lower runtime overhead—there are no mitigation circuits with amplified noise required at runtime, and the
ferent Hamiltonians.

To demonstrate this concept, we train the ML-QEM model with RF on 2,000 circuits, with each parameter randomly sampled from [−5, 5], and compute the dissociation curve of the H2 molecule at the bottom of Fig. 5(b). The ML-QEM random forest model is trained on a two-local variational ansatz (depicted at the top of Fig. 5(b)) across many randomly sampled {θ⃗}. This method results in energies that are close to chemical accuracy. Notably, the absolute errors are an order of magnitude smaller than those of the ZNE-mitigated energies.

Additionally, the training overhead of the ML-QEM model in VQE with different Hamiltonians can be significantly reduced by generalizing to mitigating unseen Pauli observables, as in Sec. II B. By decomposing Ĥ = Σᵢ cᵢ P̂ᵢ into Pauli terms, the ML-QEM only needs to train on the sampled ansatz Û(θ⃗) with a subset of the Pauli observables in Ĥ. This is demonstrated in our experiment shown in Fig. 5(b), where the random forest is trained on the sampled ansatz Û(θ⃗) measured in X̂1X̂2 and Ẑ1Ẑ2 (1,000 for each observable), while the Hamiltonian of the H2 molecule at each bond length consists of the X̂1X̂2, Ẑ1Ẑ2, Î1Ẑ2, and Ẑ1Î2 Pauli observables [26].

Fig. 4. On QPU hardware: accuracy and overhead for ML-QEM and QEM. Average execution error of Trotter circuits for experiments on QPU device ibm_algiers without mitigation and with ZNE or ML-QEM RF mitigation. Error performance is averaged over 250 Ising circuits per Trotter step, each with sampled Ising parameters J < h and each measured for all weight-one observables in a randomly chosen Pauli basis. Training is performed over 50 circuits per Trotter step, which results in both a 40% lower overall and a 50% lower runtime quantum resource overhead of RF compared to the overhead of the digital ZNE implementation (see inset).
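The per-term reconstruction of ⟨Ĥ⟩ from mitigated Pauli expectations described above amounts to a weighted sum. The snippet below sketches this step; the coefficients and expectation values are hypothetical placeholders, not the actual H2 Hamiltonian data of Ref. [26].

```python
# Illustrative: reconstructing <H> from individually mitigated Pauli terms,
# following H = sum_i c_i P_i. All numbers are hypothetical placeholders,
# not the H2 coefficients from Ref. [26].
coeffs = {"XX": 0.18, "ZZ": 0.12, "IZ": -0.22, "ZI": -0.22}

# Mitigated expectation values <P_i>_mit for each Pauli term (placeholders).
mitigated_expvals = {"XX": 0.95, "ZZ": 0.88, "IZ": -0.45, "ZI": -0.47}

# Weighted sum over the Pauli decomposition gives the mitigated energy.
energy = sum(c * mitigated_expvals[p] for p, c in coeffs.items())
```

Because the model generalizes across observables, only a subset of the terms needs to appear in training; the remaining ⟨P̂ᵢ⟩ are mitigated as unseen observables.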
Fig. 5. Application of ML-QEM to a) unseen expectation values and b) the variational quantum eigensolver (VQE). a) Top: Schematic of a Trotter circuit, which prepares a many-body quantum state on n = 6 qubits (in 5 Trotter steps). Top right: The circle depicts the pool of all possible 4ⁿ Pauli observables. The shaded region depicts the fraction of observables used in training the ML model; the remaining observables are unseen prior to deployment in mitigation. Bottom: Average error of mitigated unseen Pauli observables versus the total number of distinct observables seen in training. b) Top: Schematic of the VQE ansatz circuit for 2 qubits parametrized by 8 angles θ⃗. Below, a depiction of the VQE optimization workflow optimizing the set of angles θ⃗ on a simulated QPU, yielding the noisy chemical energy ⟨Ĥ⟩noisy_θ⃗, which is first mitigated by the ML-QEM or QEM before being used in the optimizer as ⟨Ĥ⟩mit_θ⃗. Compared to the ZNE method, the ML-QEM with RF method obviates the need for additional mitigation circuits at every optimization iteration at runtime.
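The optimizer-QPU-mitigation loop of Fig. 5(b) can be sketched schematically. Everything below is a stub: the "QPU" is a toy function with an artificial multiplicative bias, the "trained model" simply inverts that bias (a real ML-QEM model would be learned from data), and the Hamiltonian coefficients and optimizer are hypothetical.

```python
import numpy as np

# Schematic VQE loop from Fig. 5(b): the optimizer proposes angles, the "QPU"
# returns noisy term expectations, and a trained ML-QEM model mitigates them
# before the energy is fed back. All functions here are illustrative stubs.

def run_ansatz_on_qpu(theta):
    """Stub: noisy <P_i> for each Pauli term of H at parameters theta."""
    ideal = {"XX": np.cos(theta[0]), "ZZ": np.cos(theta[1])}
    return {p: 0.8 * v for p, v in ideal.items()}  # toy depolarizing-like bias

def ml_qem_mitigate(noisy):
    """Stub for the trained model: inverts the toy bias (a real model is learned)."""
    return {p: v / 0.8 for p, v in noisy.items()}

coeffs = {"XX": 0.5, "ZZ": 0.5}  # hypothetical Hamiltonian coefficients

def energy(theta):
    mit = ml_qem_mitigate(run_ansatz_on_qpu(theta))
    return sum(c * mit[p] for p, c in coeffs.items())

# Crude coordinate descent standing in for the classical optimizer.
theta = np.array([1.0, 1.0])
for _ in range(50):
    for i in range(2):
        for step in (0.1, -0.1):
            trial = theta.copy()
            trial[i] += step
            if energy(trial) < energy(theta):
                theta = trial
best = energy(theta)
```

The point mirrored from the text is structural: mitigation sits between the QPU call and the optimizer, and it costs no extra circuits per iteration once the model is trained.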
experiment, we apply Pauli twirling to all the circuits, each with 5 twirls, before applying either extrapolation in digital ZNE or the ML-QEM to mitigate the expectation values.

We then find that the ML-QEM models are able to accurately mimic the traditional method's mitigated expectation values. The average distances from the unmitigated result (after twirling average) for the mitigated expectation values produced by ZNE and by the random forest model mimicking ZNE are very close for all Trotter steps, as shown for specific J and h corresponding to non-Clifford circuits in the second and third plots of Fig. 6. In the fourth and bottom plot, showing the residuals between the ZNE-mitigated and RF-mimicking-ZNE-mitigated values averaged over the training set comprising non-Clifford circuits, we see that RF mimics ZNE well. This result demonstrates that ML-QEM methods can scalably accelerate traditional quantum error mitigation methods by mimicking their behavior when exact expectation values cannot be computed classically. In this experiment, although the 1D TFIM is analytically solvable, the Trotter errors must be taken into consideration; thus the exact expectation values of the circuits are not easily accessible and are not shown.

Importantly, this mimicry approach requires less quantum computational overhead both overall and at runtime. For this experiment, we test on 40 different coupling strengths J for h = 0.66π, each of which is used to generate 10 circuits with up to 10 Trotter steps, or 400 test circuits in total. The traditional ZNE approach with 2 noise factors requires 2 × 400 = 800 circuits. In contrast, the RF-mimicking-ZNE approach here is trained with 10 different coupling strengths J for h = 0.66π, each of which generates 10 circuits with up to 10 Trotter steps, or 100 total training circuits. Therefore, the RF-mimicking-ZNE approach requires only 2 × 100 + 400 = 600 total circuits, resulting in 25% overall lower quantum computational resources. The savings are even more dras-
III. DISCUSSION
formation about the circuit and the noise model, such as pulse shapes and output counts, could lead to even more accurate error mitigation. Third, one could study the effect of the drift of noise in hardware on the machine learning model. This would allow one to optimize the resources needed to fine-tune the neural network models or even to retrain a simple random forest model from scratch periodically. As a step in this direction, we provide evidence that the models can be efficiently fine-tuned to adapt to a change in the noise model, as shown in App. D. Fourth, when trained with different noise models and their corresponding noisy expectation values, it would be interesting to investigate whether setting the noise parameters encoded in the GNN to the low-noise regime (e.g., setting the encoded gate error close to zero and the encoded coherence times to a large value) allows the GNN to predict expectation values close to the ideal ones without ever training on them, thus providing potential advantages in both accuracy and efficiency when scaling up to classically intractable circuits. In other words, it is an open question whether the GNN can 'extrapolate' to zero noise without specifically amplifying the noise on the target circuit, but instead by knowing the noisy expectation values from different noise models on different circuits. Fifth, ML-QEM can be more rigorously optimized and benchmarked against leading methods, such as PEC, PEA, and pulse-stretching ZNE. Finally, extending these methods to other quantum computing tasks and applications will help to further establish the utility of ML-QEM as a powerful tool for improving the accuracy and reliability of quantum computations.

In conclusion, our study underscores the potential of ML-QEM methods for enhancing quantum computations. Understanding the strengths and weaknesses of different models and developing strategies to improve their efficiency paves the way for increasingly robust and accurate applications of quantum computing.

1. Linear Regression

Linear regression is a simple and interpretable method for ML-QEM, in which the relationship between the dependent variables (the ideal expectation values) and the independent variables (the features extracted from the quantum circuits and the noisy expectation values) is modeled using a linear function.

One relevant work in this area is Clifford data regression, proposed by Czarnik et al. [15]. In their approach, the authors first replace most of the non-Clifford gates with nearby Clifford gates in the target circuit of interest, then use a linear regression model to regress the noisy expectation values of those circuits onto the ideal ones. Our linear regression model differs in two main aspects. Firstly, we extend the feature set to include counts of each native gate (where native parameterized gates are counted in binned angles), the Pauli observable in sparse Pauli operator representation, and optional device-specific noise parameters. Secondly, our model does not necessarily require training on Clifford versions of the target circuits, although this option remains available if desired.

We train a linear regression model that takes these features as input and predicts the ideal expectation values. The model minimizes the sum of squared errors between the mitigated and the ideal expectation values using a closed-form solution, a method known as ordinary least squares (OLS). The linear regression model can also be trained using other methods, such as ridge regression, LASSO, or elastic net. These methods differ in their regularization techniques, which can help prevent overfitting and improve model generalization. In our experiments, we use OLS for its simplicity and ease of interpretation. We note that standard feature selection procedures also help to prevent overfitting and collinearity in practice.
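An OLS-style mitigation step of this kind can be sketched with the closed-form normal-equation solution. The feature choices and toy noise model below are illustrative assumptions, not the paper's exact encoding (which also bins parameterized-gate angles and encodes the observable as a sparse Pauli operator).

```python
import numpy as np

# Sketch of OLS-based mitigation: regress ideal expectation values on circuit
# features plus the noisy expectation value, via the closed-form least-squares
# solution. Features and noise model here are illustrative placeholders.

rng = np.random.default_rng(1)
n = 300
n_cx = rng.integers(0, 40, n)   # hypothetical two-qubit gate counts
n_sx = rng.integers(0, 80, n)   # hypothetical SX gate counts
ideal = rng.uniform(-1, 1, n)
noisy = ideal * np.exp(-0.01 * n_cx) + rng.normal(0, 0.01, n)  # toy decay model

# Design matrix: [1, n_cx, n_sx, <O>_noisy]
X = np.column_stack([np.ones(n), n_cx, n_sx, noisy])
beta, *_ = np.linalg.lstsq(X, ideal, rcond=None)  # closed-form OLS fit

mitigated = X @ beta
residual = np.mean((mitigated - ideal) ** 2)  # error after linear mitigation
baseline = np.mean((noisy - ideal) ** 2)      # error of the raw noisy values
```

Even this crude linear model reduces the mean squared error relative to the raw noisy values, since the fitted slope on the noisy feature partially inverts the depth-dependent signal decay.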
2. Random Forest
[Fig. 7 panels: a) OLS, b) RF with Tree 1 … Tree N, c) MLP, d) GNN encoding of a circuit on a device as a graph, with node features such as gate type, gate error, T1, and T2; blue indicates rules or parameters that are fitted or trained.]
Fig. 7. Overview of the four ML-QEM models and their encoded features. (a) Linear regression (specifically
ordinary least squares (OLS)): input features are vectors including circuit features (such as the number of two-qubit gates n2Q
and SX gates nSX ), noisy expectation values ⟨Ô⟩noisy , and observables Ô. The model consists of a linear function that maps
input features to mitigated values ⟨Ô⟩mit . (b) Random forest (RF): the model consists of an ensemble of decision trees and
produces a prediction by averaging the predictions from each tree. (c) Multi-layer perceptron (MLP): the same encoding as that
for linear regression is used, and the model consists of one or more fully connected layers of neurons. The non-linear activation
functions enable the approximation of non-linear relationships. (d) Graph neural network (GNN): graph-structured input data
is used, with node and edge features encoding quantum circuit and noise information. The model consists of multiple layers of
message-passing operations, capturing both local and global information within the graph and enabling intricate relationships
to be modeled.
decision trees to produce an estimate of the ideal expectation value.

For the RF, we used 100 tree estimators for each observable. The tree construction process follows a top-down, recursive, and greedy approach, using the Classification and Regression Trees (CART) algorithm. For the splitting criterion, we employ the mean squared error reduction for regression. For each tree, at least 2 samples are required to split an internal node, and 1 feature is considered when looking for the best split.

Multi-layer perceptrons (MLPs), first explored in the context of QEM in Ref. [14], are feedforward artificial neural networks composed of layers of nodes, with each layer fully connected to the subsequent one. Nodes within the hidden layers utilize non-linear activation functions, such as the rectified linear unit (ReLU), enabling the MLP to model non-linear relationships.

We construct MLPs with 2 dense layers with a hidden size of 64 and the ReLU activation function. The input features are identical to those employed in the random forest model. To train the MLP, we minimize the mean squared error between the predicted and true ideal expectation values, employing backpropagation to update the weights. The batch size is 32, and the optimizer used is Adam [28] with an initial learning rate of 0.001. In practice, regularization techniques like dropout or weight decay can be used to prevent overfitting if necessary. The MLP method demonstrates competitive performance in mitigating noisy expectation values, as evidenced by our experiments. However, it should be noted that MLPs are also susceptible to overfitting in this context.

As the most complex model among the four, graph neural networks (GNNs) are designed to work with graph-structured data, such as social networks [29] and chemistry [30]. They can capture both local and global information within a graph, making them highly expressive and capable of modeling intricate relationships. However, their increased complexity results in higher computational costs, and they may be more challenging to implement and interpret.

A core aspect of our ML-QEM with GNN lies in data encoding, which consists of encoding quantum circuits
and device noise parameters into graph structures suitable for GNNs. Before data encoding, each quantum circuit is first transpiled into hardware-native gates that adhere to the quantum device's connectivity. To encode them for the GNN, the transpiled circuit is converted into an acyclic graph. In the graph, each edge signifies a qubit that receives instructions when directed towards a node, while each node corresponds to a gate. These nodes are assigned vectors containing information about the gate type and gate errors, as well as the coherence times and readout errors of the qubits on which the gate operates. Additional device and qubit characterizations, such as qubit crosstalk and idling period duration, can be encoded on the edges or nodes, although they are not considered in the current study.

The acyclic graph of a quantum circuit serves as input to the transformer convolution layers of the GNN. These message-passing layers iteratively process and aggregate the encoded vectors on neighboring nodes and connected edges to update the assigned vector on each node. This enables the exchange of information based on graph connectivity, facilitating the extraction of useful information from the nodes, which represent the gate sequence in our context. The output, along with the noisy expectation values, is passed through dense layers to generate a graph-level prediction, specifically the mitigated expectation values. As a result, after training the layers using backpropagation to minimize the mean squared error between the noisy and ideal expectation values, the GNN model learns to perform quantum error mitigation.

For the GNN, we use 2 multi-head Transformer convolution layers [31] and ASAPooling layers [32], followed by 2 dense layers with a hidden size of 128. Dropouts are added to various layers. As with the MLP, the batch size is 32, and the optimizer used is Adam [28] with an initial learning rate of 0.001.

The decorator requires a model as one of the arguments. We provide several default models, such as Scikit-learn, PyTorch, and TensorFlow models, and examples of how to train and use them. The source code, documentation, and examples can be found at https://github.com/qiskit-community/blackwater.

The research part of the source code for all simulations and hardware experiments presented in this paper can be found at https://github.com/qiskit-community/blackwater/tree/research.

DATA AVAILABILITY

The data that support the findings of this study are available upon reasonable request. In addition, we plan to make the datasets and code used in this research publicly accessible in a repository, which will be linked in the final version of the paper. The repository will contain the raw data, preprocessed data, model configurations, training scripts, and evaluation scripts necessary to reproduce the results presented in this paper. This will allow researchers to verify and build upon our findings, as well as to explore new directions in ML-QEM research.

ACKNOWLEDGEMENTS

We thank Brian Quanz, Patrick Rall, Oles Shtanko, Roland De Putter, Kunal Sharma, Barbara Jones, Sona Najafi, Thaddeus Pelligrini, Grace Harper, Vinay Tripathi, Antonio Mezzacapo, Christa Zoufal, Travis Scholten, Bryce Fuller, Swarnadeep Majumder, Sarah Sheldon, Youngseok Kim, Pedro Rivero, Will Bradbury, and Nate Gruver for valuable discussions. The bond distance-dependent Hamiltonian coefficients for computing the ground-state energy of the H2 molecule using the hybrid quantum-classical VQE algorithm were shared by the authors of Ref. [26].
B. Zero-Noise Extrapolation the authors of Ref. [26].
[Fig. 8 plot: error versus Trotter steps (0–25) under three noise models (incoherent noise; incoherent noise + readout error; incoherent + coherent noise + readout error), each split into interpolation and extrapolation regimes, with curves for the unmitigated results, ZNE, GNN, OLS, MLP, and RF.]
Fig. 8. ML-QEM and QEM performance for Trotter circuits. Expanded data corresponding to Fig. 3 of the main text
that includes the three ML-QEM methods not shown earlier: GNN, OLS, MLP. We study three noise models: Left: incoherent
noise resembling ibmq lima without readout error, Middle: with the additional readout error, and Right: with the addition
of coherent errors on the two-qubit CNOT gates. We show the depth-dependent performance of error mitigation averaged
over 9,000 Ising circuits, each with different coupling strengths J. For the incoherent noise model, all ML-QEM methods
demonstrate improved performance even when mitigating circuits with depths larger than those included in the training set.
However, all perform as poorly as the unmitigated case in extrapolation with additional coherent noise.
values when the noisy channel of the circuit consists of successive layers of depolarizing channels. This is more general than the result shown in [15].

Consider $l$ successive layers of unitaries, each associated with a depolarizing channel of rate $p_i$ for layer $i$. The noisy circuit acting on the input $\rho$, $\tilde{C}(\rho)$, is written as
$$\tilde{C}(\rho) = \mathcal{E}_l\big(U_l\,\mathcal{E}_{l-1}(U_{l-1}\cdots\mathcal{E}_1(U_1\rho U_1^\dagger)\cdots U_{l-1}^\dagger)\,U_l^\dagger\big),$$
where $\mathcal{E}_i(\rho) = (p_i/D)I + (1-p_i)\rho$.

It can be shown by induction that $\tilde{C}(\rho) = (p(l)/D)I + (1-p(l))\,U_l\cdots U_1\rho U_1^\dagger\cdots U_l^\dagger$, where $p(l) = 1-\prod_{i=1}^{l}(1-p_i)$, as follows. Assuming for $l=k$ that $\tilde{C}(\rho) = (p(k)/D)I + (1-p(k))\,U_k\cdots U_1\rho U_1^\dagger\cdots U_k^\dagger$, then for $l=k+1$ we have
$$\tilde{C}(\rho) = \frac{p_{k+1}}{D}I + (1-p_{k+1})\left[\frac{p(k)}{D}I + (1-p(k))\,U_{k+1}\cdots U_1\rho U_1^\dagger\cdots U_{k+1}^\dagger\right] = \frac{p(k+1)}{D}I + (1-p(k+1))\,U_{k+1}\cdots U_1\rho U_1^\dagger\cdots U_{k+1}^\dagger.$$
The induction completes with a trivial base case.

Therefore, the noisy expectation value of $\hat{O}$ becomes
$$\mathrm{Tr}\big(\tilde{C}(\rho)\hat{O}\big) = \frac{p(l)}{D}\mathrm{Tr}(\hat{O}) + (1-p(l))\,\mathrm{Tr}\big(U_l\cdots U_1\rho U_1^\dagger\cdots U_l^\dagger\hat{O}\big) = \frac{p(l)}{D}\mathrm{Tr}(\hat{O}) + (1-p(l))\,\mathrm{Tr}\big(C(\rho)\hat{O}\big),$$
where $\mathrm{Tr}(C(\rho)\hat{O})$ is the ideal expectation value of $\hat{O}$.

For Trotterized circuits with a fixed Trotter step and a fixed brickwork structure, the number of layers $l$ of unitaries in the circuit is also fixed. Assuming some fixed-rate depolarizing channels associated with the $l$ layers of unitaries, the noisy and ideal expectation values of some $\hat{O}$ on these Trotterized circuits with different parameters then lie on a line. Therefore, the ML-QEM method can mitigate the expectation values by linear regression from the noisy expectation values to the ideal ones, and the linear regression parameters can be learned to vary according to the number of layers $l$. The ML-QEM is thus unbiased in this case. We note that ZNE with linear extrapolation is still biased in this case, since the noise amplification effectively results in a different combined depolarizing rate $p'(l) = 1-\prod_{i=1}^{l}(1-p'_i)$, which leads to expectation values with differently amplified noises, each lying on a different line towards the ideal expectation value; thus the linear extrapolation cannot yield unbiased estimates.

Appendix B: Break-even in the total cost of the ML-QEM

Assuming the mimicked (traditional) QEM requires $m$ total executions of either the mitigation circuits or the circuit of interest (e.g., digital/analog ZNE usually has $m = 2$ or $3$ noise factors), its total runtime cost over the test circuits is $m\,n_{\text{test}}$. The total cost, including training, for the RF is $m\,n_{\text{train}} + n_{\text{test}}$. Equating these two yields the break-even train-test split ratio in the total cost of our mimicry compared to the traditional QEM: $n_{\text{train}}/n_{\text{test}} = (m-1)/m$. Our mimicry shows a higher overall efficiency when the train-test split ratio is smaller than $(m-1)/m$.

Appendix C: Additional Experimental Details

All non-ideal expectation values in simulations and experiments presented in this paper are obtained from the measurement statistics from 10,000 shots.

In the study of 4-qubit random circuits presented in Sec. II A 1, we use the Qiskit function
qiskit.circuit.random.random_circuit() to generate the random circuits, which implements random sampling and placement of 1-qubit and 2-qubit gates, with randomly sampled parameters for any selected parametrized gates. The 2-qubit gate depth is measured after transpilation. We remark that the random

a different noise model after they have been originally trained on a noise model. The fine-tuning is expected to require only a small number of additional samples; this is demonstrated in Fig. 9 with the MLP trained on noise

[Fig. 9 legend: "MLP trained on noise model A and fine-tuned on B"; "RF trained (from scratch) on noise model B".]
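The closed form used in the zero-noise extrapolation analysis above, $\tilde{C}(\rho) = (p(l)/D)I + (1-p(l))\,U_l\cdots U_1\rho U_1^\dagger\cdots U_l^\dagger$ with $p(l) = 1-\prod_i(1-p_i)$, can be sanity-checked numerically. The sketch below is purely illustrative (a single qubit, $D = 2$, with arbitrarily chosen rotation angles and depolarizing rates, not the paper's noise models); it composes the layered noisy channel explicitly and compares it against the closed form:

```python
# Numerical check (single qubit, D = 2) that interleaving unitaries with
# depolarizing channels yields a single global depolarizing channel with
# combined rate p(l) = 1 - prod_i (1 - p_i).
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def ry(theta):  # real-valued single-qubit rotation, so dagger = transpose
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def depolarize(rho, p):  # E(rho) = (p/D) I + (1 - p) rho, with D = 2
    identity = [[1.0, 0.0], [0.0, 1.0]]
    return [[p / 2 * identity[i][j] + (1 - p) * rho[i][j]
             for j in range(2)] for i in range(2)]

thetas, rates = [0.3, 1.1, 0.7], [0.05, 0.10, 0.02]  # arbitrary choices
rho = [[1.0, 0.0], [0.0, 0.0]]  # |0><0|

# Layer-by-layer noisy evolution: rho -> E_i(U_i rho U_i^dagger)
noisy = rho
for theta, p in zip(thetas, rates):
    U = ry(theta)
    noisy = depolarize(matmul(matmul(U, noisy), transpose(U)), p)

# Closed form: ideal evolution followed by one depolarizing channel p(l)
p_tot = 1.0
for p in rates:
    p_tot *= (1 - p)
p_tot = 1 - p_tot  # p(l) = 1 - prod_i (1 - p_i)
U_tot = matmul(ry(thetas[2]), matmul(ry(thetas[1]), ry(thetas[0])))
ideal = matmul(matmul(U_tot, rho), transpose(U_tot))
closed = depolarize(ideal, p_tot)

assert all(abs(noisy[i][j] - closed[i][j]) < 1e-12
           for i in range(2) for j in range(2))
print("combined rate p(l) =", round(p_tot, 6))
```

The same check extends to any number of layers, since the combined rate only multiplies in additional factors of $(1 - p_i)$.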
[1] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd, "Quantum machine learning," Nature 549, 195–202 (2017).
[2] Andrew J. Daley, Immanuel Bloch, Christian Kokail, Stuart Flannigan, Natalie Pearson, Matthias Troyer, and Peter Zoller, "Practical quantum advantage in quantum simulation," Nature 607, 667–676 (2022).
[3] Earl T. Campbell, Barbara M. Terhal, and Christophe Vuillot, "Roads towards fault-tolerant universal quantum computation," Nature 549, 172–179 (2017).
[4] Sergey Bravyi, Oliver Dial, Jay M. Gambetta, Darío Gil, and Zaira Nazario, "The future of quantum computing with superconducting qubits," Journal of Applied Physics 132, 160902 (2022).
[5] Zhenyu Cai, Ryan Babbush, Simon C. Benjamin, Suguru Endo, William J. Huggins, Ying Li, Jarrod R. McClean, and Thomas E. O'Brien, "Quantum error mitigation," arXiv:2210.00921 (2022).
[6] Abhinav Kandala, Kristan Temme, Antonio D. Córcoles, Antonio Mezzacapo, Jerry M. Chow, and Jay M. Gambetta, "Error mitigation extends the computational reach of a noisy quantum processor," Nature 567, 491–495 (2019).
[7] Ewout van den Berg, Zlatko K. Minev, Abhinav Kandala, and Kristan Temme, "Probabilistic error cancellation with sparse Pauli-Lindblad models on noisy quantum processors," Nature Physics (2023).
[8] Youngseok Kim, Christopher J. Wood, Theodore J. Yoder, Seth T. Merkel, Jay M. Gambetta, Kristan Temme, and Abhinav Kandala, "Scalable error mitigation for noisy quantum circuits produces competitive expectation values," Nature Physics 19, 752–759 (2023).
[9] Youngseok Kim, Andrew Eddins, Sajant Anand, Ken Wei, Ewout van den Berg, Sami Rosenblatt, Hasan Nayfeh, Yantao Wu, Michael Zaletel, Kristan Temme, and Abhinav Kandala, "Evidence for the utility of quantum computing before fault tolerance," Nature 618, 500–505 (2023).
[10] Kristan Temme, Sergey Bravyi, and Jay M. Gambetta, "Error mitigation for short-depth quantum circuits," Physical Review Letters 119, 180509 (2017).
[11] Ying Li and Simon C. Benjamin, "Efficient variational quantum simulator incorporating active error minimization," Physical Review X 7, 021050 (2017).
[12] Yihui Quek, Daniel Stilck França, Sumeet Khatri, Johannes Jakob Meyer, and Jens Eisert, "Exponentially tighter bounds on limitations of quantum error mitigation," arXiv:2210.11505 (2022).
[13] Ryuji Takagi, Hiroyasu Tajima, and Mile Gu, "Universal sampling lower bounds for quantum error mitigation," arXiv:2208.09178 (2022).
[14] Changjun Kim, Kyungdeock Daniel Park, and June-Koo Rhee, "Quantum error mitigation with artificial neural network," IEEE Access 8, 188853–188860 (2020).
[15] Piotr Czarnik, Andrew Arrasmith, Patrick J. Coles, and Lukasz Cincio, "Error mitigation with Clifford quantum-circuit data," Quantum 5, 592 (2021).
[16] Piotr Czarnik, Michael McKerns, Andrew T. Sornborger, and Lukasz Cincio, "Improving the efficiency of learning-based error mitigation," arXiv:2204.07109 (2022).
[17] Elizabeth R. Bennewitz, Florian Hopfmueller, Bohdan Kulchytskyy, Juan Carrasquilla, and Pooya Ronagh, "Neural error mitigation of near-term quantum simulations," Nature Machine Intelligence 4, 618–624 (2022).
[18] Tirthak Patel and Devesh Tiwari, "Qraft: Reverse your quantum circuit and know the correct program output," in Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (2021), pp. 443–455.
[19] Armands Strikis, Dayue Qin, Yanzhu Chen, Simon C. Benjamin, and Ying Li, "Learning-based quantum error mitigation," PRX Quantum 2, 040330 (2021).
[20] Hsin-Yuan Huang, Richard Kueng, Giacomo Torlai, Victor V. Albert, and John Preskill, "Provably efficient machine learning for quantum many-body problems," Science 377 (2022).
[21] Ivan Henao, Jader P. Santos, and Raam Uzdin, "Adaptive quantum error mitigation using pulse-based inverse evolutions," arXiv:2303.05001 (2023).
[22] Nic Ezzell, Bibek Pokharel, Lina Tewala, Gregory Quiroz, and Daniel A. Lidar, "Dynamical decoupling for superconducting qubits: a performance survey," arXiv:2207.03670 (2022).
[23] Bibek Pokharel and Daniel A. Lidar, "Demonstration of algorithmic quantum speedup," arXiv:2207.07647 (2022).
[24] Joel J. Wallman and Joseph Emerson, "Noise tailoring for scalable quantum computation via randomized compiling," Physical Review A 94, 052325 (2016).
[25] Oles Shtanko, Derek S. Wang, Haimeng Zhang, Nikhil Harle, Alireza Seif, Ramis Movassagh, and Zlatko Minev, "Uncovering local integrability in quantum many-body dynamics," arXiv:2307.07552 (2023).
[26] Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, Maika Takita, Markus Brink, Jerry M. Chow, and Jay M. Gambetta, "Hardware-efficient variational quantum eigensolver for small molecules and quantum magnets," Nature 549, 242–246 (2017).
[27] Daniel Bultrini, Max Hunter Gordon, Piotr Czarnik, Andrew Arrasmith, M. Cerezo, Patrick J. Coles, and Lukasz Cincio, "Unifying and benchmarking state-of-the-art quantum error mitigation techniques," Quantum 7, 1034 (2023).
[28] Diederik P. Kingma and Jimmy Ba, "Adam: A method for stochastic optimization," arXiv:1412.6980 (2015).
[29] Rex Ying, Ruining He, Kaifeng Chen, Pong Eksombatchai, William L. Hamilton, and Jure Leskovec, "Graph convolutional neural networks for web-scale recommender systems," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (2018), pp. 974–983.
[30] Patrick Reiser, Marlen Neubert, André Eberhard, Luca Torresi, Chen Zhou, Chen Shao, Houssam Metni, Clint van Hoesel, Henrik Schopmans, Timo Sommer, and Pascal Friederich, "Graph neural networks for materials science and chemistry," Communications Materials 3 (2022).
[31] Yunsheng Shi, Zhengjie Huang, Shikun Feng, Hui Zhong, Wenjing Wang, and Yu Sun, "Masked label prediction: Unified message passing model for semi-supervised classification," in Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21 (2021), pp. 1548–1554.
[32] Ekagra Ranjan, Soumya Sanyal, and Partha Pratim Talukdar, "ASAP: Adaptive structure aware pooling for learning hierarchical graph representations," in AAAI Conference on Artificial Intelligence (2019).
[33] Pedro Rivero, Friederike Metz, Areeq Hasan, Agata M. Brańczyk, and Caleb Johnson, "Zero noise extrapolation prototype," https://github.com/qiskit-community/prototype-zne (2022).
[34] Aram W. Harrow and Richard A. Low, "Random quantum circuits are approximate 2-designs," Communications in Mathematical Physics 291, 257–302 (2009).
[35] Haoran Liao, Ian Convy, William J. Huggins, and K. Birgitta Whaley, "Robust in practice: Adversarial attacks on quantum machine learning," Physical Review A 103 (2021).