Channel Attention For Quantum Convolutional Neural Networks
Gekko Budiutama,1,2,∗ Shunsuke Daimon,3,† Hirofumi Nishi,1,2 Ryui Kaneko,4,5 Tomi Ohtsuki,5 and Yu-ichiro Matsushita1,2,3

1 Laboratory for Materials and Structures, Institute of Innovative Research, Tokyo Institute of Technology, Yokohama 226-8503, Japan
2 Quemix Inc., Taiyo Life Nihonbashi Building, 2-11-2, Nihonbashi Chuo-ku, Tokyo 103-0027, Japan
3 Quantum Materials and Applications Research Center, National Institutes for Quantum Science and Technology, Tokyo 152-8550, Japan
4 Waseda Research Institute for Science and Engineering, Waseda University, Shinjuku, Tokyo 169-8555, Japan
5 Physics Division, Sophia University, Chiyoda, Tokyo 102-8554, Japan
arXiv:2311.02871v1 [quant-ph] 6 Nov 2023

Quantum convolutional neural networks (QCNNs) have gathered attention as one of the most promising algorithms for quantum machine learning. Reduction in the cost of training as well as improvement in performance are required for the practical implementation of these models. In this study, we propose a channel attention mechanism for QCNNs and show the effectiveness of this approach for quantum phase classification problems. Our attention mechanism creates multiple channels of output state based on the measurement of quantum bits. This simple approach improves the performance of QCNNs and outperforms a conventional approach using feedforward neural networks as additional post-processing.
Quantum neural networks have emerged as one of the promising approaches to performing machine learning tasks on noisy intermediate-scale quantum computers. Quantum neural networks employ a variational quantum ansatz, the parameters of which are iteratively optimized using classical optimization routines [1–6]. Among various quantum neural network models, quantum convolutional neural networks (QCNNs) have been proposed as the quantum equivalent of classical convolutional neural networks [7, 8]. Unlike other quantum neural networks [9–12], QCNNs do not exhibit the exponentially vanishing gradient problem (the barren plateau problem) [13–15]. This guarantees QCNNs' trainability under random initial parameterization. A recent study has also demonstrated the viability of this approach on real quantum computers [16]. Due to these factors, QCNNs have gained significant attention in the field of quantum machine learning. They have demonstrated success in various learning tasks that involve quantum states as direct inputs, such as quantum phase classification [16–19]. While holding great promise, a reduction in training costs is essential for QCNNs' practicality due to the current limitations and high expenses of today's quantum computers [20–25]. Furthermore, enhancing the performance of QCNNs is crucial for the efficacy of this approach. To address these challenges, various hybrid QCNN–classical machine learning approaches have been reported, in which the outputs of the QCNNs are used as inputs to classical machine learning models. Fully connected feedforward neural networks are the most common classical models utilized in these approaches [26–30]. However, the significant computational workload may become infeasible with the increasing complexity of tasks as well as of QCNN models.

In this work, we propose a channel attention scheme for QCNNs. Inspired by the classical machine learning literature [31–38], the proposed attention for QCNNs creates channels of output state by the measurement of qubits that are discarded in the conventional QCNN models. A weight parameter is then assigned to each channel, indicating its importance. Our results show that incorporating this straightforward step leads to a remarkable reduction in the error of QCNN models for the task of quantum phase classification. This improvement was achieved without major alteration to the existing QCNN models and with only a minor increase in the number of parameters.

As shown in Fig. 1(a), conventional QCNNs are composed of convolutional and pooling layers. The convolutional layers consist of parallel local unitary gates that can be either uniformly or independently parameterized within a single layer [7, 17] (U1 to U6 in Fig. 1(a)). The pooling layers consist of controlled rotational gates that act on a single qubit (controlled-V1 and V2 in Fig. 1(a)). Here, the control qubits of the pooling layer are discarded onwards, effectively reducing the number of qubits in the circuit. Following the convolutional and pooling layers, some of the qubits are measured as the final output of the model; from here on, these qubits are called target qubits. Measuring these target qubits yields the prediction output:

f_i(|ψ_in⟩; θ) = ⟨ψ_QCNN| P_i |ψ_QCNN⟩,   (1)

where |ψ_in⟩, θ, and |ψ_QCNN⟩ represent the input state, the parameters of the QCNN, and the output state of the QCNN just before the measurement of the target qubits, respectively. In this study, the observables {P_i}_{i=0,...,2^{N_tq}−1} are the projection operators onto the |i⟩_tq states, acting on the target qubits; N_tq is the number of target qubits.

The proposed QCNN with channel attention is realized by performing measurement(s) on the control qubits of the pooling layer (Fig. 1(b)); from here on, these qubits are called attention qubits. The measurement result is used to create multiple channels of output state. Different weights are given to each channel, representing their importance. The weighted outputs are summed and then subjected to a softmax function. The mathematical formalization of the attention for the QCNNs is shown below.

Among various states of the attention qubits, the entangled state collapses into a single state |φ_k⟩_uq upon measurement.

∗ bgekko@quemix.com
† daimon.shunsuke@qst.go.jp
FIG. 1. (a) A schematic illustration of the conventional QCNN. Here, U1 to U6 are parameterized gates that compose the convolutional layers. Controlled-V1 and V2 are controlled rotational gates that create the pooling layer. Here, |ψ_in⟩, P_i, and |ψ_QCNN⟩ represent the input state, the projection operators onto the |i⟩_tq states, and the output state of the QCNN just before the measurement of the target qubits, respectively. In the text, this model is referred to as QCNN without channel attention. (b) The proposed channel attention for QCNNs. Channels are created based on the measurement of attention qubits. The number of channels is 2^{N_aq}, where N_aq is the number of attention qubits. In the figure above, a model with N_aq = 1 and N_tq = 2 is shown. Here, |k⟩_aq, w_k, and |φ̃_k⟩_uq represent the state of the attention qubits, the weight of each channel, and the state of the unmeasured qubits just before the measurement of the target qubits.
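As a concrete illustration of the prediction output of Eq. (1), the probability of each computational-basis outcome |i⟩_tq on the target qubits can be sketched for a classically simulated state vector. This is a minimal NumPy sketch; the function and variable names are illustrative and not taken from the authors' code.

```python
import numpy as np

def target_probabilities(psi, n_qubits, target_qubits):
    """Eq. (1): f_i = <psi_QCNN| P_i |psi_QCNN>, where P_i projects the
    target qubits onto |i>_tq and all remaining qubits are summed over."""
    # One tensor axis per qubit; qubit 0 corresponds to the leftmost axis.
    amps = psi.reshape([2] * n_qubits)
    other = [q for q in range(n_qubits) if q not in target_qubits]
    # Bring the target-qubit axes to the front and flatten the rest.
    amps = np.transpose(amps, target_qubits + other).reshape(
        2 ** len(target_qubits), -1)
    # Born rule: probability of each target bitstring |i>_tq.
    return np.sum(np.abs(amps) ** 2, axis=1)

# Toy check: the 3-qubit state (|000> + |110>)/sqrt(2) with target qubits 0 and 1.
psi = np.zeros(8, dtype=complex)
psi[0b000] = psi[0b110] = 1 / np.sqrt(2)
f = target_probabilities(psi, 3, [0, 1])  # f[0] = f[3] = 0.5
```

The returned vector has length 2^{N_tq} and sums to one, matching the normalization of the projector set {P_i}.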
Here, the measurement of the attention qubits is represented as |k⟩_aq, and uq denotes the unmeasured qubits in the pooling layer:

Σ_{k=0}^{2^{N_aq}−1} |φ_k⟩_uq ⊗ |k⟩_aq  →  (1/N_k) |φ_k⟩_uq ≡ |φ̃_k⟩_uq,   (2)

where N_k = ‖ |φ_k⟩_uq ‖ is the normalization constant. The channels of states are determined by measuring the target qubits based on the states of the attention qubits:

f^ch_{i,k}(|ψ_in⟩; θ) = ⟨φ̃_k| P_i |φ̃_k⟩_uq.   (3)

Different weights w_k ∈ ℝ are given to each channel, representing their importance. These weights are optimized during the training of the model. The final prediction of the QCNN with channel attention is normalized using the softmax function:

f^ch_i(|ψ_in⟩; θ, w) = softmax( Σ_{k=0}^{2^{N_aq}−1} w_k f^ch_{i,k}(|ψ_in⟩; θ) ),   (4)

where the softmax function is defined as softmax(x_i) = e^{x_i} / Σ_{j=0}^{2^{N_tq}−1} e^{x_j}.

For classification, QCNNs map an input |ψ_in⟩ to a label y, where y denotes a one-dimensional one-hot binary vector of length 2^{N_tq}. An input state is classified into the ith category when y_i = 1. For a training dataset of size M, denoted as {(|ψ^α_in⟩, y^α)}_{α=1,...,M}, the parameters θ of the QCNN are determined by minimizing a loss function. In this study, the following log-loss function is utilized:

L = −(1/M) Σ_{α=1}^{M} Σ_{i=0}^{2^{N_tq}−1} y^α_i log f_i(|ψ^α_in⟩; θ).   (5)

The effect of attention on the performance of QCNNs was investigated in the task of classification of quantum phases. First, we utilized QCNNs to classify quantum phases in 1D many-body systems of spin-1/2 chains with open boundary conditions, defined by the following Hamiltonian:

H = −g_zxz Σ_{i=1}^{N−2} Z_i X_{i+1} Z_{i+2} − g_x Σ_{i=1}^{N} X_i − g_xx Σ_{i=1}^{N−1} X_i X_{i+1},   (6)

where Z_i (X_i) is the Pauli Z (X) operator for the spin at site i, and g_zxz, g_x, and g_xx are the parameters of the Hamiltonian. Depending on these parameters, the ground state of the system can manifest as a symmetry-protected topological (SPT), antiferromagnetic, or paramagnetic phase (Fig. 2(a)) [7]. The task is to determine whether a given ground state belongs to the SPT phase or not, and is thus a binary classification problem. The training dataset was composed of 40 equally
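The channel-attention post-processing of Eqs. (2)-(5) can be sketched end to end for a classically simulated state vector: condition on each attention-qubit outcome k, normalize the collapsed state as in Eq. (2), read out the per-channel target probabilities of Eq. (3), and combine them with the weighted softmax of Eq. (4). This is a minimal NumPy sketch under that simulation assumption; all names are illustrative.

```python
import numpy as np

def channel_attention_output(psi, n_qubits, attention_qubits, target_qubits, w):
    """Eqs. (2)-(4): per-channel target probabilities f^ch_{i,k}, combined
    by a weighted sum over channels k and a softmax over outcomes i."""
    amps = psi.reshape([2] * n_qubits)
    rest = [q for q in range(n_qubits) if q not in attention_qubits]
    # Axes: attention qubits first, then the unmeasured qubits uq.
    amps = np.transpose(amps, attention_qubits + rest).reshape(
        2 ** len(attention_qubits), -1)
    tq_in_rest = [rest.index(q) for q in target_qubits]
    n_rest, n_classes = len(rest), 2 ** len(target_qubits)
    f_ch = []
    for k in range(amps.shape[0]):
        phi_k = amps[k]                    # unnormalized |phi_k>_uq
        norm = np.linalg.norm(phi_k)       # N_k in Eq. (2)
        if norm == 0.0:                    # outcome k never occurs
            f_ch.append(np.zeros(n_classes))
            continue
        phi_k = (phi_k / norm).reshape([2] * n_rest)   # Eq. (2): |phi~_k>_uq
        other = [a for a in range(n_rest) if a not in tq_in_rest]
        p = np.transpose(phi_k, tq_in_rest + other).reshape(n_classes, -1)
        f_ch.append(np.sum(np.abs(p) ** 2, axis=1))    # Eq. (3)
    z = np.asarray(w) @ np.asarray(f_ch)   # sum_k w_k f^ch_{i,k}, Eq. (4)
    z = np.exp(z - z.max())
    return z / z.sum()                     # softmax over outcomes i

def log_loss(preds, labels):
    """Eq. (5): L = -(1/M) sum_alpha sum_i y_i^alpha log f_i^alpha."""
    preds = np.clip(preds, 1e-12, 1.0)
    return -np.mean(np.sum(labels * np.log(preds), axis=1))

# Toy check: GHZ state (|000> + |111>)/sqrt(2), attention qubit 0, targets 1 and 2.
psi = np.zeros(8, dtype=complex)
psi[0b000] = psi[0b111] = 1 / np.sqrt(2)
pred = channel_attention_output(psi, 3, [0], [1, 2], w=[2.0, 1.0])
```

In training, the channel weights w_k would be optimized alongside θ by minimizing the log-loss above; this sketch only evaluates the forward pass.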
FIG. 2. (a) The phase diagram of the Hamiltonian described in Eq. (6) as reported in the literature [7], estimated using a classical method. The classification of the quantum phases using QCNN without (b) and with channel attention (c). Here, g_x, g_xx, and g_zxz are the parameters of the Hamiltonian described in Eq. (6). Color gradation shows the probability of being in the SPT phase. (d) The probability of being in the SPT phase at g_x/g_zxz = 1.04 of QCNN without (black) and with channel attention (color) at 300 epochs. (e) The training curves of QCNN without (black) and with channel attention (color). The shaded region represents the standard deviation over 10 random initial parameters.

FIG. 3. (a) The phase diagram of the Hamiltonian described in Eq. (7) as reported in the literature [39–41]. The classification of the quantum phases using QCNN without (b) and with channel attention (c). Here, g_zxz, g_zz, and g_x are the parameters of the Hamiltonian described in Eq. (7). (d) The probability of being in the trivial phase at g_zz = 1.0 of QCNN without (black) and with channel attention (blue) at 300 epochs. (e) The training curves of QCNN without (black) and with channel attention (blue). The shaded region represents the standard deviation over 10 random initial parameters.
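The cluster-type Hamiltonian of Eq. (6) can be assembled explicitly for a small open chain by placing Pauli operators with Kronecker products and diagonalizing the resulting dense matrix. This is an illustrative sketch, not the authors' implementation; the parameter values are arbitrary, and dense diagonalization is only feasible for small N.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def op_on(site_ops, n):
    """Kronecker product placing the given single-site operators on an n-site chain."""
    return reduce(np.kron, [site_ops.get(i, I2) for i in range(n)])

def hamiltonian(n, gzxz, gx, gxx):
    """Eq. (6) with open boundary conditions: ZXZ, X, and XX terms."""
    H = np.zeros((2 ** n, 2 ** n))
    for i in range(n - 2):                  # sum_{i=1}^{N-2} Z_i X_{i+1} Z_{i+2}
        H -= gzxz * op_on({i: Z, i + 1: X, i + 2: Z}, n)
    for i in range(n):                      # sum_{i=1}^{N} X_i
        H -= gx * op_on({i: X}, n)
    for i in range(n - 1):                  # sum_{i=1}^{N-1} X_i X_{i+1}
        H -= gxx * op_on({i: X, i + 1: X}, n)
    return H

# Ground state of a small chain, e.g. with the transverse field g_x dominant.
H = hamiltonian(6, gzxz=0.1, gx=1.0, gxx=0.1)
energies, states = np.linalg.eigh(H)
ground_state = states[:, 0]
```

Such ground states, computed over a grid of (g_zxz, g_x, g_xx), would serve as the input states |ψ_in⟩ for the phase-classification task.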
In conclusion, we introduced a channel attention mechanism for quantum convolutional neural networks (QCNNs). In our approach, channels of output state are created by additional measurements of qubit(s) that are discarded in the conventional QCNN models. The importance of each channel is then computed. Our results showed that these straightforward steps led to a significant increase in the performance of the QCNNs without any major alteration to the already existing models. Integrating attention into QCNN models reduced the classification error of quantum phase recognition problems by at least 3-fold for the binary classification problem and 6-fold for the ternary classification problem. Comparison between the QCNN with channel attention and the QCNN-

ACKNOWLEDGMENTS

This work was supported by JSPS KAKENHI under Grant-in-Aid for Scientific Research No. 21H04553, No. 20H00340, and No. 22H01517, JSPS KAKENHI under Grant-in-Aid for Transformative Research Areas No. JP22H05114, JSPS KAKENHI under Grant-in-Aid for Early-Career Scientists No. JP21K13855, and by JST Grant Number JPMJPF2221. This study was partially carried out using the TSUBAME3.0 supercomputer at the Tokyo Institute of Technology and the facilities of the Supercomputer Center, the Institute for Solid State Physics, the University of Tokyo. The author acknowledges the contributions and discussions provided by the members of Quemix Inc.
[1] E. Farhi and H. Neven, "Classification with quantum neural networks on near term processors," (2018), arXiv:1802.06002 [quant-ph].
[2] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Phys. Rev. A 98, 032309 (2018).
[3] M. Schuld, A. Bocharov, K. M. Svore, and N. Wiebe, Phys. Rev. A 101, 032308 (2020).
[4] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, Nature Reviews Physics 3, 625 (2021).
[5] Y. Du, Y. Yang, D. Tao, and M.-H. Hsieh, Phys. Rev. Lett. 131, 140601 (2023).
[6] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd, Phys. Rev. Res. 1, 033063 (2019).
[7] I. Cong, S. Choi, and M. D. Lukin, Nature Physics 15, 1273 (2019).
[8] A. V. Uvarov, A. S. Kardashin, and J. D. Biamonte, Phys. Rev. A 102, 012415 (2020).
[9] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9, 4812 (2018).
[10] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, Nature Communications 12, 1791 (2021).
[11] C. Ortiz Marrero, M. Kieferová, and N. Wiebe, PRX Quantum 2, 040316 (2021).
[12] E. R. Anschuetz and B. T. Kiani, Nature Communications 13, 7760 (2022).
[13] A. Pesah, M. Cerezo, S. Wang, T. Volkoff, A. T. Sornborger, and P. J. Coles, Physical Review X 11, 041011 (2021).
[14] C. Zhao and X.-S. Gao, Quantum 5, 466 (2021).
[15] E. Cervero Martín, K. Plekhanov, and M. Lubasch, Quantum 7, 974 (2023).
[16] J. Herrmann, S. M. Llima, A. Remm, P. Zapletal, N. A. McMahon, C. Scarato, F. Swiadek, C. K. Andersen, C. Hellings, S. Krinner, et al., Nature Communications 13, 4144 (2022).
[17] Y.-J. Liu, A. Smith, M. Knap, and F. Pollmann, Physical Review Letters 130, 220603 (2023).
[18] C. Umeano, A. E. Paine, V. E. Elfving, and O. Kyriienko, "What can we learn from quantum convolutional neural networks?" (2023), arXiv:2308.16664 [quant-ph].
[19] S. Monaco, O. Kiss, A. Mandarino, S. Vallecorsa, and M. Grossi, Phys. Rev. B 107, L081105 (2023).
[20] A. Perdomo-Ortiz, M. Benedetti, J. Realpe-Gómez, and R. Biswas, Quantum Science and Technology 3, 030502 (2018).
[21] N. Moll, P. Barkoutsos, L. S. Bishop, J. M. Chow, A. Cross, D. J. Egger, S. Filipp, A. Fuhrer, J. M. Gambetta, M. Ganzhorn, et al., Quantum Science and Technology 3, 030503 (2018).
[22] A. D. Córcoles, A. Kandala, A. Javadi-Abhari, D. T. McClure, A. W. Cross, K. Temme, P. D. Nation, M. Steffen, and J. M. Gambetta, Proceedings of the IEEE 108, 1338 (2020).
[23] A. Wack, H. Paik, A. Javadi-Abhari, P. Jurcevic, I. Faro, J. M. Gambetta, and B. R. Johnson, "Quality, speed, and scale: three key attributes to measure the performance of near-term quantum computers," (2021), arXiv:2110.14108 [quant-ph].
[24] T. Lubinski, S. Johri, P. Varosy, J. Coleman, L. Zhao, J. Necaise, C. H. Baldwin, K. Mayer, and T. Proctor, IEEE Transactions on Quantum Engineering 4, 1 (2023).
[25] V. Russo, A. Mari, N. Shammah, R. LaRose, and W. J. Zeng, IEEE Transactions on Quantum Engineering, 1 (2023).
[26] M. Broughton, G. Verdon, T. McCourt, A. J. Martinez, J. H. Yoo, S. V. Isakov, P. Massey, R. Halavati, M. Y. Niu, A. Zlokapa, et al., "Tensorflow quantum: A software framework for quantum machine learning," (2021), arXiv:2003.02989 [quant-ph].
[27] K. Sengupta and P. R. Srivastava, BMC Medical Informatics and Decision Making 21, 227 (2021).
[28] A. Sebastianelli, D. A. Zaidenberg, D. Spiller, B. Le Saux, and S. Ullo, IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 15, 565 (2022).
[29] W. Li, P.-C. Chu, G.-Z. Liu, Y.-B. Tian, T.-H. Qiu, and S.-M. Wang, Quantum Engineering 2022, 1 (2022).
[30] S.-Y. Huang, W.-J. An, D.-S. Zhang, and N.-R. Zhou,