QSVMM
We propose a quantum algorithm for support matrix machines (SMMs) that efficiently addresses an image
classification problem by introducing a least-squares reformulation. This algorithm consists of two core
subroutines: a quantum matrix inversion (Harrow-Hassidim-Lloyd, HHL) algorithm and a quantum singular
value thresholding (QSVT) algorithm. The two algorithms can be implemented on a universal quantum computer
with complexity O[log (npq)] and O[log (pq)], respectively, where n is the number of training samples and pq
is the size of the feature space. By iterating the two algorithms, we can find the parameters of the SMM classification
model. Our analysis shows that both the HHL and QSVT algorithms achieve an exponential speedup over
their classical counterparts.
DOI: 10.1103/PhysRevA.96.032301
QUANTUM ALGORITHM FOR SUPPORT MATRIX MACHINES PHYSICAL REVIEW A 96, 032301 (2017)
FIG. 2. Loss functions in a SMM. The blue dashed line represents the hinge loss $\ell_{\mathrm{hinge}}$ in the original objective function and the red solid line the square loss $\ell_{\mathrm{square}}$ used in our method. Here we introduce a new parameter $\mu$ to make $\ell_{\mathrm{hinge}}$ upper bounded by $\ell_{\mathrm{square}}$ strictly.

Theorem 1. There exists a square loss function
\[ \ell_{\mathrm{square}}(e_i) := (1 - e_i - \mu)^2, \]
which is a convex surrogate for the hinge loss:
\[ \ell_{\mathrm{hinge}}(e_i) := \max\{0, e_i\}. \]

Using the surrogate loss transforms the objective function (3) to the following problem:
\[ \arg\min_{W,b,e_i} \; \frac{1}{2}\,\mathrm{tr}(W^T W) + \frac{\eta}{2}\sum_{i=1}^{n}(1 - e_i - \mu)^2 - \mathrm{tr}\big(\Lambda^{(k)T} W\big) + \frac{\rho}{2}\,\big\|W - S^{(k)}\big\|_F^2, \]
\[ \text{such that } y_i[\mathrm{tr}(W^T X_i) + b] = 1 - e_i, \tag{6} \]
where $\eta$ is the regularization parameter, which plays the same role as the parameter C in Eq. (1).

Theorem 2. One of the solutions of the problem (6) is
\[ W^{(k)} = \frac{1}{\rho+1}\Big(\Lambda^{(k)} + \rho S^{(k)} + \sum_{i=1}^{n}\alpha_i^{(k)} y_i X_i\Big), \tag{7} \]
and the offset $b^{(k)}$ and $\alpha^{(k)} = [\alpha_i^{(k)}] \in \mathbb{R}^n$ are the solutions of the following linear equation:
\[ F \begin{pmatrix} b^{(k)} \\ \tilde{\alpha}^{(k)} \end{pmatrix} \equiv \begin{pmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + I/\gamma \end{pmatrix} \begin{pmatrix} b^{(k)} \\ \tilde{\alpha}^{(k)} \end{pmatrix} = \begin{pmatrix} 0 \\ \mathbf{q}^{(k)} \end{pmatrix}, \tag{8} \]
where $K = [K_{ij}] \in \mathbb{R}^{n\times n}$ and $\mathbf{q}^{(k)} = [q_i^{(k)}] \in \mathbb{R}^n$ are independent of $\tilde{\alpha}^{(k)} = [\alpha_i^{(k)} y_i] \in \mathbb{R}^n$; specifically,
\[ K_{ij} = \frac{\mathrm{tr}(X_i^T X_j)}{\rho+1}, \qquad q_i^{(k)} = \mu y_i - \frac{\mathrm{tr}\big[(\Lambda^{(k)} + \rho S^{(k)})^T X_i\big]}{\rho+1}. \tag{9} \]

B. Algorithm

Our quantum algorithm for SMMs is based on the following model. Assume that the matrix $A = [a_{ij}] \in \mathbb{R}^{p\times q}$ is given by a quantum oracle $O_A$:
\[ O_A |i\rangle|j\rangle|0\rangle = |i\rangle|j\rangle|a_{ij}\rangle, \quad \forall i \in \{1,\ldots,p\},\ \forall j \in \{1,\ldots,q\}. \tag{10} \]
In addition, we assume that the vector $y = [y_i] \in \mathbb{R}^n$ is given by a quantum oracle $O_y$:
\[ O_y |i\rangle|0\rangle = |i\rangle|y_i\rangle, \quad \forall i \in \{1,\ldots,n\}. \tag{11} \]
That is, given the indices i and j, the oracles $O_A$ and $O_y$ can return the values of A and y, respectively. Then, via the method outlined in Ref. [22], the following quantum states can be generated:
\[ |\psi_A\rangle = \sum_{i=1}^{p}\sum_{j=1}^{q} a_{ij}\,|i\rangle|j\rangle, \tag{12} \]
\[ |\psi_y\rangle = \sum_{i=1}^{n} y_i\,|i\rangle. \tag{13} \]
Note that our algorithm works with the normalized data, namely matrices and vectors, as these data in many machine learning algorithms are normalized.

Based on this model, our algorithm consists of two subroutines: a quantum matrix inversion (HHL) algorithm for estimating $W^{(k)}$ and $b^{(k)}$ and a QSVT algorithm for estimating $S^{(k)}$. The overall procedure of our algorithm includes the following steps.

Algorithm. $(W, b) = \mathrm{QSMM}(\Lambda, S, X, y)$.
(1) Initialize k = 1, apply the quantum algorithm $(b^{(k)}, \tilde{\alpha}^{(k)}) = \mathrm{HHL}(\mathbf{q}^{(k)}, F)$, where $(\mathbf{q}^{(k)}, F)$ can be prepared in terms of $(X, y, \Lambda^{(k)}, S^{(k)})$ in Eq. (9).
(2) Extract $b^{(k)}$ and $\tilde{\alpha}^{(k)}$ to construct the matrix $W^{(k)}$ according to Eq. (7).
(3) Apply the quantum algorithm $S^{(k)} = \mathrm{QSVT}(A^{(k)}, \tau)$, where $A^{(k)}$ is a normalized r-rank approximate of $\rho W^{(k)} - \Lambda^{(k)}$.
(4) Update the parameters $\Lambda^{(k+1)}$ and $S^{(k+1)}$ and set k = k + 1.
(5) Repeat Steps 1–4 until the number of iterations k = max iter; then, one can obtain the regression matrix W and the offset b.

We now introduce the quantum algorithms $(b, \tilde{\alpha}) = \mathrm{HHL}(\mathbf{q}, F)$ and $S = \mathrm{QSVT}(A, \tau)$ in detail in the following subsections.

1. HHL algorithm

To obtain $W^{(k)}$ in Eq. (7), we should solve the normalized $(b^{(k)}, \tilde{\alpha}^{(k)T})^T = F^{-1}(0, \mathbf{q}^{(k)T})^T$, where $\|F\| \le 1$.
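For orientation, the linear system of Theorem 2 can be solved classically on a small instance. The sketch below is a NumPy illustration, not the quantum routine: it assembles K and the right-hand side from Eq. (9), solves Eq. (8) with a dense solver, and builds W from Eq. (7). The function name `solve_theorem2`, the random data, and the parameter values are hypothetical choices made for this example.

```python
import numpy as np

def solve_theorem2(X, y, Lam, S, rho, mu, gamma):
    """Classical solve of Eq. (8), then W from Eq. (7). X has shape (n, p, q)."""
    n = X.shape[0]
    # K_ij = tr(X_i^T X_j) / (rho + 1)
    K = np.einsum("iab,jab->ij", X, X) / (rho + 1.0)
    # Right-hand side from Eq. (9): mu*y_i - tr[(Lam + rho*S)^T X_i] / (rho + 1)
    rhs = mu * y - np.einsum("ab,iab->i", Lam + rho * S, X) / (rho + 1.0)
    # F = [[0, 1^T], [1, K + I/gamma]]
    F = np.zeros((n + 1, n + 1))
    F[0, 1:] = 1.0
    F[1:, 0] = 1.0
    F[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(F, np.concatenate(([0.0], rhs)))
    b, alpha_tilde = sol[0], sol[1:]  # alpha_tilde_i = alpha_i * y_i
    # Eq. (7): sum_i alpha_i y_i X_i equals sum_i alpha_tilde_i X_i
    W = (Lam + rho * S + np.einsum("i,iab->ab", alpha_tilde, X)) / (rho + 1.0)
    return W, b, alpha_tilde

# Made-up toy data: n = 6 samples of size p x q = 3 x 4 with labels +/-1
rng = np.random.default_rng(0)
n, p, q = 6, 3, 4
X = rng.normal(size=(n, p, q))
y = rng.choice([-1.0, 1.0], size=n)
Lam = rng.normal(size=(p, q))
S = rng.normal(size=(p, q))
rho, mu, gamma = 1.0, 1.25, 2.0
W, b, alpha_tilde = solve_theorem2(X, y, Lam, S, rho, mu, gamma)
```

The first row of Eq. (8) forces the entries of the scaled multipliers to sum to zero, and the remaining rows are equivalent to tr(W^T X_i) + b + α̃_i/γ = μ y_i, which gives a simple consistency check on the returned values.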
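As a quick numerical sanity check of Theorem 1 and of the threshold μ ≥ 5/4 quoted in Appendix A, the sketch below compares the square loss and the hinge loss on a grid. The grid range and the function names are illustrative choices, not part of the paper.

```python
import numpy as np

def hinge(e):
    """Hinge loss max{0, e}."""
    return np.maximum(0.0, e)

def square(e, mu):
    """Square loss (1 - e - mu)^2."""
    return (1.0 - e - mu) ** 2

e = np.linspace(-5.0, 5.0, 2001)
# Smallest gap between the two losses on the grid
gap_at_threshold = np.min(square(e, 1.25) - hinge(e))  # mu = 5/4
gap_below = np.min(square(e, 1.0) - hinge(e))          # mu = 1 < 5/4
```

For μ = 5/4 and e ≥ 0 the gap equals (e − 1/4)^2, so the bound is tight at e = 1/4; for μ = 1 the square loss drops below the hinge loss near e = 1/2.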
BOJIA DUAN, JIABIN YUAN, YING LIU, AND DAN LI PHYSICAL REVIEW A 96, 032301 (2017)
The HHL algorithm was proposed for solving matrix inversion problems [7]. But the matrix should be s-sparse in Ref. [7]. In this paper, we adopted a strategy in Ref. [5], therefore the HHL algorithm can be used when the matrix is nonsparse. The detailed description is shown in Appendix C. The core subroutine of the algorithm is phase estimation (denoted $U_{PE}$). Apply $U_{PE}$ to the registers C and B, which is represented as follows:
\[ U_{PE} = U_{PE}(A) = \big(F_T^{\dagger} \otimes I^B\big)\Big(\sum_{\tau=0}^{T-1} |\tau\rangle\langle\tau|^C \otimes e^{iA\tau t_0/T}\Big)\big(H^{\otimes t} \otimes I^B\big), \tag{14} \]
where register C with t qubits is used to store the estimated eigenvalues of a Hermitian matrix A, register B with s qubits stores the input state $|\psi\rangle$, $F_T^{\dagger}$ is the inverse quantum Fourier transform, and $\sum_{\tau=0}^{T-1} |\tau\rangle\langle\tau|^C \otimes e^{iA\tau t_0/T}$ is the conditional Hamiltonian evolution [7].

We now give the HHL algorithm for solving the reformulated SMM to estimate the parameters $W^{(k)}$ and $b^{(k)}$. In Appendix C, we show how to prepare the input $|\psi_{\mathbf{q}}\rangle$ and $e^{-iFt_0}$.

Input. A quantum state $|\psi_{\mathbf{q}}\rangle$ and a unitary $e^{-iFt_0}$.
Output. A quantum state proportional to $|b, \tilde{\alpha}\rangle \approx F^{-1}|\psi_{\mathbf{q}}\rangle$.
Algorithm. $(b, \tilde{\alpha}) = \mathrm{HHL}(\mathbf{q}, F)$.
(1) Prepare three quantum registers: register C for storing the estimated eigenvalues $|\lambda_i\rangle$ of F, register B for storing the input state $|\psi_{\mathbf{q}}\rangle$, and register R with an ancilla qubit for storing the inverse of the eigenvalues $\lambda_i^{-1}$ of F. Prepare the three registers in the state
\[ |\psi_0\rangle = |0\rangle^R (|0\rangle|0\rangle\cdots|0\rangle)^C (|\psi_{\mathbf{q}}\rangle)^B, \tag{15} \]
where $|\psi_{\mathbf{q}}\rangle = |0, \mathbf{q}\rangle = \sum_{i=1}^{n+1} \phi_i |i\rangle$.
(2) Perform the unitary operation $U_{PE}(F)$ on the state. $U_{PE}(F)$ expands $|\psi_{\mathbf{q}}\rangle$ into the eigenstates $|u_i\rangle$ of F with corresponding eigenvalues $\lambda_i$ and we have the state
\[ |\psi_1\rangle = |0\rangle^R \sum_{i=1}^{n+1} \langle u_i|\psi_{\mathbf{q}}\rangle\, |\lambda_i\rangle^C |u_i\rangle^B. \tag{16} \]
(3) Apply a controlled rotation $R_f^{RC}$,
\[ |0\rangle^R |z\rangle^C \to \Big(\frac{\gamma}{z}|1\rangle + \sqrt{1 - \frac{\gamma^2}{z^2}}\,|0\rangle\Big)^R |z\rangle^C, \tag{17} \]
to the register R, controlled by the register C, where $\gamma$ is a constant. This rotation transforms the state to
\[ |\psi_2\rangle = \sum_{i=1}^{n+1} \Big(\sqrt{1 - \frac{\gamma^2}{\lambda_i^2}}\,|0\rangle + \frac{\gamma}{\lambda_i}|1\rangle\Big)^R \langle u_i|\psi_{\mathbf{q}}\rangle\, |\lambda_i\rangle^C |u_i\rangle^B. \tag{18} \]
(4) Uncompute the registers C and B to obtain the state
\[ |\psi_3\rangle = \sum_{i=1}^{n+1} \Big(\sqrt{1 - \frac{\gamma^2}{\lambda_i^2}}\,|0\rangle + \frac{\gamma}{\lambda_i}|1\rangle\Big)^R (|0\cdots 0\rangle)^C \langle u_i|\psi_{\mathbf{q}}\rangle\, |u_i\rangle^B. \tag{19} \]
Remove the register C and measure the register R to be $|1\rangle$. We have the state proportional to
\[ |\psi_4\rangle = \sum_{i=1}^{n+1} \frac{\langle u_i|\psi_{\mathbf{q}}\rangle}{\lambda_i}\, |u_i\rangle. \tag{20} \]
(5) Finally, represent the state in the basis of training set labels to acquire the desired parameters that exist in the coefficients of the state [8]:
\[ |b, \tilde{\alpha}\rangle = \frac{1}{\sqrt{l}}\Big(b|0\rangle + \sum_{i=1}^{n} \tilde{\alpha}_i |i\rangle\Big), \tag{21} \]
with $l = b^2 + \sum_{i=1}^{n} \tilde{\alpha}_i^2$.
Here, amplitude estimation can then be used to extract $b^{(k)}$ and $\tilde{\alpha}^{(k)}$, and now we can obtain the matrix $W^{(k)}$ according to Eq. (7).

In summary, the HHL algorithm can be represented as the following unitary operation:
\[ U_{HHL} = \big(I^R \otimes U_{PE}^{\dagger}\big)\big(R_f^{RC} \otimes I^B\big)\big(I^R \otimes U_{PE}\big), \tag{22} \]
where $R_f^{RC}$ is given in Eq. (17).

Analysis. We now discuss the error of the HHL algorithm. Define the condition number of F as $\kappa$, and $t_0 = O(\kappa/\epsilon)$. Therefore, only the eigenvalues in the interval $1/\kappa \le |\lambda_j| \le 1$ are taken into account. The estimation of eigenvalue $\tilde{\lambda}_j := 2\pi j/t_0$ in step (2) introduces error $O(1/t_0) = O(\epsilon/\kappa)$, which translates into error $O(\epsilon/(\lambda\kappa))$ in $\lambda^{-1}$ in step (3) [7]. With $\lambda \ge 1/\kappa$, taking $t_0 = O(\kappa/\epsilon)$ induces a final error of $O(\epsilon)$.

We now analyze the time complexity. In phase estimation, $O(t^2)$ operations and one call to the conditional Hamiltonian evolution $\big(\sum_{\tau=0}^{T-1} |\tau\rangle\langle\tau|^C \otimes e^{iF\tau t_0/T}\big)$ are needed, where t is the number of qubits in register C [4]. The preparation of the unitary $e^{-iFt_0}$ costs time $O[t_0^2 \epsilon^{-1} \log(npq)]$, which is shown in Appendix C. As t decides the accuracy of the estimated eigenvalues, t is not very large to some extent. When the size of the training data and feature space is large, the preparation of the unitary $e^{-iFt_0}$ dominates the time complexity of phase estimation.

The probability of obtaining $\lambda^{-1}$ determines the number of iterations of the algorithm. This probability is determined by the amplitude square $|\gamma/\lambda_i|^2$, and this value is at least $\Omega(1/\kappa^2)$ with $\gamma = O(1/\kappa)$ and $\lambda \le 1$. Hence, $O(\kappa^2)$ repetitions are needed to ensure success with high probability. By virtue of the amplitude amplification algorithm [15], only $O(\kappa)$ repetitions are sufficient to achieve a constant success probability of the postselection process.

In conclusion, the total evolution time of the algorithm is $O[t_0^2 \epsilon^{-1} \log(npq)]\,O(\kappa) = O[\kappa^3 \epsilon^{-3} \log(npq)]$. In contrast, the matrix inversion via classical method needs time $O[\mathrm{poly}(npq)]$.

2. QSVT algorithm

Herein we propose a QSVT algorithm for estimating $S^{(k)}$. Suppose $A^{(k)}$ is a normalized matrix that is a good r-rank approximate of $\rho W^{(k)} - \Lambda^{(k)}$, with $\|A^{(k)}\| \le 1$. In the following, we simplify the notation $A^{(k)}$ as A.
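The post-selected state of Eq. (20) is, up to normalization, the inverse of F applied to the input state. The following NumPy sketch emulates steps (2) to (4) of the HHL procedure classically through an explicit eigendecomposition. It illustrates the linear algebra only and is not a quantum implementation; the helper name `hhl_emulate`, the random Hermitian test matrix, and the choice of the rotation constant as the smallest eigenvalue magnitude are assumptions made here.

```python
import numpy as np

def hhl_emulate(F, psi):
    """Classically mimic phase estimation, controlled rotation, and post-selection."""
    lam, U = np.linalg.eigh(F)        # eigenvalues lambda_i and eigenvectors |u_i>
    beta = U.conj().T @ psi           # amplitudes <u_i|psi>, as in Eq. (16)
    gamma = np.min(np.abs(lam))       # assumed constant; keeps |gamma/lambda_i| <= 1
    amp_one = gamma * beta / lam      # amplitudes attached to |1> in Eq. (18)
    p_success = float(np.sum(np.abs(amp_one) ** 2))
    out = U @ (beta / lam)            # unnormalized state of Eq. (20)
    return out / np.linalg.norm(out), p_success

rng = np.random.default_rng(1)
M = rng.normal(size=(5, 5))
F = (M + M.T) / 2                     # real symmetric, hence Hermitian
F /= np.linalg.norm(F, 2)             # spectral norm 1, so |lambda_i| <= 1
psi = rng.normal(size=5)
psi /= np.linalg.norm(psi)
state, p_success = hhl_emulate(F, psi)
```

With this choice of the constant, the success probability of the post-selection is never smaller than 1/κ², which is the quantity behind the O(κ²) repetition count in the analysis.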
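Then, using the singular value decomposition of A, the state $|\psi_A\rangle$ can be written as
\[ |\psi_A\rangle = \sum_{i=1}^{p}\sum_{j=1}^{q} a_{ij}\,|i\rangle|j\rangle = \sum_{k=1}^{r} \sigma_k\,|u_k\rangle|v_k\rangle, \tag{23} \]
where r is the rank of A, and $\sigma_k$ are just the singular values of A, with $u_k$ and $v_k$ being the left and right singular vectors. Note that $u_k$ are just the eigenvectors of $AA^{\dagger}$ and $v_k$ are the eigenvectors of $A^{\dagger}A$. The eigenvalues of $AA^{\dagger}$ or $A^{\dagger}A$ are both $\lambda_k = \sigma_k^2$. Based on quantum Random Access Memories (QRAM), the preparation of $|\psi_A\rangle$ costs time $O[\log(pq)]$.

By taking a partial trace of $|\psi_A\rangle\langle\psi_A|$, we can obtain a mixed state $\rho_{AA^{\dagger}}$ [10]:
\[ \rho_{AA^{\dagger}} = \mathrm{tr}_j(|\psi_A\rangle\langle\psi_A|) = \sum_{i,i'=1}^{p}\sum_{j=1}^{q} a_{ij}\,a^{*}_{i'j}\,|i\rangle\langle i'|, \tag{24} \]
where $\rho_{AA^{\dagger}}$ represents the positive Hermitian matrix $AA^{\dagger}$.

The QSVT algorithm is now presented as follows:

Input. A quantum state $|\psi_A\rangle$, a unitary $e^{-i\rho_{AA^{\dagger}} t_0}$, and a constant $\tau$.
Output. A quantum state proportional to $|\psi_S\rangle = \sum_{k=1}^{r} (\sigma_k - \tau)\,|u_k\rangle|v_k\rangle$.
Algorithm. $S = \mathrm{QSVT}(A, \tau)$.
(1) Prepare three quantum registers in the state
\[ |\psi_0\rangle = |0\rangle^R (|0\rangle|0\rangle\cdots|0\rangle)^C (|\psi_A\rangle)^B. \tag{25} \]
(2) Perform the unitary operation $U_{PE}(\rho_{AA^{\dagger}})$ on the state. According to the ideas of quantum principal component analysis in Refs. [5,10], we have the state
\[ |\psi_1\rangle = |0\rangle^R \sum_{k=1}^{r} \sigma_k\, |\sigma_k^2\rangle^C |u_k\rangle|v_k\rangle. \tag{26} \]
(3) Apply a controlled rotation $R_f^{RC}$ to the register R, controlled by the register C. $R_f^{RC}$ is defined as follows: if $\sigma_j > \tau$,
\[ |0\rangle^R |z\rangle^C \to \Big(\frac{\gamma(\sqrt{z} - \tau)}{\sqrt{z}}\,|1\rangle + \sqrt{1 - \frac{\gamma^2(\sqrt{z} - \tau)^2}{z}}\,|0\rangle\Big)^R |z\rangle^C; \tag{27} \]
otherwise do nothing. This rotation transforms the state to
\[ |\psi_2\rangle = \sum_{k=1}^{r'} \Big(\frac{\gamma(\sigma_k - \tau)}{\sigma_k}\,|1\rangle + \sqrt{1 - \frac{\gamma^2(\sigma_k - \tau)^2}{\sigma_k^2}}\,|0\rangle\Big)^R \otimes \sigma_k\, |\sigma_k^2\rangle^C |u_k\rangle|v_k\rangle + |0\rangle^R \sum_{k=r'+1}^{r} \sigma_k\, |\sigma_k^2\rangle^C |u_k\rangle|v_k\rangle, \tag{28} \]
where $r'$ denotes the number of singular values with $\sigma_k > \tau$.
(4) Uncompute the registers C and B, remove the register C, and measure the register R to be $|1\rangle$. Then, we have the state
\[ |\psi_3\rangle = \sum_{k=1}^{r'} (\sigma_k - \tau)\,|u_k\rangle|v_k\rangle = |\psi_S\rangle. \tag{29} \]

In summary, the QSVT algorithm can be represented as the following unitary operation:
\[ U_{QSVT} = \big(I^R \otimes U_{PE}^{\dagger}\big)\big(R_f^{RC} \otimes I^B\big)\big(I^R \otimes U_{PE}\big), \tag{30} \]
where $R_f^{RC}$ is proposed in Eq. (27).

Analysis. We now discuss the error of the QSVT algorithm. Define the condition number of $\rho_{AA^{\dagger}}$ as $\kappa$, and $t_0 = O(\kappa/\epsilon)$. Therefore, only the eigenvalues $\sigma_k^2$ in the interval $1/\kappa \le \sigma_k^2 \le 1$ are taken into account. The estimation of eigenvalues $\tilde{\sigma}_k^2 := 2\pi k/t_0$ in step (2) introduces error $O(1/t_0) = O(\epsilon/\kappa)$. This error translates into error $O[\epsilon/(\sigma_k \kappa)]$ in $(1 - \tau/\sigma_k)$ after the controlled rotation $R_f^{RC}$ in step (3). With $\sigma_k^2 \ge 1/\kappa$, taking $t_0 = O(\kappa/\epsilon)$ induces a final error of $O(\epsilon/\sqrt{\kappa})$.

We now discuss the time complexity of the QSVT algorithm. Clearly, the preparation of the unitary operation $e^{-i\rho_{AA^{\dagger}} t_0}$ dominates the time complexity of phase estimation, and it costs time $O[t_0^2 \epsilon^{-1} \log(pq)]$. The probability of obtaining $(\sigma_k - \tau)$ determines the number of iterations of the algorithm, and it is determined by the amplitude square $|\gamma(\sigma_k - \tau)/\sigma_k|^2$. This value is at least $\Omega(1/\kappa^2)$ with $\gamma = O(1/\kappa)$ and $\sigma_k \le 1$. Hence, $O(\kappa^2)$ repetitions are needed to ensure success with high probability. Using amplitude amplification [15], only $O(\kappa)$ repetitions are needed to achieve a constant success probability of the postselection process. Therefore, the total evolution time of the algorithm is $O[t_0^2 \epsilon^{-1} \log(pq)]\,O(\kappa) = O[\kappa^3 \epsilon^{-3} \log(pq)]$. In contrast, the matrix inversion via the classical method needs time $O[\mathrm{poly}(pq)]$.

C. Classification

With the parameters W and b, we can classify a query state X via the method presented in Ref. [8]. First, we construct the training-data state via quantum oracles:
\[ |\psi_{\tilde{t}}\rangle = \frac{1}{\sqrt{N_t}}\Big(b|0\rangle|0\rangle + |0\rangle|\lambda\rangle + \sum_{\tilde{\alpha}_i y_i > 0} \tilde{\alpha}_i\,|i\rangle|x_i\rangle\Big), \tag{31} \]
and the query state
\[ |\psi_{\tilde{x}}\rangle = \frac{1}{\sqrt{N_x}}\Big(|0\rangle|0\rangle + \sum_{i=0}^{n} |i\rangle|x\rangle\Big), \tag{32} \]
where $N_t$ and $N_x$ are the norms of these states. The inner product of the two states
\[ \langle\psi_{\tilde{t}}|\psi_{\tilde{x}}\rangle = \frac{1}{\sqrt{N_t N_x}}\Big(b + \langle\lambda|x\rangle + \sum_{\tilde{\alpha}_i y_i > 0} \tilde{\alpha}_i \langle x_i|x\rangle\Big) \tag{33} \]
reveals the label of the classification problem
\[ y_j = \mathrm{sgn}\Big[\Big(\lambda + \sum_{\alpha_i y_i > 0} \alpha_i x_i\Big)^T x + b\Big]. \tag{34} \]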
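In classical terms, the SMM decision function is the sign of tr(W^T X) + b, the quantity that appears in the constraint of problem (6). A minimal NumPy sketch follows, with a hypothetical helper `smm_predict` and made-up parameter values; it also evaluates the same score as an inner product of the flattened matrices, which is the vectorized form used by the quantum states.

```python
import numpy as np

def smm_predict(W, b, X):
    """Return the predicted label sgn[tr(W^T X) + b] as +1 or -1."""
    score = np.trace(W.T @ X) + b
    return 1 if score >= 0 else -1

# Made-up parameters and query matrix
W = np.array([[0.5, -1.0], [2.0, 0.0]])
b = -0.25
X_query = np.array([[1.0, 0.0], [0.5, 3.0]])
label = smm_predict(W, b, X_query)
# tr(W^T X) equals the inner product of the flattened matrices
score_vec = W.ravel() @ X_query.ravel() + b
```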
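The target state of the QSVT subroutine, Eq. (29), corresponds classically to singular value thresholding [26]: singular values above τ are shrunk by τ and the remaining ones are discarded. The sketch below is the standard classical operator written with NumPy and is offered for comparison only; `svt` is a hypothetical name and the test matrix is arbitrary.

```python
import numpy as np

def svt(A, tau):
    """Singular value thresholding: sum_k max(sigma_k - tau, 0) u_k v_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    # Scale the columns of U by the shrunk singular values, then recombine
    return (U * s_shrunk) @ Vt, s_shrunk

A = np.diag([3.0, 1.0, 0.2])  # singular values 3, 1, 0.2
S, s_shrunk = svt(A, tau=0.5)
```

Here only two singular values exceed the threshold, so the result has rank 2.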
more fruitful results in quantum machine learning.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (Grants No. 61571226 and No. 61701229), the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20140823), the Jiangsu Innovation Program for Graduate Education (Grant No. KYLX15_0326), and the Fundamental Research Funds for the Central Universities.

APPENDIX A: PROOF OF THEOREM 1

Proof. Obviously, for all the $e_i$, $\ell_{\mathrm{hinge}}(e_i) \le \ell_{\mathrm{square}}(e_i)$ when $\mu \ge 5/4$. Then, the square loss is the upper bound of the hinge loss. Along with the convexity of the square function, $\ell_{\mathrm{square}}(e_i)$ meets the surrogate property [27].

APPENDIX B: PROOF OF THEOREM 2

Proof. To solve Eq. (6), we construct the Lagrange function

\[ \cdots = \mathrm{vec}\big(\Lambda^{(k)T} + \rho S^{(k)T}\big)^T \mathrm{vec}\big(X_i^T\big) := \langle\lambda^{(k)}|x_i\rangle + \rho\,\langle s^{(k)}|x_i\rangle, \tag{C1} \]
where $\{|x_i\rangle\}_{i=1}^{n}$, $|s^{(k)}\rangle$, and $|\lambda^{(k)}\rangle$ are the quantum states in the basis of the training set space. Here, we have
\[ |\psi_{\mathbf{q}}\rangle = \frac{1}{\sqrt{N_{\psi}}}\Big[|0\rangle + \sum_{i=1}^{n}\Big(\mu y_i - \frac{\langle\lambda^{(k)}|x_i\rangle + \rho\,\langle s^{(k)}|x_i\rangle}{\rho+1}\Big)|i\rangle\Big], \tag{C2} \]
with $N_{\psi}$ being the norm of $|\psi_{\mathbf{q}}\rangle$. Based on QRAM [11], a swap test can be used to evaluate these inner products. Therefore, the preparation of $|\psi_{\mathbf{q}}\rangle$ costs time $O[\log(npq)]$ [12].

Preparation of the input $e^{-iFt_0}$ of the HHL algorithm. The unitary $e^{-iFt_0}$ can be simplified to the direct preparation and exponentiation of the matrix K [8]. To prepare the matrix K with elements
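of the matrix K with error $O(t^2)$:
\[ e^{-iFt} = e^{-it\gamma^{-1} I}\, e^{-itJ}\, e^{-itK} + O(t^2), \]
with $J = \begin{pmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & 0 \end{pmatrix}$. As K in Eq. (C3) is Hermitian, the exponentiation of the matrix K can be prepared by applying the algorithm in Ref. [5] in terms of a density matrix description with error $O(t^2)$:
\[ e^{-iKt}(\sigma) \approx \mathrm{tr}_1\{e^{-i\,\mathrm{Swap}\,t}\, K \otimes \sigma\, e^{i\,\mathrm{Swap}\,t}\} = \sigma - it[K, \sigma] + O(t^2), \]
where $\mathrm{Swap} = \sum_{p,q=1}^{n} |p\rangle\langle q| \otimes |q\rangle\langle p|$. Thus, using $T = O(t_0^2 \epsilon^{-1})$ copies of K to implement $e^{-iKt_0}$ to accuracy $\epsilon$ is in time $O[t_0^2 \epsilon^{-1} \log(npq)]$.

Preparation of the input $e^{-i\rho_{AA^{\dagger}} t_0}$ of the QSVT algorithm. The unitary $e^{-i\rho_{AA^{\dagger}} t_0}$ can also be simulated via $O(t_0^2 \epsilon^{-1})$ copies of $e^{-i\rho_{AA^{\dagger}} t}$ for the time slice t:
\[ e^{-i\rho_{AA^{\dagger}} t}(\sigma) \approx \mathrm{tr}_1\{e^{-i\,\mathrm{Swap}\,t}\, \rho_{AA^{\dagger}} \otimes \sigma\, e^{i\,\mathrm{Swap}\,t}\}. \]
As the preparation of $|\psi_A\rangle$ is $O[\log(pq)]$, the time complexity of implementing $e^{-i\rho_{AA^{\dagger}} t_0}$ to accuracy $\epsilon$ is $O[t_0^2 \epsilon^{-1} \log(pq)]$.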
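The swap-based simulation step used in both preparations can be checked numerically on small matrices. The sketch below, a NumPy illustration under stated assumptions, applies the swap evolution to a product of two random density matrices, traces out the first subsystem, and compares the result with the first-order expression σ − it[ρ, σ]. The function names, the dimension, and the time steps are hypothetical choices; the matrix exponential is written in closed form using the fact that the swap operator squares to the identity.

```python
import numpy as np

def random_density(d, rng):
    """Random d x d density matrix (Hermitian, positive, unit trace)."""
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = M @ M.conj().T
    return rho / np.trace(rho).real

def swap_step(rho, sigma, dt):
    """tr_1{ exp(-i S dt) (rho x sigma) exp(+i S dt) } with S the swap operator."""
    d = rho.shape[0]
    S = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            S[i * d + j, j * d + i] = 1.0
    # S @ S = I, hence exp(-i S dt) = cos(dt) I - i sin(dt) S
    U = np.cos(dt) * np.eye(d * d) - 1j * np.sin(dt) * S
    joint = U @ np.kron(rho, sigma) @ U.conj().T
    # Partial trace over the first subsystem
    return np.einsum("ajak->jk", joint.reshape(d, d, d, d))

def first_order(rho, sigma, dt):
    """sigma - i dt [rho, sigma]."""
    return sigma - 1j * dt * (rho @ sigma - sigma @ rho)

rng = np.random.default_rng(2)
rho, sigma = random_density(3, rng), random_density(3, rng)
err = {dt: np.linalg.norm(swap_step(rho, sigma, dt) - first_order(rho, sigma, dt))
       for dt in (0.02, 0.01)}
ratio = err[0.02] / err[0.01]  # close to 4 when the error scales as dt^2
```

Halving the time step reduces the deviation by roughly a factor of four, consistent with the quoted O(t²) error per slice.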
[1] J. Adcock, E. Allen, M. Day, S. Frick, J. Hinchliff, M. Johnson, S. Morleyshort, S. Pallister, A. Price, and S. Stanisic, arXiv:1512.02900v1.
[2] M. Schuld, I. Sinayskiy, and F. Petruccione, Contemp. Phys. 56, 172 (2014).
[3] P. Wittek, Quantum Machine Learning What Quantum Computing Means to Data Mining (Elsevier, Singapore, 2014).
[4] M. A. Nielsen and I. L. Chuang, in Quantum Computation and Quantum Information: 10th Anniversary Edition (Cambridge University Press, Cambridge, England, 2010), pp. 1–59.
[5] S. Lloyd, M. Mohseni, and P. Rebentrost, Nat. Phys. 10, 108 (2014).
[6] P. Rebentrost, A. Steffens, and S. Lloyd, arXiv:1607.05404v1.
[7] A. W. Harrow, A. Hassidim, and S. Lloyd, Phys. Rev. Lett. 103, 150502 (2009).
[8] P. Rebentrost, M. Mohseni, and S. Lloyd, Phys. Rev. Lett. 113, 130503 (2014).
[9] Y. Liu and S. Zhang, Theor. Comput. Sci. 657, 38 (2016).
[10] M. Schuld, I. Sinayskiy, and F. Petruccione, Phys. Rev. A 94, 022342 (2016).
[11] V. Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. Lett. 100, 160501 (2008).
[12] H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf, Phys. Rev. Lett. 87, 167902 (2001).
[13] L. K. Grover, Phys. Rev. Lett. 79, 325 (1997).
[14] C. Durr and P. Hoyer, arXiv:quant-ph/9607014v2 (1996).
[15] G. Brassard, P. Hoyer, M. Mosca, and A. Tapp, Contemp. Math. 305, 53 (2002).
[16] E. Aimeur, G. Brassard, and S. Gambs, Mach. Learn. 90, 261 (2013).
[17] S. Lloyd, M. Mohseni, and P. Rebentrost, arXiv:1307.0411v2.
[18] N. Wiebe, A. Kapoor, and K. Svore, Quantum Inf. Comput. 15, 0318 (2015).
[19] I. Kerenidis and A. Prakash, arXiv:1603.08675v3.
[20] H. K. Lau, R. Pooser, G. Siopsis, and C. Weedbrook, Phys. Rev. Lett. 118, 080501 (2017).
[21] N. Wiebe, D. Braun, and S. Lloyd, Phys. Rev. Lett. 109, 050505 (2012).
[22] A. Prakash, Ph.D thesis, Dissertations and Theses - Gradworks, 2014.
[23] I. Cong and L. Duan, New J. Phys. 18, 073011 (2016).
[24] C. H. Yu, F. Gao, Q. L. Wang, and Q. Y. Wen, Phys. Rev. A 94, 042311 (2016).
[25] L. Luo, Y. Xie, Z. Zhang, and W. J. Li, in Proceedings of the 32nd International Conference on Machine Learning, JMLR Workshop and Conference Proceedings 37 (JMLR.org, Lille, France, 2015), pp. 938–947.
[26] J. F. Cai, E. J. Candès, and Z. Shen, SIAM J. Optim. 20, 1956 (2008).
[27] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, Cambridge, England, 2014).