PHYSICAL REVIEW A 96, 032301 (2017)

Quantum algorithm for support matrix machines


Bojia Duan, Jiabin Yuan, Ying Liu, and Dan Li
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics,
No. 29 Jiangjun Avenue, 211106 Nanjing, China
(Received 25 March 2017; published 1 September 2017)

We propose a quantum algorithm for support matrix machines (SMMs) that efficiently addresses an image
classification problem by introducing a least-squares reformulation. This algorithm consists of two core
subroutines: a quantum matrix inversion (Harrow-Hassidim-Lloyd, HHL) algorithm and a quantum singular
value thresholding (QSVT) algorithm. The two algorithms can be implemented on a universal quantum computer
with complexity O[log (npq)] and O[log (pq)], respectively, where n is the number of training data and pq
is the size of the feature space. By iterating the algorithms, we can find the parameters of the SMM classification
model. Our analysis shows that both HHL and QSVT algorithms achieve an exponential increase of speed over
their classical counterparts.

DOI: 10.1103/PhysRevA.96.032301

I. INTRODUCTION

As an emerging interdisciplinary field, quantum machine learning (QML) aims to aid the learning process via quantum computing and quantum information theory [1-4]. A range of quantum machine learning algorithms can offer an increase of speed over their classical counterparts. One of these quantum algorithms is phase estimation (PE), which can be used to solve matrix computing problems, namely eigenvalue decomposition or singular value decomposition of a matrix, or matrix inversion. It can help a variety of machine learning algorithms achieve an exponential speed increase, such as quantum principal component analysis [5], quantum singular value decomposition (QSVD) [6], and the quantum matrix inversion algorithm (Harrow-Hassidim-Lloyd, HHL) for solving linear systems of equations [7]. On the basis of the algorithms proposed in Refs. [5-7], quantum support vector machines, quantum algorithms for least-squares regression and statistical leverage scores, and pattern classification with linear regression have been presented [8-10]. In addition, other quantum algorithms also show good performance, such as quantum random access memory, which can answer a memory call with only logarithmically many operations [11]; the swap test, which can evaluate inner products exponentially faster [12]; and generic Grover search algorithms, which can solve search problems with a quadratic speed increase [13-15]. These algorithms can also accelerate many machine learning procedures [16-24].

Herein, we focus our attention on the support matrix machine (SMM), an important model proposed in 2015 to address an image classification problem in machine learning [25]. Given a large number of classified training matrices, the task of a SMM is to classify a new matrix into one of two classes. For example, a SMM can be used to predict whether electroencephalogram (EEG) emotion is positive or negative in EEG classification, or whether a person is male or female in face classification. The core of a SMM is to preserve the structure information of the original feature matrices, which helps to increase the robustness and accuracy of the classification. The time complexity of the learning algorithm of SMMs is proportional to O[poly(n,pq)], with n being the number of training data and pq the dimension of the feature space. However, when the size of the training set and/or the feature space is large (e.g., terabytes or even petabytes), a SMM may take a significant amount of time to execute.

In this paper we propose a quantum algorithm for SMMs. Schematically, for learning the parameters of the SMM classification model, we first introduce the least-squares method to re-express the quadratic programming problem approximately as the solution of a linear system of equations. This reformulation allows for a quantum solution with the quantum matrix inversion (HHL) algorithm. In addition, we propose a quantum singular value thresholding (QSVT) algorithm for updating the substitute of the regression matrix. Finally, a swap-test operation can then help to classify a query state. We have analyzed the complexity of our algorithm, which shows that the two core subroutines (the HHL and QSVT algorithms) can be implemented in time O[log (npq)] and O[log (pq)], respectively, achieving an exponential acceleration compared with their classical counterparts.

In detail, there are two contributions in our work. First, we introduce a least-squares reformulation in Sec. III A. Without this reformulation, the original problem could not be solved by the HHL algorithm directly. Second, we propose a quantum algorithm (QSVT) for solving singular value thresholding (SVT) in Sec. III B 2. The QSVT algorithm achieves an exponential speedup compared to its classical counterpart.

The remainder of the paper is organized as follows: We give a brief overview of the SMM algorithm in Sec. II, put forward our quantum algorithm for SMMs and analyze its time complexity in Sec. III, and present our conclusions in Sec. IV.

II. REVIEW OF SMMS

In this section, we briefly review some basic notations of SMMs and describe the algorithmic procedures of the SMM learning process. More detailed information can be found in Ref. [25].

A. Notations

Let I_n denote the n x n identity matrix. For any matrix A ∈ R^{p×q}, the Frobenius norm is defined as ||A||_F = (Σ_{ij} a_{ij}^2)^{1/2} = (Σ_i σ_i^2)^{1/2}, where a_{ij} are the (i,j)th elements of A and σ_i are the singular values. Given the matrix A ∈ R^{p×q} with rank r ≤ min(p,q), let the thin singular value decomposition (SVD) of A be A = UΣV^T = Σ_{i=1}^r σ_i u_i v_i^T,
where U = [u_1, ..., u_r] ∈ R^{p×r} and V = [v_1, ..., v_r] ∈ R^{q×r} satisfy U^T U = V^T V = I_r, and Σ = diag(σ_1, ..., σ_r), with σ_1 ≥ ··· ≥ σ_r > 0 the singular values of A. For any τ > 0, let the singular value thresholding (SVT) of A be D_τ(A) = U D_τ(Σ) V^T, where D_τ(Σ) = diag[(σ_1 − τ)_+, ..., (σ_r − τ)_+] and (σ_i − τ)_+ = max(σ_i − τ, 0) [26]. Obviously, D_τ(A) = Σ_{i: σ_i > τ} (σ_i − τ) u_i v_i^T.

To convert a matrix into a column vector, let the vectorization of the matrix A be vec(A^T) = (a_{11}, ..., a_{1q}, a_{21}, ..., a_{pq})^T = a ∈ R^{pq}. Similarly, we write x_i = vec(X_i^T) for i = 1, ..., n, w = vec(W^T), s = vec(S^T), and λ = vec(Λ^T).

B. Support matrix machines

A SMM algorithm aims at classifying a regularized feature matrix into one of two classes. The ingredients of a SMM are a training set of already classified data and a learning algorithm. With the given training set M = {(X_i, y_i) : X_i ∈ R^{p×q}, y_i = ±1}_{i=1}^n, where X_i is the ith input sample in matrix form and y_i indicates the class to which X_i belongs, the learning algorithm outputs the parameters of the classification model: the regression matrix W ∈ R^{p×q} and the offset term b ∈ R. Then, the class label of a new input X can be predicted by virtue of these parameters: y = sgn[tr(W^T X) + b].

Formally, the matrix classification model of the SMM is defined as follows:

    \arg\min_{W,b} \frac{1}{2}\mathrm{tr}(W^T W) + C \sum_{i=1}^{n} \{1 - y_i[\mathrm{tr}(W^T X_i) + b]\}_+ + \tau \|W\|_* ,   (1)

where {1 − y_i[tr(W^T X_i) + b]}_+ is the hinge loss function and τ||W||_* the penalty function.

To solve the objective function, Luo et al. [25] first rewrote Eq. (1) by introducing the constraint S − W = 0 and G(S) = τ||S||_* (where S can be regarded as a substitute for W), writing H(W,b) for the sum of the first two terms in Eq. (1), and then applied a learning algorithm based on the ADMM (alternating direction method of multipliers). The ADMM solves the problem by using the augmented Lagrangian function

    L(W, b, S, \Lambda) = H(W, b) + G(S) + \mathrm{tr}[\Lambda^T (S - W)] + \frac{\rho}{2}\|S - W\|_F^2 ,   (2)

where ρ is a hyperparameter.

The overall procedure of the SMM learning algorithm is depicted in Fig. 1. In the kth iteration of the algorithm, updating the assistant parameters S̃^{(k+1)} and Λ̃^{(k+1)} costs O(1), which is much less costly than the first two procedures. Therefore, the core of the SMM learning process is the computation of (W^{(k)}, b^{(k)}) and S^{(k)}:

    (W^{(k)}, b^{(k)}) = \arg\min_{W,b} H(W, b) - \mathrm{tr}(\tilde{\Lambda}^{(k)T} W) + \frac{\rho}{2}\|W - \tilde{S}^{(k)}\|_F^2 ,   (3)

    S^{(k)} = \arg\min_{S} G(S) + \mathrm{tr}(\tilde{\Lambda}^{(k)T} S) + \frac{\rho}{2}\|W^{(k)} - S\|_F^2 .   (4)

[FIG. 1. Entire process of the SMM learning algorithm.]

In Ref. [25] it is shown that the update equation of S^{(k)} can be obtained by singular value thresholding [26]:

    S^{(k)} = \frac{1}{\rho} D_\tau(\rho W^{(k)} - \tilde{\Lambda}^{(k)}) = \frac{1}{\rho} U_0 (\Sigma_0 - \tau I) V_0^T ,   (5)

where U_0 and V_0 are the matrices of left and right singular vectors of ρW^{(k)} − Λ̃^{(k)} corresponding to the singular values greater than τ, and Σ_0 is the diagonal matrix of those singular values.

In summary, the ultimate target of the SMM learning algorithm is to obtain the regression matrix W and the offset b. The learning process also involves an additional update of the substitute S.

III. QUANTUM SMM

In this section, we design a quantum algorithm for the SMM based on a reformulation of the original SMM classification model. First, we show how to transform Eq. (3) into a linear problem that the quantum matrix inversion algorithm can address directly. Then, we sketch the basic idea of our quantum algorithm for the SMM and present the key procedures of the algorithm in detail. At the same time, we analyze its time complexity.

A. Least-squares reformulation

Quantum algorithms have shown good performance in solving linear systems of equations. In order to transform the original constrained quadratic programming problem in Eq. (3) into the solution of a linear system of equations, we introduce a least-squares reformulation of the original SMM classification model. Here, we take a surrogate loss function, namely the square loss, instead of the hinge loss in Eq. (1).

First, we introduce a slack variable e_i = 1 − y_i[tr(W^T X_i) + b], e_i ∈ R, and substitute the square loss (η/2) Σ_{i=1}^n (1 − e_i − μ)^2 for the hinge loss C Σ_{i=1}^n {e_i}_+ in Eq. (1), as shown in Fig. 2. This naturally replaces the original inequality constraints with equality constraints.
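As a quick numerical sanity check of this substitution (a sketch that is not part of the algorithm itself; plain numpy, with μ = 5/4, the smallest value for which Appendix A establishes the bound), one can verify that the square surrogate upper bounds the hinge loss over a grid of slack values, which is exactly the relation plotted in Fig. 2 and proved in Appendix A:

    import numpy as np

    # Square surrogate l_square(e) = (1 - e - mu)^2 versus hinge loss l_hinge(e) = max(0, e).
    mu = 1.25                                  # Appendix A requires mu >= 5/4
    e = np.linspace(-3.0, 3.0, 601)            # grid of slack-variable values e_i
    l_hinge = np.maximum(0.0, e)
    l_square = (1.0 - e - mu) ** 2
    print(bool(np.all(l_square >= l_hinge)))   # prints True: the surrogate dominates the hinge loss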


[FIG. 2. Loss functions in a SMM. The blue dashed line represents the hinge loss ℓ_hinge in the original objective function and the red solid line the square loss ℓ_square used in our method. Here we introduce a new parameter μ to make ℓ_hinge strictly upper bounded by ℓ_square.]

Theorem 1. There exists a square loss function

    \ell_{square}(e_i) := (1 - e_i - \mu)^2 ,

which is a convex surrogate for the hinge loss

    \ell_{hinge}(e_i) := \max\{0, e_i\} .

Using the surrogate loss transforms the objective function (3) to the following problem:

    \arg\min_{W,b,e_i} \frac{1}{2}\mathrm{tr}(W^T W) + \frac{\eta}{2}\sum_{i=1}^{n}(1 - e_i - \mu)^2 - \mathrm{tr}(\tilde{\Lambda}^{(k)T} W) + \frac{\rho}{2}\|W - \tilde{S}^{(k)}\|_F^2 ,
    such that  y_i[\mathrm{tr}(W^T X_i) + b] = 1 - e_i ,   (6)

where η is the regularization parameter, which plays the same role as the parameter C in Eq. (1).

Theorem 2. One of the solutions of the problem (6) is

    W^{(k)} = \frac{1}{\rho + 1}\Big( \tilde{\Lambda}^{(k)} + \rho\tilde{S}^{(k)} + \sum_{i=1}^{n}\alpha_i^{(k)} y_i X_i \Big) ,   (7)

and the offset b^{(k)} and α^{(k)} = [α_i^{(k)}] ∈ R^n are the solutions of the following linear equation:

    F \begin{pmatrix} b^{(k)} \\ \tilde{\alpha}^{(k)} \end{pmatrix} \equiv \begin{pmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & K + I/\gamma \end{pmatrix} \begin{pmatrix} b^{(k)} \\ \tilde{\alpha}^{(k)} \end{pmatrix} = \begin{pmatrix} 0 \\ \Phi^{(k)} \end{pmatrix} ,   (8)

where K = [K_{ij}] ∈ R^{n×n} and Φ^{(k)} = [Φ_i^{(k)}] ∈ R^n are independent of α̃^{(k)} = [α_i^{(k)} y_i] ∈ R^n; specifically,

    K_{ij} = \frac{\mathrm{tr}(X_i^T X_j)}{\rho + 1} ,
    \Phi_i^{(k)} = \mu y_i - \frac{\mathrm{tr}[(\tilde{\Lambda}^{(k)} + \rho\tilde{S}^{(k)})^T X_i]}{\rho + 1} .   (9)

The proofs of the two theorems are given in Appendices A and B.

By Theorem 2, updating (W^{(k)}, b^{(k)}) can be done by solving Eq. (8). This simplification fits the situation in which the quantum matrix inversion algorithm works.

B. Algorithm

Our quantum algorithm for SMMs is based on the following model. Assume that the matrix A = [a_{ij}] ∈ R^{p×q} is given by a quantum oracle O_A:

    O_A |i⟩|j⟩|0⟩ = |i⟩|j⟩|a_{ij}⟩ ,  ∀ i ∈ {1, ..., p}, ∀ j ∈ {1, ..., q} .   (10)

In addition, we assume that the vector y = [y_i] ∈ R^n is given by a quantum oracle O_y:

    O_y |i⟩|0⟩ = |i⟩|y_i⟩ ,  ∀ i ∈ {1, ..., n} .   (11)

That is, given the indexes i and j, the oracles O_A and O_y return the values of A and y, respectively. Then, via the method outlined in Ref. [22], the following quantum states can be generated:

    |ψ_A⟩ = \sum_{i=1}^{p}\sum_{j=1}^{q} a_{ij} |i⟩|j⟩ ,   (12)

    |ψ_y⟩ = \sum_{i=1}^{n} y_i |i⟩ .   (13)

Note that our algorithm works with normalized data (both matrices and vectors), as the data in many machine learning algorithms are normalized.

Based on this model, our algorithm consists of two subroutines: a quantum matrix inversion (HHL) algorithm for estimating W^{(k)} and b^{(k)}, and a QSVT algorithm for estimating S^{(k)}. The overall procedure of our algorithm includes the following steps.

Algorithm. (W, b) = QSMM(Λ, S, X, y).
(1) Initialize k = 1 and apply the quantum algorithm (b^{(k)}, α̃^{(k)}) = HHL(Φ^{(k)}, F), where (Φ^{(k)}, F) can be prepared in terms of (X, y, Λ̃^{(k)}, S̃^{(k)}) via Eq. (9).
(2) Extract b^{(k)} and α̃^{(k)} to construct the matrix W^{(k)} according to Eq. (7).
(3) Apply the quantum algorithm S^{(k)} = QSVT(A^{(k)}, τ), where A^{(k)} is a normalized rank-r approximation of ρW^{(k)} − Λ̃^{(k)}.
(4) Update the parameters Λ̃^{(k+1)} and S̃^{(k+1)} and set k = k + 1.
(5) Repeat steps 1-4 until the number of iterations reaches k = max_iter; then one obtains the regression matrix W and the offset b.

We introduce the quantum algorithms (b, α̃) = HHL(Φ, F) and S = QSVT(A, τ) in detail in the following subsections.

1. HHL algorithm

To obtain W^{(k)} in Eq. (7), we should solve the normalized (b^{(k)}, α̃^{(k)})^T = F^{-1}(0, Φ^{(k)})^T, where ||F|| ≤ 1.
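For orientation, the linear-algebra task that this subroutine performs can be written down classically in a few lines (a sketch with random placeholder data; the variable names and hyperparameter values are illustrative, not taken from the paper):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, q = 8, 4, 5
    X = rng.standard_normal((n, p, q))              # training matrices X_i
    y = rng.choice([-1.0, 1.0], size=n)             # labels y_i
    Lam = np.zeros((p, q))                          # Lambda~^(k), Lagrange multiplier matrix
    S = np.zeros((p, q))                            # S~^(k), substitute for W
    rho, gamma, mu = 1.0, 10.0, 1.25

    # Eq. (9): K_ij = tr(X_i^T X_j)/(rho+1),  Phi_i = mu*y_i - tr[(Lam + rho*S)^T X_i]/(rho+1)
    K = np.einsum('iab,jab->ij', X, X) / (rho + 1.0)
    Phi = mu * y - np.einsum('ab,iab->i', Lam + rho * S, X) / (rho + 1.0)

    # Eq. (8): F (b, alpha~)^T = (0, Phi)^T with F = [[0, 1^T], [1, K + I/gamma]]
    F = np.zeros((n + 1, n + 1))
    F[0, 1:] = 1.0
    F[1:, 0] = 1.0
    F[1:, 1:] = K + np.eye(n) / gamma
    b_alpha = np.linalg.solve(F, np.concatenate(([0.0], Phi)))
    b, alpha_tilde = b_alpha[0], b_alpha[1:]        # alpha~_i = alpha_i * y_i

    # Eq. (7): W = (Lam + rho*S + sum_i alpha_i y_i X_i)/(rho+1), using alpha_i y_i = alpha~_i
    W = (Lam + rho * S + np.einsum('i,iab->ab', alpha_tilde, X)) / (rho + 1.0)

This dense solve takes time polynomial in n and pq, which is the classical baseline; the HHL subroutine below produces a quantum state proportional to the same solution vector in time logarithmic in npq.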


The HHL algorithm was proposed for solving matrix inversion problems [7], but in Ref. [7] the matrix is required to be s-sparse. In this paper, we adopt the strategy of Ref. [5], so that the HHL algorithm can also be used when the matrix is nonsparse. A detailed description is given in Appendix C. The core subroutine of the algorithm is phase estimation (denoted U_PE). Applying U_PE to the registers C and B is represented as follows:

    U_{PE} = U_{PE}(A) = (F_T^\dagger \otimes I_B)\Big( \sum_{\tau=0}^{T-1} |\tau\rangle\langle\tau|^C \otimes e^{iA\tau t_0/T} \Big)(H^{\otimes t} \otimes I_B) ,   (14)

where register C with t qubits is used to store the estimated eigenvalues of a Hermitian matrix A, register B with s qubits stores the input state |ψ⟩, F_T^† is the inverse quantum Fourier transform, and Σ_{τ=0}^{T−1}|τ⟩⟨τ|^C ⊗ e^{iAτt_0/T} is the conditional Hamiltonian evolution [7].

We now give the HHL algorithm for solving the reformulated SMM to estimate the parameters W^{(k)} and b^{(k)}. In Appendix C, we show how to prepare the input |ψ_Φ⟩ and the unitary e^{-iFt_0}.

Input. A quantum state |ψ_Φ⟩ and a unitary e^{-iFt_0}.
Output. A quantum state proportional to |b, α̃⟩ ≈ F^{-1}|ψ_Φ⟩.
Algorithm. (b, α̃) = HHL(Φ, F).

(1) Prepare three quantum registers: register C for storing the estimated eigenvalues |λ_i⟩ of F, register B for storing the input state |ψ_Φ⟩, and register R with an ancilla qubit for storing the inverse eigenvalues λ_i^{-1} of F. Prepare the three registers in the state

    |\psi_0\rangle = |0\rangle^R (|0\rangle|0\rangle\cdots|0\rangle)^C (|\psi_\Phi\rangle)^B ,   (15)

where |ψ_Φ⟩ = |0, Φ⟩ = Σ_{i=1}^{n+1} φ_i |i⟩.

(2) Perform the unitary operation U_PE(F) on the state. U_PE(F) expands |ψ_Φ⟩ in the eigenstates |u_i⟩ of F with corresponding eigenvalues λ_i, and we have the state

    |\psi_1\rangle = |0\rangle^R \sum_{i=1}^{n+1} \langle u_i|\psi_\Phi\rangle \, |\lambda_i\rangle^C |u_i\rangle^B .   (16)

(3) Apply a controlled rotation R_f^{RC},

    |0\rangle^R |z\rangle^C \rightarrow \Big( \frac{\gamma}{z}|1\rangle + \sqrt{1 - \frac{\gamma^2}{z^2}}\,|0\rangle \Big)^R |z\rangle^C ,   (17)

to the register R, controlled by the register C, where γ is a constant. This rotation transforms the state to

    |\psi_2\rangle = \sum_{i=1}^{n+1} \Big( \sqrt{1 - \frac{\gamma^2}{\lambda_i^2}}\,|0\rangle + \frac{\gamma}{\lambda_i}|1\rangle \Big)^R \langle u_i|\psi_\Phi\rangle \, |\lambda_i\rangle^C |u_i\rangle^B .   (18)

(4) Uncompute the registers C and B to obtain the state

    |\psi_3\rangle = \sum_{i=1}^{n+1} \Big( \sqrt{1 - \frac{\gamma^2}{\lambda_i^2}}\,|0\rangle + \frac{\gamma}{\lambda_i}|1\rangle \Big)^R (|0\cdots 0\rangle)^C \langle u_i|\psi_\Phi\rangle \, |u_i\rangle^B .   (19)

Remove the register C and measure the register R to be |1⟩. We then have a state proportional to

    |\psi_4\rangle = \sum_{i=1}^{n+1} \frac{\langle u_i|\psi_\Phi\rangle}{\lambda_i} |u_i\rangle .   (20)

(5) Finally, represent the state in the basis of training-set labels to acquire the desired parameters, which appear in the coefficients of the state [8]:

    |b, \tilde{\alpha}\rangle = \frac{1}{\sqrt{l}}\Big( b|0\rangle + \sum_{i=1}^{n} \tilde{\alpha}_i |i\rangle \Big) ,   (21)

with l = b^2 + Σ_{i=1}^n α̃_i^2.

Here, amplitude estimation can be used to extract b^{(k)} and α̃^{(k)}, and we can then obtain the matrix W^{(k)} according to Eq. (7).

In summary, the HHL algorithm can be represented as the following unitary operation:

    U_{HHL} = (I^R \otimes U_{PE}^\dagger)\big( R_f^{RC} \otimes I^B \big)(I^R \otimes U_{PE}) ,   (22)

where R_f^{RC} is given in Eq. (17).

Analysis. We now discuss the error of the HHL algorithm. Define the condition number of F as κ, and take t_0 = O(κ/ε). Therefore, only the eigenvalues in the interval 1/κ ≤ |λ_j| ≤ 1 are taken into account. The estimation of the eigenvalue λ̃_j := 2πj/t_0 in step (2) introduces an error O(1/t_0) = O(ε/κ), which translates into an error O(ε/(λκ)) in λ^{-1} in step (3) [7]. With λ ≥ 1/κ, taking t_0 = O(κ/ε) induces a final error of O(ε).

We now analyze the time complexity. In phase estimation, O(t^2) operations and one call to the conditional Hamiltonian evolution Σ_{τ=0}^{T−1}|τ⟩⟨τ|^C ⊗ e^{iFτt_0/T} are needed, where t is the number of qubits in register C [4]. The preparation of the unitary e^{-iFt_0} costs time O[t_0^2 ε^{-1} log(npq)], as shown in Appendix C. Since t only sets the accuracy of the estimated eigenvalues, it does not need to be very large. When the size of the training data and feature space is large, the preparation of the unitary e^{-iFt_0} therefore dominates the time complexity of phase estimation.

The probability of obtaining λ^{-1} determines the number of repetitions of the algorithm. This probability is given by the squared amplitude |γ/λ_i|^2, which is at least Ω(1/κ^2) with γ = O(1/κ) and λ ≤ 1. Hence, O(κ^2) repetitions are needed to ensure success with high probability. By virtue of the amplitude amplification algorithm [15], only O(κ) repetitions are sufficient to achieve a constant success probability of the postselection process.

In conclusion, the total evolution time of the algorithm is O[t_0^2 ε^{-1} log(npq)] x O(κ) = O[κ^3 ε^{-3} log(npq)]. In contrast, matrix inversion via classical methods needs time O[poly(npq)].

2. QSVT algorithm

Herein we propose a QSVT algorithm for estimating S^{(k)}. Suppose A^{(k)} is a normalized matrix that is a good rank-r approximation of ρW^{(k)} − Λ̃^{(k)}, with ||A^{(k)}|| ≤ 1. In the following, we simplify the notation A^{(k)} to A.
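Classically, the target of this subroutine is just the SVT map D_τ of Eq. (5) applied to A, which amounts to one dense singular value decomposition. A minimal classical reference (a sketch in numpy, shown only for comparison) reads:

    import numpy as np

    def svt(A, tau):
        """Singular value thresholding D_tau(A) of Eq. (5): shrink each singular
        value by tau and drop those that fall below the threshold."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        s_shrunk = np.maximum(s - tau, 0.0)
        keep = s_shrunk > 0
        return (U[:, keep] * s_shrunk[keep]) @ Vt[keep, :]

    # SMM update of Eq. (5):  S_k = svt(rho * W - Lam, tau) / rho

The SVD above costs O[poly(pq)] time, which is the classical baseline against which the quantum subroutine is compared.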


Then, using the Schmidt decomposition, we have

    |\psi_A\rangle = \sum_{i=1}^{p}\sum_{j=1}^{q} a_{ij}|i\rangle|j\rangle = \sum_{k=1}^{r} \sigma_k |u_k\rangle|v_k\rangle ,   (23)

where r is the rank of A and σ_k are the singular values of A, with u_k and v_k being the left and right singular vectors. Note that the u_k are just the eigenvectors of AA† and the v_k are the eigenvectors of A†A; the eigenvalues of AA† and A†A are both λ_k = σ_k^2. Based on quantum random access memory (QRAM), the preparation of |ψ_A⟩ costs time O[log(pq)]. By taking a partial trace of |ψ_A⟩⟨ψ_A|, we can obtain a mixed state ρ_{AA†} [10]:

    \rho_{AA^\dagger} = \mathrm{tr}_j(|\psi_A\rangle\langle\psi_A|) = \sum_{i,i'=1}^{p}\sum_{j=1}^{q} a_{ij} a^*_{i'j} |i\rangle\langle i'| ,   (24)

where ρ_{AA†} represents the positive Hermitian matrix AA†.

The QSVT algorithm is now presented as follows:

Input. A quantum state |ψ_A⟩, a unitary e^{-iρ_{AA†}t_0}, and a constant τ.
Output. A quantum state proportional to |ψ_S⟩ = Σ_{k: σ_k > τ} (σ_k − τ)|u_k⟩|v_k⟩.
Algorithm. S = QSVT(A, τ).

(1) Prepare three quantum registers in the state

    |\psi_0\rangle = |0\rangle^R (|0\rangle|0\rangle\cdots|0\rangle)^C (|\psi_A\rangle)^B .   (25)

(2) Perform the unitary operation U_PE(ρ_{AA†}) on the state. According to the ideas of quantum principal component analysis in Refs. [5,10], we have the state

    |\psi_1\rangle = |0\rangle^R \sum_{k=1}^{r} \sigma_k |\sigma_k^2\rangle^C |u_k\rangle|v_k\rangle .   (26)

(3) Apply a controlled rotation R_f^{RC} to the register R, controlled by the register C. R_f^{RC} is defined as follows: if σ_k > τ,

    |0\rangle^R |z\rangle^C \rightarrow \Big( \frac{\gamma(\sqrt{z}-\tau)}{\sqrt{z}}|1\rangle + \sqrt{1 - \frac{\gamma^2(\sqrt{z}-\tau)^2}{z}}\,|0\rangle \Big)^R |z\rangle^C ;   (27)

otherwise do nothing. This rotation transforms the state to

    |\psi_2\rangle = \sum_{k=1}^{r'} \Big( \frac{\gamma(\sigma_k-\tau)}{\sigma_k}|1\rangle + \sqrt{1 - \frac{\gamma^2(\sigma_k-\tau)^2}{\sigma_k^2}}\,|0\rangle \Big)^R \sigma_k |\sigma_k^2\rangle^C |u_k\rangle|v_k\rangle + \sum_{k=r'+1}^{r} |0\rangle^R \sigma_k |\sigma_k^2\rangle^C |u_k\rangle|v_k\rangle ,   (28)

where r' is the number of singular values greater than τ.

(4) Uncompute the registers C and B, remove the register C, and measure the register R to be |1⟩. Then, we have a state proportional to

    |\psi_3\rangle = \sum_{k=1}^{r'} (\sigma_k - \tau)|u_k\rangle|v_k\rangle = |\psi_S\rangle .   (29)

In summary, the QSVT algorithm can be represented as the following unitary operation:

    U_{QSVT} = (I^R \otimes U_{PE}^\dagger)\big( R_f^{RC} \otimes I^B \big)(I^R \otimes U_{PE}) ,   (30)

where R_f^{RC} is given in Eq. (27).

Analysis. We now discuss the error of the QSVT algorithm. Define the condition number of ρ_{AA†} as κ, and take t_0 = O(κ/ε). Therefore, only the eigenvalues σ_k^2 in the interval 1/κ ≤ σ_k^2 ≤ 1 are taken into account. The estimation of the eigenvalues σ̃_k^2 := 2πk/t_0 in step (2) introduces an error O(1/t_0) = O(ε/κ). This error translates into an error O[ε/(σ_k κ)] in (1 − τ/σ_k) after the controlled rotation R_f^{RC} in step (3). With σ_k^2 ≥ 1/κ, taking t_0 = O(κ/ε) induces a final error of O(√ε/κ).

We now discuss the time complexity of the QSVT algorithm. Clearly, the preparation of the unitary operation e^{-iρ_{AA†}t_0} dominates the time complexity of phase estimation, and it costs time O[t_0^2 ε^{-1} log(pq)]. The probability of obtaining (σ_k − τ) determines the number of repetitions of the algorithm, and it is given by the squared amplitude |γ(σ_k − τ)/σ_k|^2, which is at least Ω(1/κ^2) with γ = O(1/κ) and σ_k ≤ 1. Hence, O(κ^2) repetitions are needed to ensure success with high probability. Using amplitude amplification [15], only O(κ) repetitions are needed to achieve a constant success probability of the postselection process. Therefore, the total evolution time of the algorithm is O[t_0^2 ε^{-1} log(pq)] x O(κ) = O[κ^3 ε^{-3} log(pq)]. In contrast, singular value thresholding via the classical method needs time O[poly(pq)].

C. Classification

With the parameters W and b, we can classify a query state X via the method presented in Ref. [8]. First, we construct the training-data state via quantum oracles,

    |\psi_{\tilde{t}}\rangle = \frac{1}{\sqrt{N_t}}\Big( b|0\rangle|0\rangle + |0\rangle|\lambda_\Lambda\rangle + \sum_{\tilde{\alpha}_i y_i > 0} \tilde{\alpha}_i |i\rangle|x_i\rangle \Big) ,   (31)

and the query state

    |\psi_{\tilde{x}}\rangle = \frac{1}{\sqrt{N_x}}\Big( |0\rangle|0\rangle + \sum_{i=1}^{n} |i\rangle|x\rangle \Big) ,   (32)

where N_t and N_x are the norms of these states. The inner product of the two states,

    \langle\psi_{\tilde{t}}|\psi_{\tilde{x}}\rangle = \frac{1}{\sqrt{N_t N_x}}\Big( b + \langle\lambda_\Lambda|x\rangle + \sum_{\tilde{\alpha}_i y_i > 0} \tilde{\alpha}_i \langle x_i|x\rangle \Big) ,   (33)

reveals the label of the classification problem:

    y = \mathrm{sgn}\Big[ \Big( \lambda_\Lambda + \sum_{\alpha_i y_i > 0} \alpha_i x_i \Big)^T x + b \Big] .   (34)
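The label recovered this way corresponds to the classical SMM decision rule of Sec. II, y = sgn[tr(W^T X) + b]; a sketch of that classical rule (numpy; W, b, and X_new are placeholders for the learned parameters and a query matrix):

    import numpy as np

    def classify(W, b, X_new):
        """Predict the class label of a query matrix X_new from learned (W, b),
        i.e., y = sgn[tr(W^T X_new) + b] as in Sec. II B."""
        return int(np.sign(np.trace(W.T @ X_new) + b))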


Then, a swap-test operation can be used to estimate this value, and the algorithm achieves an exponential increase of speed over its classical counterpart.

IV. CONCLUSIONS

We have shown that quantum mechanics can be used to speed up the learning and classification process of an important matrix classifier, the SMM, in machine learning. The algorithmic complexity of the two key subroutines can be O[log (npq)] and O[log (pq)], respectively. Aiming at the two cores of the SMM learning procedure, the least-squares reformulation has been utilized for estimating the parameters (W^{(k)}, b^{(k)}), so that the HHL algorithm can be used to address the reformulated SMM classification model. Furthermore, a QSVT algorithm is proposed for estimating S^{(k)}. This algorithm can also serve as a subroutine in other quantum machine learning algorithms for solving the singular value thresholding problem. Finally, we give the classification process, which can be implemented quantum mechanically. Our analysis shows that the core subprocedures can achieve an exponential acceleration over their classical counterparts. We hope that our algorithm can inspire more fruitful results in quantum machine learning.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation of China (Grants No. 61571226 and No. 61701229), the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20140823), the Jiangsu Innovation Program for Graduate Education (Grant No. KYLX15_0326), and the Fundamental Research Funds for the Central Universities.

APPENDIX A: PROOF OF THEOREM 1

Proof. Obviously, for all e_i we have ℓ_hinge(e_i) ≤ ℓ_square(e_i) when μ ≥ 5/4, so the square loss is an upper bound of the hinge loss. Together with the convexity of the square function, ℓ_square(e_i) thus satisfies the surrogate property [27].

APPENDIX B: PROOF OF THEOREM 2

Proof. To solve Eq. (6), we construct the Lagrange function as follows:

    L(W, b, e; \alpha) = \frac{1}{2}\mathrm{tr}(W^T W) + \frac{\gamma}{2}\sum_{i=1}^{n}(1 - e_i - \mu)^2 - \mathrm{tr}(\tilde{\Lambda}^{(k)T} W) + \frac{\rho}{2}\|W - \tilde{S}^{(k)}\|_F^2 - \sum_{i=1}^{n}\alpha_i\{ y_i[\mathrm{tr}(W^T X_i) + b] - 1 + e_i \} ,   (B1)

where the α_i are the Lagrange multipliers. Setting the partial derivatives of L with respect to W, b, e_i, and α_i to zero, we have

    \partial L/\partial W = 0 \rightarrow W = \frac{1}{\rho+1}\Big( \tilde{\Lambda}^{(k)} + \rho\tilde{S}^{(k)} + \sum_{i=1}^{n}\alpha_i y_i X_i \Big) ,
    \partial L/\partial b = 0 \rightarrow \sum_{i=1}^{n}\alpha_i y_i = 0 ,
    \partial L/\partial e_i = 0 \rightarrow e_i = \frac{\alpha_i}{\gamma} - \mu + 1 , \quad i = 1, \ldots, n ,
    \partial L/\partial \alpha_i = 0 \rightarrow y_i[\mathrm{tr}(W^T X_i) + b] - 1 + e_i = 0 , \quad i = 1, \ldots, n .   (B2)

Then, with y_i^2 = 1, substituting the first and third equations into the fourth equation in Eqs. (B2) to eliminate W and e_i, we obtain

    b + \sum_{j=1}^{n}\alpha_j y_j \frac{\mathrm{tr}(X_i^T X_j)}{\rho+1} + \frac{\alpha_i y_i}{\gamma} = \mu y_i - \frac{\mathrm{tr}[(\tilde{\Lambda}^{(k)} + \rho\tilde{S}^{(k)})^T X_i]}{\rho+1} , \quad i = 1, \ldots, n ;

thus, together with the second equation in Eqs. (B2), the solution is straightforwardly given by Eq. (8).

APPENDIX C: PREPARATION OF |ψ_Φ⟩, e^{-iFt_0}, AND e^{-iρ_{AA†}t_0}

Preparation of the input |ψ_Φ⟩ of the HHL algorithm. The main part of Φ_i in Eq. (9) can be rewritten as the inner product of two vectors:

    \mathrm{tr}[(\tilde{\Lambda}^{(k)} + \rho\tilde{S}^{(k)})^T X_i] = \mathrm{vec}(\tilde{\Lambda}^{(k)T} + \rho\tilde{S}^{(k)T})^T \mathrm{vec}(X_i^T) := \langle\lambda_\Lambda^{(k)}|x_i\rangle + \rho\langle s^{(k)}|x_i\rangle ,   (C1)

where {|x_i⟩}_{i=1}^n, |s^{(k)}⟩, and |λ_Λ^{(k)}⟩ are the quantum states in the basis of the training-set space. Here, we have

    |\psi_\Phi\rangle = \frac{1}{N_\psi}\Big( |0\rangle + \sum_{i=1}^{n}\Big( \mu y_i - \frac{\langle\lambda_\Lambda^{(k)}|x_i\rangle + \rho\langle s^{(k)}|x_i\rangle}{\rho+1} \Big)|i\rangle \Big) ,   (C2)

with N_ψ being the norm of |ψ_Φ⟩. Based on QRAM [11], a swap test can be used to evaluate these inner products. Therefore, the preparation of |ψ_Φ⟩ costs time O[log (npq)] [12].

Preparation of the input e^{-iFt_0} of the HHL algorithm. The unitary e^{-iFt_0} can be simplified to the direct preparation and exponentiation of the matrix K [8]. To prepare the matrix K with elements

    K_{ij} = \frac{\mathrm{tr}(X_i^T X_j)}{\rho+1} = \frac{\mathrm{vec}(X_i^T)^T \mathrm{vec}(X_j^T)}{\rho+1} = \frac{x_i \cdot x_j}{\rho+1} ,

we can take a partial trace of the quantum state

    |\psi\rangle = \frac{1}{\sqrt{n}}\sum_{i=1}^{n}|i\rangle|x_i\rangle

to acquire the normalized

    K = \mathrm{tr}_2\{|\psi\rangle\langle\psi|\} := \sum_{i,j=1}^{n} \langle x_i|x_j\rangle\, |x_i||x_j|\, |i\rangle\langle j| .   (C3)

Here, |ψ⟩ can be prepared in time O[log(npq)].

Then, the unitary e^{-iFt_0} can be simulated via the methods outlined in Ref. [8]. Here, t_0 is the total evolution time with t_0 = Δt T, i.e., the time slice Δt times the number of steps T in the phase estimation. For the time slice Δt, the unitary e^{-iFΔt} can be simplified to the direct preparation and exponentiation of the matrix K with error O(Δt^2):

    e^{-iF\Delta t} = e^{-i\Delta t\,\gamma^{-1} I}\, e^{-i\Delta t J}\, e^{-i\Delta t K} + O(\Delta t^2) ,

with J = \begin{pmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & 0 \end{pmatrix}. As K in Eq. (C3) is Hermitian, the exponentiation of the matrix K can be prepared by applying the algorithm in Ref. [5] in terms of a density-matrix description with error O(Δt^2):

    e^{-iK\Delta t}(\sigma) \approx \mathrm{tr}_1\{ e^{-iS_{swap}\Delta t}\, K \otimes \sigma\, e^{iS_{swap}\Delta t} \} = \sigma - i\Delta t[K, \sigma] + O(\Delta t^2) ,

where S_{swap} = \sum_{p,q=1}^{n} |p\rangle\langle q| \otimes |q\rangle\langle p|. Thus, using T = O(t_0^2 ε^{-1}) copies of K, the unitary e^{-iKt_0} can be implemented to accuracy ε in time O[t_0^2 ε^{-1} log (npq)].

Preparation of the input e^{-iρ_{AA†}t_0} of the QSVT algorithm. The unitary e^{-iρ_{AA†}t_0} can also be simulated via O(t_0^2 ε^{-1}) copies of e^{-iρ_{AA†}Δt} for the time slice Δt: e^{-iρ_{AA†}Δt}(σ) ≈ tr_1{ e^{-iS_{swap}Δt} ρ_{AA†} ⊗ σ e^{iS_{swap}Δt} }. As the preparation of |ψ_A⟩ costs O[log (pq)], the time complexity of implementing e^{-iρ_{AA†}t_0} to accuracy ε is O[t_0^2 ε^{-1} log (pq)].
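As a small numerical illustration of this splitting (a sketch, not from the paper: K is a random placeholder kernel, and J, K, and I/γ are embedded as the blocks of F in Eq. (8); scipy is used only for the matrix exponentials), the first-order product formula indeed reproduces e^{-iFΔt} up to an error that shrinks as Δt^2:

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(1)
    n, gamma = 4, 10.0
    M = rng.standard_normal((n, n))
    K = M @ M.T
    K /= np.trace(K)                                   # normalized kernel, as in Eq. (C3)

    # Embed the blocks of F = [[0, 1^T], [1, K + I/gamma]] as (n+1) x (n+1) matrices.
    J = np.zeros((n + 1, n + 1)); J[0, 1:] = 1.0; J[1:, 0] = 1.0
    K_hat = np.zeros((n + 1, n + 1)); K_hat[1:, 1:] = K
    I_hat = np.zeros((n + 1, n + 1)); I_hat[1:, 1:] = np.eye(n)
    F = J + K_hat + I_hat / gamma

    for dt in (1e-1, 1e-2, 1e-3):
        exact = expm(-1j * F * dt)
        split = expm(-1j * I_hat * dt / gamma) @ expm(-1j * J * dt) @ expm(-1j * K_hat * dt)
        print(dt, np.linalg.norm(exact - split))       # error decreases roughly as dt^2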

[1] J. Adcock, E. Allen, M. Day, S. Frick, J. Hinchliff, M. Johnson, S. Morley-Short, S. Pallister, A. Price, and S. Stanisic, arXiv:1512.02900v1.
[2] M. Schuld, I. Sinayskiy, and F. Petruccione, Contemp. Phys. 56, 172 (2014).
[3] P. Wittek, Quantum Machine Learning: What Quantum Computing Means to Data Mining (Elsevier, Singapore, 2014).
[4] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition (Cambridge University Press, Cambridge, England, 2010), pp. 1-59.
[5] S. Lloyd, M. Mohseni, and P. Rebentrost, Nat. Phys. 10, 108 (2014).
[6] P. Rebentrost, A. Steffens, and S. Lloyd, arXiv:1607.05404v1.
[7] A. W. Harrow, A. Hassidim, and S. Lloyd, Phys. Rev. Lett. 103, 150502 (2009).
[8] P. Rebentrost, M. Mohseni, and S. Lloyd, Phys. Rev. Lett. 113, 130503 (2014).
[9] Y. Liu and S. Zhang, Theor. Comput. Sci. 657, 38 (2016).
[10] M. Schuld, I. Sinayskiy, and F. Petruccione, Phys. Rev. A 94, 022342 (2016).
[11] V. Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. Lett. 100, 160501 (2008).
[12] H. Buhrman, R. Cleve, J. Watrous, and R. de Wolf, Phys. Rev. Lett. 87, 167902 (2001).
[13] L. K. Grover, Phys. Rev. Lett. 79, 325 (1997).
[14] C. Durr and P. Hoyer, arXiv:quant-ph/9607014v2 (1996).
[15] G. Brassard, P. Hoyer, M. Mosca, and A. Tapp, Contemp. Math. 305, 53 (2002).
[16] E. Aimeur, G. Brassard, and S. Gambs, Mach. Learn. 90, 261 (2013).
[17] S. Lloyd, M. Mohseni, and P. Rebentrost, arXiv:1307.0411v2.
[18] N. Wiebe, A. Kapoor, and K. Svore, Quantum Inf. Comput. 15, 0318 (2015).
[19] I. Kerenidis and A. Prakash, arXiv:1603.08675v3.
[20] H. K. Lau, R. Pooser, G. Siopsis, and C. Weedbrook, Phys. Rev. Lett. 118, 080501 (2017).
[21] N. Wiebe, D. Braun, and S. Lloyd, Phys. Rev. Lett. 109, 050505 (2012).
[22] A. Prakash, Ph.D. thesis, Dissertations and Theses - Gradworks, 2014.
[23] I. Cong and L. Duan, New J. Phys. 18, 073011 (2016).
[24] C. H. Yu, F. Gao, Q. L. Wang, and Q. Y. Wen, Phys. Rev. A 94, 042311 (2016).
[25] L. Luo, Y. Xie, Z. Zhang, and W. J. Li, in Proceedings of the 32nd International Conference on Machine Learning, JMLR Workshop and Conference Proceedings 37 (JMLR.org, Lille, France, 2015), pp. 938-947.
[26] J. F. Cai, E. J. Candès, and Z. Shen, SIAM J. Optim. 20, 1956 (2008).
[27] S. Shalev-Shwartz and S. Ben-David, Understanding Machine Learning: From Theory to Algorithms (Cambridge University Press, Cambridge, England, 2014).
