
PRL 113, 130503 (2014)    PHYSICAL REVIEW LETTERS    week ending 26 SEPTEMBER 2014

Quantum Support Vector Machine for Big Data Classification


Patrick Rebentrost,1,* Masoud Mohseni,2 and Seth Lloyd1,3,†
1Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
2Google Research, Venice, California 90291, USA
3Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
(Received 12 February 2014; published 25 September 2014)
Supervised machine learning is the classification of new data based on already classified training
examples. In this work, we show that the support vector machine, an optimized binary classifier, can be
implemented on a quantum computer, with complexity logarithmic in the size of the vectors and the number
of training examples. In cases where classical sampling algorithms require polynomial time, an exponential
speedup is obtained. At the core of this quantum big data algorithm is a nonsparse matrix exponentiation
technique for efficiently performing a matrix inversion of the training data inner-product (kernel) matrix.

DOI: 10.1103/PhysRevLett.113.130503 PACS numbers: 03.67.Ac, 07.05.Mh

Machine learning algorithms can be categorized along a spectrum of supervised and unsupervised learning [1–4]. In strictly unsupervised learning, the task is to find structure such as clusters in unlabeled data. Supervised learning involves a training set of already classified data, from which inferences are made to classify new data. In both cases, recent "big data" applications exhibit a growing number of features and input data. A Support Vector Machine (SVM) is a supervised machine learning algorithm that classifies vectors in a feature space into one of two sets, given training data from the sets [5]. It operates by constructing the optimal hyperplane dividing the two sets, either in the original feature space or in a higher-dimensional kernel space. The SVM can be formulated as a quadratic programming problem [6], which can be solved in time proportional to $O(\log(\epsilon^{-1})\,\mathrm{poly}(N,M))$, with $N$ the dimension of the feature space, $M$ the number of training vectors, and $\epsilon$ the accuracy. In a quantum setting, binary classification was discussed in terms of Grover's search in [7] and using the adiabatic algorithm in [8–11]. Quantum learning was also discussed in [12,13].

In this Letter, we show that a quantum support vector machine can be implemented with $O(\log NM)$ run time in both training and classification stages. The performance in $N$ arises due to a fast quantum evaluation of inner products, discussed in a general machine learning context by us in [14]. For the performance in $M$, we reexpress the SVM as an approximate least-squares problem [15] that allows for a quantum solution with the matrix inversion algorithm [16,17]. We employ a technique for the exponentiation of nonsparse matrices recently developed in [18]. This allows us to reveal efficiently in quantum form the largest eigenvalues and corresponding eigenvectors of the training data overlap (kernel) and covariance matrices. We thus efficiently perform a low-rank approximation of these matrices [Principal Component Analysis (PCA)]. PCA is a common task arising here and in other machine learning algorithms [19–21]. The error dependence in the training stage is $O(\mathrm{poly}(\epsilon_K^{-1}, \epsilon^{-1}))$, where $\epsilon_K$ is the smallest eigenvalue considered and $\epsilon$ is the accuracy. In cases when a low-rank approximation is appropriate, our quantum SVM operates on the full training set in logarithmic run time.

Support vector machine.—The task for the SVM is to classify a vector into one of two classes, given $M$ training data points of the form $\{(\vec{x}_j, y_j): \vec{x}_j \in \mathbb{R}^N, y_j = \pm 1\}_{j=1\ldots M}$, where $y_j = 1$ or $-1$ depending on the class to which $\vec{x}_j$ belongs. For the classification, the SVM finds a maximum-margin hyperplane with normal vector $\vec{w}$ that divides the two classes. The margin is given by two parallel hyperplanes that are separated by the maximum possible distance $2/|\vec{w}|$, with no data points inside the margin. Formally, these hyperplanes are constructed so that $\vec{w}\cdot\vec{x}_j + b \geq 1$ for $\vec{x}_j$ in the $+1$ class and that $\vec{w}\cdot\vec{x}_j + b \leq -1$ for $\vec{x}_j$ in the $-1$ class, where $b/|\vec{w}|$ is the offset of the hyperplane. Thus, in the primal formulation, finding the optimal hyperplane consists of minimizing $|\vec{w}|^2/2$ subject to the inequality constraints $y_j(\vec{w}\cdot\vec{x}_j + b) \geq 1$ for all $j$. The dual formulation [6] is maximizing over the Karush-Kuhn-Tucker multipliers $\vec{\alpha} = (\alpha_1, \ldots, \alpha_M)^T$ the function

$$L(\vec{\alpha}) = \sum_{j=1}^{M} y_j \alpha_j - \frac{1}{2}\sum_{j,k=1}^{M} \alpha_j K_{jk} \alpha_k, \qquad (1)$$

subject to the constraints $\sum_{j=1}^{M} \alpha_j = 0$ and $y_j \alpha_j \geq 0$. The hyperplane parameters are recovered from $\vec{w} = \sum_{j=1}^{M} \alpha_j \vec{x}_j$ and $b = y_j - \vec{w}\cdot\vec{x}_j$ (for those $j$ where $\alpha_j \neq 0$). Only a few of the $\alpha_j$ are nonzero: these are the ones corresponding to the $\vec{x}_j$ that lie on the two hyperplanes—the support vectors. We have introduced the kernel matrix, a central quantity in many machine learning problems [19,21], $K_{jk} = k(\vec{x}_j, \vec{x}_k) = \vec{x}_j\cdot\vec{x}_k$, defining the kernel function $k(x, x')$. More complicated nonlinear kernels and soft margins will be studied below.

Solving the dual form involves evaluating the $M(M-1)/2$ dot products $\vec{x}_j\cdot\vec{x}_k$ in the kernel matrix and then finding the optimal $\alpha_j$ values by quadratic programming, which takes $O(M^3)$ in the nonsparse case [22]. As each dot product takes time $O(N)$ to evaluate, the classical support vector algorithm takes time $O(\log(1/\epsilon)\,M^2(N+M))$ with accuracy $\epsilon$. The result is a binary classifier for new data $\vec{x}$:

$$y(\vec{x}) = \mathrm{sgn}\left(\sum_{j=1}^{M} \alpha_j k(\vec{x}_j, \vec{x}) + b\right). \qquad (2)$$
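
As an added illustration (not part of the original Letter), the classical baseline just described can be sketched in a few lines, assuming NumPy and scikit-learn and a small synthetic data set: build the linear kernel, train a standard quadratic-programming SVM, and reproduce the decision function of Eq. (2) from the dual coefficients.

```python
import numpy as np
from sklearn.svm import SVC  # classical quadratic-programming SVM (assumed available)

rng = np.random.default_rng(0)
M, N = 40, 5                                        # M training vectors of dimension N
X = rng.normal(size=(M, N))                         # rows are the training vectors x_j
y = np.where(X[:, 0] + 0.1 * rng.normal(size=M) > 0, 1.0, -1.0)   # toy labels +/-1

K = X @ X.T                                         # kernel matrix K_jk = x_j . x_k, O(M^2 N) to build

clf = SVC(kernel="linear", C=1e3).fit(X, y)         # classical (approximately hard-margin) SVM

# Eq. (2): y(x) = sgn( sum_j alpha_j k(x_j, x) + b ); the signed alpha_j are the
# dual coefficients, nonzero only for the support vectors.
x_new = rng.normal(size=N)
alpha_signed = clf.dual_coef_.ravel()
decision = alpha_signed @ (clf.support_vectors_ @ x_new) + clf.intercept_[0]
print(np.sign(decision) == clf.predict(x_new[None, :])[0])   # True
```
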
Quantum machine learning with the kernel matrix.—In the quantum setting, assume that oracles for the training data that return quantum vectors $|\vec{x}_j\rangle = 1/|\vec{x}_j| \sum_{k=1}^{N} (\vec{x}_j)_k |k\rangle$, the norms $|\vec{x}_j|$, and the labels $y_j$ are given. The quantum machine learning performance is relative to these oracles and can be considered a lower bound for the true complexity [23]. One way of efficiently constructing these states is via quantum RAM, which uses $O(MN)$ hardware resources but only $O(\log MN)$ operations to access them; see [14,24]. Using the inner product evaluation of [14] to prepare the kernel matrix, we can achieve a run time for the SVM of $O(\log(1/\epsilon)M^3 + M^2\log N/\epsilon)$. Classically, the inner product evaluation is $O(\epsilon^{-2}\,\mathrm{poly}(N))$ by sampling when the components of the $\vec{x}_j$ are distributed unevenly, for example, when a Fourier transform is part of the postprocessing step [23].

The kernel matrix plays a crucial role in the dual formulation Eq. (1) and the least-squares reformulation discussed in the next section. At this point we can discuss an efficient quantum method for direct preparation and exponentiation of the normalized kernel matrix $\hat{K} = K/\mathrm{tr}K$. For the preparation, first call the training data oracles with the state $\frac{1}{\sqrt{M}}\sum_{i=1}^{M}|i\rangle$. This prepares in quantum parallel the state $|\chi\rangle = \frac{1}{\sqrt{N_\chi}}\sum_{i=1}^{M}|\vec{x}_i|\,|i\rangle|\vec{x}_i\rangle$, with $N_\chi = \sum_{i=1}^{M}|\vec{x}_i|^2$, in $O(\log NM)$ run time. If we discard the training set register, we obtain the desired kernel matrix as a quantum density matrix. This can be seen from the partial trace $\mathrm{tr}_2\{|\chi\rangle\langle\chi|\} = \frac{1}{N_\chi}\sum_{i,j=1}^{M}\langle\vec{x}_j|\vec{x}_i\rangle\,|\vec{x}_i||\vec{x}_j|\,|i\rangle\langle j| = K/\mathrm{tr}K$. See [25] for an independent estimation of the trace of $K$.
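
The partial-trace identity above is easy to verify numerically. A minimal sketch (an added illustration, assuming NumPy and real data): the composite state $|\chi\rangle$ is stored as an $M\times N$ array of amplitudes, and tracing out the data register is a contraction over the second index.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 6, 4
X = rng.normal(size=(M, N))                # training vectors x_i as rows

# Amplitudes of |chi> = (1/sqrt(N_chi)) sum_i |x_i| |i>|x_i>, grouped as chi[i, k].
# Since |x_i| * (x_i / |x_i|)_k = (x_i)_k, the unnormalized amplitudes are X itself;
# the global norm is sqrt(N_chi) = sqrt(tr K).
chi = X / np.linalg.norm(X)

# Partial trace over the data register: (tr_2 |chi><chi|)_{ij} = sum_k chi[i, k] chi[j, k]
rho_1 = chi @ chi.T

K = X @ X.T
print(np.allclose(rho_1, K / np.trace(K)))   # True: the reduced state is K / tr K
```
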
For quantum mechanically computing a matrix inverse such as $\hat{K}^{-1}$, one needs to be able to enact $e^{-i\hat{K}\Delta t}$ efficiently. However, the kernel matrix $\hat{K}$ is not sparse for the application of the techniques in [27,28]. For the exponentiation of nonsparse symmetric or Hermitian matrices, a strategy was developed by us in [18]. We adapt it to the present problem. Adopting a density matrix description to extend the space of possible transformations gives, for some quantum state $\rho$, $e^{-i\hat{K}\Delta t}\rho\, e^{i\hat{K}\Delta t} = e^{-i\mathcal{L}_{\hat{K}}\Delta t}(\rho)$. The superoperator notation $\mathcal{L}_K(\rho) = [K, \rho]$ was defined. Applying the algorithm of [18] obtains

$$e^{-i\mathcal{L}_{\hat{K}}\Delta t}(\rho) \approx \mathrm{tr}_1\{e^{-iS\Delta t}\,\hat{K}\otimes\rho\,e^{iS\Delta t}\} = \rho - i\Delta t[\hat{K}, \rho] + O(\Delta t^2). \qquad (3)$$

Here, $S = \sum_{m,n=1}^{M}|m\rangle\langle n|\otimes|n\rangle\langle m|$ is the swap matrix of dimension $M^2\times M^2$. Equation (3) is the operation that is implemented on the quantum computer performing the machine learning. For the time slice $\Delta t$, it consists of the preparation of an environment state $\hat{K}$ (see above) and the application of the global swap operator to the combined system and environment state followed by discarding the environmental degrees of freedom. Equation (3) shows that enacting $e^{-i\hat{K}\Delta t}$ is possible with error $O(\Delta t^2)$. The efficient preparation and exponentiation of the training data kernel matrix, which appears in many machine learning problems [19,21], potentially enables a wide range of quantum machine learning algorithms. We now discuss a complete quantum big data algorithm.
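
Equation (3) can be checked directly with dense matrices for small $M$. The sketch below is an added illustration (assuming NumPy and SciPy): it builds the swap operator $S$, conjugates $\hat{K}\otimes\rho$ with $e^{-iS\Delta t}$, traces out the first register, and compares the result with $\rho - i\Delta t[\hat{K},\rho]$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
M, dt = 4, 1e-3

def random_density(d):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = A @ A.conj().T
    return rho / np.trace(rho)

K_hat = random_density(M)          # stands in for the normalized kernel matrix K / tr K
rho = random_density(M)            # the system state

# Swap operator S = sum_{m,n} |m><n| (x) |n><m| on the bipartite M x M space
S = np.zeros((M * M, M * M))
for m in range(M):
    for n in range(M):
        S[m * M + n, n * M + m] = 1.0

U = expm(-1j * dt * S)
evolved = U @ np.kron(K_hat, rho) @ U.conj().T

# Partial trace over the first (environment) register
rho_out = np.einsum('ijil->jl', evolved.reshape(M, M, M, M))

target = rho - 1j * dt * (K_hat @ rho - rho @ K_hat)
print(np.max(np.abs(rho_out - target)) < 10 * dt**2)   # residual is O(dt^2)
```
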
Quantum least-squares support vector machine.—A key idea of this work is to employ the least-squares reformulation of the support vector machine developed in [15] that circumvents the quadratic programming and obtains the parameters from the solution of a linear equation system. The central simplification is to introduce slack variables $e_j$ (soft margins) and replace the inequality constraints with equality constraints (using $y_j^2 = 1$):

$$y_j(\vec{w}\cdot\vec{x}_j + b) \geq 1 \;\rightarrow\; (\vec{w}\cdot\vec{x}_j + b) = y_j - y_j e_j. \qquad (4)$$

In addition to the constraints, the implied Lagrangian function contains a penalty term $\frac{\gamma}{2}\sum_{j=1}^{M} e_j^2$, where the user-specified $\gamma$ determines the relative weight of the training error and the SVM objective. Taking partial derivatives of the Lagrangian function and eliminating the variables $\vec{w}$ and $e_j$ leads to a least-squares approximation of the problem:

$$F\begin{pmatrix} b \\ \vec{\alpha} \end{pmatrix} \equiv \begin{pmatrix} 0 & \vec{1}^T \\ \vec{1} & K + \gamma^{-1}\mathbb{1} \end{pmatrix}\begin{pmatrix} b \\ \vec{\alpha} \end{pmatrix} = \begin{pmatrix} 0 \\ \vec{y} \end{pmatrix}. \qquad (5)$$

Here, $K_{ij} = \vec{x}_i^T\cdot\vec{x}_j$ is again the symmetric kernel matrix, $\vec{y} = (y_1, \ldots, y_M)^T$, and $\vec{1} = (1, \ldots, 1)^T$. The matrix $F$ is $(M+1)\times(M+1)$ dimensional. The additional row and column with the $\vec{1}$ arise because of a nonzero offset $b$. The $\alpha_j$ take on the role of distances from the optimal margin and usually are not sparse. The SVM parameters are determined schematically by $(b, \vec{\alpha}^T)^T = F^{-1}(0, \vec{y}^T)^T$. As with the quadratic programming formulation, the classical complexity of the least-squares support vector machine is $O(M^3)$ [22].
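
Because Eq. (5) is an ordinary linear system, the classical least-squares SVM fits in a few lines. The following is an added sketch (assuming NumPy and toy data): it assembles $F$, solves for $(b, \vec{\alpha})$, and classifies a new point with the resulting kernel expansion.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, gamma = 50, 3, 10.0
X = rng.normal(size=(M, N))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=M) > 0, 1.0, -1.0)

K = X @ X.T
ones = np.ones(M)

# F = [[0, 1^T], [1, K + gamma^{-1} I]] as in Eq. (5)
F = np.zeros((M + 1, M + 1))
F[0, 1:] = ones
F[1:, 0] = ones
F[1:, 1:] = K + np.eye(M) / gamma

b_alpha = np.linalg.solve(F, np.concatenate(([0.0], y)))
b, alpha = b_alpha[0], b_alpha[1:]

# Least-squares SVM classifier: y(x) = sgn( sum_k alpha_k (x_k . x) + b )
x_new = rng.normal(size=N)
print(np.sign(alpha @ (X @ x_new) + b))
```
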
For the quantum support vector machine, the task is to generate a quantum state $|b, \vec{\alpha}\rangle$ describing the hyperplane with the matrix inversion algorithm [16] and then classify a state $|\vec{x}\rangle$. We solve the normalized equation $\hat{F}|b, \vec{\alpha}\rangle = |\vec{y}\rangle$, where $\hat{F} = F/\mathrm{tr}F$ with $\|\hat{F}\| \leq 1$. The classifier will be determined by the success probability of a swap test between $|b, \vec{\alpha}\rangle$ and $|\vec{x}\rangle$. For application of the quantum matrix inversion algorithm, $\hat{F}$ needs to be exponentiated efficiently.


The matrix $\hat{F}$ is schematically separated as $\hat{F} = (J + K + \gamma^{-1}\mathbb{1})/\mathrm{tr}F$, with

$$J = \begin{pmatrix} 0 & \vec{1}^T \\ \vec{1} & 0 \end{pmatrix},$$

and the Lie product formula allows for $e^{-i\hat{F}\Delta t} = e^{-i\gamma^{-1}\mathbb{1}\Delta t/\mathrm{tr}F}\,e^{-iJ\Delta t/\mathrm{tr}F}\,e^{-iK\Delta t/\mathrm{tr}F} + O(\Delta t^2)$. The matrix $J$ is straightforward [27] (a "star" graph). The two nonzero eigenvalues of $J$ are $\lambda_{\mathrm{star}}^{\pm} = \pm\sqrt{M}$ and the corresponding eigenstates are $|\lambda_{\mathrm{star}}^{\pm}\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle \pm \frac{1}{\sqrt{M}}\sum_{k=1}^{M}|k\rangle\right)$. The matrix $\gamma^{-1}\mathbb{1}$ is trivial. For $K/\mathrm{tr}K$, proceed according to Eq. (3) by rescaling time by a factor $\mathrm{tr}K/\mathrm{tr}F = O(1)$ appropriately. This $e^{-i\hat{F}\Delta t}$ is employed conditionally in phase estimation.

The right-hand side $|\vec{y}\rangle$ can be formally expanded into eigenstates $|u_j\rangle$ of $\hat{F}$ with corresponding eigenvalues $\lambda_j$, $|\vec{y}\rangle = \sum_{j=1}^{M+1}\langle u_j|\vec{y}\rangle|u_j\rangle$. With a register for storing an approximation of the eigenvalues (initialized to $|0\rangle$), phase estimation generates a state which is close to the ideal state storing the respective eigenvalue:

$$|\vec{y}\rangle|0\rangle \;\rightarrow\; \sum_{j=1}^{M+1}\langle u_j|\vec{y}\rangle|u_j\rangle|\lambda_j\rangle \;\rightarrow\; \sum_{j=1}^{M+1}\frac{\langle u_j|\vec{y}\rangle}{\lambda_j}|u_j\rangle. \qquad (6)$$

The second step inverts the eigenvalue and is obtained as in [16] by performing a controlled rotation and uncomputing the eigenvalue register. In the basis of training set labels, the expansion coefficients of the new state are the desired support vector machine parameters ($C = b^2 + \sum_{k=1}^{M}\alpha_k^2$):

$$|b, \vec{\alpha}\rangle = \frac{1}{\sqrt{C}}\left(b|0\rangle + \sum_{k=1}^{M}\alpha_k|k\rangle\right). \qquad (7)$$

Classification.—We have now trained the quantum SVM and would like to classify a query state $|\vec{x}\rangle$. From the state $|b, \vec{\alpha}\rangle$ in Eq. (7), construct by calling the training data oracle

$$|\tilde{u}\rangle = \frac{1}{\sqrt{N_{\tilde{u}}}}\left(b|0\rangle|0\rangle + \sum_{k=1}^{M}\alpha_k|\vec{x}_k|\,|k\rangle|\vec{x}_k\rangle\right), \qquad (8)$$

with $N_{\tilde{u}} = b^2 + \sum_{k=1}^{M}\alpha_k^2|\vec{x}_k|^2$. In addition, construct the query state

$$|\tilde{x}\rangle = \frac{1}{\sqrt{N_{\tilde{x}}}}\left(|0\rangle|0\rangle + \sum_{k=1}^{M}|\vec{x}|\,|k\rangle|\vec{x}\rangle\right), \qquad (9)$$

with $N_{\tilde{x}} = M|\vec{x}|^2 + 1$. For the classification, we perform a swap test. Using an ancilla, construct the state $|\psi\rangle = \frac{1}{\sqrt{2}}(|0\rangle|\tilde{u}\rangle + |1\rangle|\tilde{x}\rangle)$ and measure the ancilla in the state $|\phi\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$. The measurement has the success probability $P = |\langle\psi|\phi\rangle|^2 = \frac{1}{2}(1 - \langle\tilde{u}|\tilde{x}\rangle)$. The inner product is given by $\langle\tilde{u}|\tilde{x}\rangle = \frac{1}{\sqrt{N_{\tilde{x}}N_{\tilde{u}}}}\left(b + \sum_{k=1}^{M}\alpha_k|\vec{x}_k||\vec{x}|\langle\vec{x}_k|\vec{x}\rangle\right)$, which is $O(1)$ in the usual case when the $\alpha$ are not sparse. $P$ can be obtained to accuracy $\epsilon$ by iterating $O(P(1-P)/\epsilon^2)$ times. If $P < 1/2$ we classify $|\vec{x}\rangle$ as $+1$; otherwise, $-1$.
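
Read classically, the swap-test statistic reproduces the kernel classifier: up to the positive normalization $1/\sqrt{N_{\tilde{x}}N_{\tilde{u}}}$, the inner product $\langle\tilde{u}|\tilde{x}\rangle$ has the same sign as the linear-kernel decision function of Eq. (2), so thresholding $P$ at $1/2$ recovers the label. A minimal numerical sketch (an added illustration, assuming NumPy and some parameters $(b, \vec{\alpha})$ already obtained, for instance from the least-squares system above):

```python
import numpy as np

def swap_test_probability(b, alpha, X, x_new):
    """P = (1/2)(1 - <u~|x~>) for the states of Eqs. (8) and (9), linear kernel."""
    M = X.shape[0]
    norms = np.linalg.norm(X, axis=1)
    N_u = b**2 + np.sum(alpha**2 * norms**2)      # normalization of |u~>, Eq. (8)
    N_x = M * np.dot(x_new, x_new) + 1.0          # normalization of |x~>, Eq. (9)
    inner = (b + alpha @ (X @ x_new)) / np.sqrt(N_u * N_x)
    return 0.5 * (1.0 - inner)

# Example with arbitrary (hypothetical) parameters:
rng = np.random.default_rng(4)
X = rng.normal(size=(5, 3))
b, alpha = 0.1, rng.normal(size=5)
P = swap_test_probability(b, alpha, X, rng.normal(size=3))
label = +1 if P < 0.5 else -1
print(P, label)
```
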
Kernel matrix approximation and error analysis.—We now show that quantum matrix inversion essentially performs a kernel matrix principal component analysis and give a run time and error analysis of the quantum algorithm. The matrix under consideration, $\hat{F} = F/\mathrm{tr}F$, contains the kernel matrix $\hat{K}_\gamma = K_\gamma/\mathrm{tr}K_\gamma$ and an additional row and column due to the offset parameter $b$. In case the offset is negligible, the problem reduces to matrix inversion of the kernel matrix $\hat{K}_\gamma$ only. For any finite $\gamma$, $\hat{K}_\gamma$ is positive definite, and thus invertible. The positive eigenvalues of $\hat{F}$ are dominated by the eigenvalues of $\hat{K}_\gamma$. In addition, $\hat{F}$ has one additional negative eigenvalue which is involved in determining the offset parameter $b$. The maximum absolute eigenvalue of $\hat{F}$ is no greater than 1 and the minimum absolute eigenvalue is $\leq O(1/M)$. The minimum eigenvalue arises, e.g., from the possibility of having a training example that has (almost) zero overlap with the other training examples. Because of the normalization the eigenvalue will be $O(1/M)$ and, as a result, the condition number $\kappa$ (the largest eigenvalue divided by the smallest eigenvalue) is $O(M)$ in this case. To resolve such eigenvalues would require exponential run time [16]. We define a constant $\epsilon_K$ such that only the eigenvalues in the interval $\epsilon_K \leq |\lambda_j| \leq 1$ are taken into account, essentially defining an effective condition number $\kappa_{\mathrm{eff}} = 1/\epsilon_K$. Then, the filtering procedure described in [16] is employed in the phase estimation using this $\kappa_{\mathrm{eff}}$. An ancilla register is attached to the quantum state and appropriately defined filtering functions discard eigenvalues below $\epsilon_K$ when multiplying the inverse $1/\lambda_j$ for each eigenstate in Eq. (6). The desired outcome is obtained by postselecting the ancilla register.
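
The effect of the eigenvalue filtering can be emulated classically for small systems. The sketch below is an added illustration (assuming NumPy): it diagonalizes $\hat{F}$, inverts only the eigenvalues with $|\lambda_j| \geq \epsilon_K$, and discards the rest, the classical analogue of the filtered inversion applied to Eq. (6).

```python
import numpy as np

def filtered_solve(F, y_ext, eps_K):
    """Invert F only on eigenspaces of F/trF with |lambda| >= eps_K (F symmetric)."""
    F_hat = F / np.trace(F)                      # normalized matrix, eigenvalues in [-1, 1]
    lam, U = np.linalg.eigh(F_hat)               # F_hat = U diag(lam) U^T
    keep = np.abs(lam) >= eps_K                  # effective condition number 1 / eps_K
    coeffs = U.T @ y_ext                         # <u_j|y>
    sol_hat = U[:, keep] @ (coeffs[keep] / lam[keep])
    return sol_hat / np.trace(F)                 # undo the normalization of F

# Example on the least-squares SVM matrix F of Eq. (5) (toy data, hypothetical eps_K):
rng = np.random.default_rng(5)
M, gamma, eps_K = 30, 10.0, 0.05
X = rng.normal(size=(M, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
F = np.block([[np.zeros((1, 1)), np.ones((1, M))],
              [np.ones((M, 1)), X @ X.T + np.eye(M) / gamma]])
b_alpha = filtered_solve(F, np.concatenate(([0.0], y)), eps_K)
print(b_alpha[:5])
```
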
pffiffiffiffiffiffiffiffiffiffiffiffi P and the filtering procedure.
is given by huj~ ~ xi ¼ 1= N x~ N u~ ðb þ M k¼1 αk j~
xk jj~xjh~xk j~xiÞ, We continue with a discussion of the run time of the
which is Oð1Þ in the usual case when the α are not sparse. P quantum algorithm. The interval Δt can be written as
can be obtained to accuracy ϵ by iterating O(Pð1 − PÞ=ϵ2 ) Δt ¼ t0 =T, where T is the number of time steps in the
times. If P < 1=2 we classify j~xi as þ1; otherwise, −1. phase estimation and t0 is the total evolution time
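
That $K = X^TX$ and $\Sigma = XX^T$ share their nonzero eigenvalues, so that filtering the kernel spectrum amounts to a PCA of the covariance matrix, is quickly confirmed numerically (an added illustration, assuming NumPy and data dominated by a few factors):

```python
import numpy as np

rng = np.random.default_rng(6)
N, M, r = 8, 50, 2
# Columns of X are the training vectors; the data are dominated by r "factors"
factors = rng.normal(size=(N, r))
X = factors @ rng.normal(size=(r, M)) + 0.01 * rng.normal(size=(N, M))

K = X.T @ X          # M x M kernel matrix
Sigma = X @ X.T      # N x N covariance matrix

ev_K = np.sort(np.linalg.eigvalsh(K))[::-1]
ev_S = np.sort(np.linalg.eigvalsh(Sigma))[::-1]
print(np.allclose(ev_K[:N], ev_S))        # True: shared nonzero spectrum
print(ev_K[:4] / np.trace(K))             # roughly r large O(1) eigenvalues, the rest tiny
```
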


We continue with a discussion of the run time of the quantum algorithm. The interval $\Delta t$ can be written as $\Delta t = t_0/T$, where $T$ is the number of time steps in the phase estimation and $t_0$ is the total evolution time determining the error of the phase estimation [16]. The swap matrix used in Eq. (3) is 1-sparse and $e^{-iS\Delta t}$ is efficiently simulable in negligible time $\tilde{O}(\log(M)\Delta t)$ [28]. The $\tilde{O}$ notation suppresses more slowly growing factors, such as a $\log M$ factor [16,28]. For the phase estimation, the propagator $e^{-i\mathcal{L}_{\hat{F}}\Delta t}$ is enacted with error $O(\Delta t^2\|\hat{F}\|^2)$; see Eq. (3). With the spectral norm for a matrix $A$, $\|A\| = \max_{|\vec{v}|=1}|A\vec{v}|$, we have $\|\hat{F}\| = O(1)$. Taking powers of this propagator, $e^{-i\mathcal{L}_{\hat{F}}\tau\Delta t}$ for $\tau = 0, \ldots, T-1$, leads to an error of maximally $\epsilon = O(T\Delta t^2) = O(t_0^2/T)$. Thus, the run time is $T = O(t_0^2/\epsilon)$. Taking into account the preparation of the kernel matrix in $O(\log MN)$, the run time is thus $O(t_0^2\epsilon^{-1}\log MN)$. The relative error of $\lambda^{-1}$ by phase estimation is given by $O(1/(t_0\lambda)) \leq O(1/(t_0\epsilon_K))$ for $\lambda \geq \epsilon_K$ [16]. If $t_0$ is taken $O(\kappa_{\mathrm{eff}}/\epsilon) = O(1/(\epsilon_K\epsilon))$, this error is $O(\epsilon)$. The run time is thus $\tilde{O}(\epsilon_K^{-2}\epsilon^{-3}\log MN)$. Repeating the algorithm $O(\kappa_{\mathrm{eff}})$ times to achieve a constant success probability of the postselection step obtains a final run time of $O(\kappa_{\mathrm{eff}}^3\epsilon^{-3}\log MN)$. To summarize, we find a quantum support vector machine that scales as $O(\log MN)$, which implies a quantum advantage in situations where classically $O(M)$ training examples and $O(N)$ samples for the inner product are required.

Nonlinear support vector machines.—One of the most powerful uses of support vector machines is to perform nonlinear classification [5]. Perform a nonlinear mapping $\vec{\phi}(\vec{x}_j)$ into a higher-dimensional vector space. Thus, the kernel function becomes a nonlinear function in $\vec{x}$:

$$k(\vec{x}_j, \vec{x}_k) = \vec{\phi}(\vec{x}_j)\cdot\vec{\phi}(\vec{x}_k). \qquad (10)$$

For example, $k(\vec{x}_j, \vec{x}_k) = (\vec{x}_j\cdot\vec{x}_k)^d$. Now perform the SVM classification in the higher-dimensional space. The separating hyperplanes in the higher-dimensional space now correspond to separating nonlinear surfaces in the original space.

The ability of quantum computers to manipulate high-dimensional vectors affords a natural quantum algorithm for polynomial kernel machines. Simply map each vector $|\vec{x}_j\rangle$ into the $d$-times tensor product $|\phi(\vec{x}_j)\rangle \equiv |\vec{x}_j\rangle\otimes\cdots\otimes|\vec{x}_j\rangle$ and use the feature that $\langle\phi(\vec{x}_j)|\phi(\vec{x}_k)\rangle = \langle\vec{x}_j|\vec{x}_k\rangle^d$. Arbitrary polynomial kernels can be constructed using this trick. The optimization using a nonlinear, polynomial kernel in the original space now becomes a linear hyperplane optimization in the $d$-times tensor product space. Considering only the complexity in the vector space dimension, the nonlinear $d$-level polynomial quantum kernel algorithm to accuracy $\epsilon$ then runs in time $O(d\log N/\epsilon)$. Note that, in contrast to classical kernel machines, the exponential quantum advantage in evaluating inner products allows quantum kernel machines to perform the kernel evaluation directly in the higher-dimensional space.
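
The tensor-product identity behind the polynomial kernel is straightforward to check with explicit state vectors. A small added sketch (assuming NumPy), with $d = 3$ copies:

```python
import numpy as np

rng = np.random.default_rng(7)
N, d = 4, 3

def quantum_state(x):
    """Amplitude encoding |x> = x / |x|."""
    return x / np.linalg.norm(x)

xj = quantum_state(rng.normal(size=N))
xk = quantum_state(rng.normal(size=N))

def phi(x, d):
    """d-fold tensor product |x> (x) ... (x) |x>, a vector of dimension N**d."""
    out = x
    for _ in range(d - 1):
        out = np.kron(out, x)
    return out

lhs = phi(xj, d) @ phi(xk, d)       # <phi(x_j)|phi(x_k)>
rhs = (xj @ xk) ** d                # <x_j|x_k>^d
print(np.allclose(lhs, rhs))        # True
```
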
Conclusion.—In this work, we have shown that an important classifier in machine learning, the support vector machine, can be implemented quantum mechanically with algorithmic complexity logarithmic in feature size and the number of training data, thus providing one example of a quantum big data algorithm. A least-squares formulation of the support vector machine allows the use of phase estimation and the quantum matrix inversion algorithm. The speed of the quantum algorithm is maximized when the training data kernel matrix is dominated by a relatively small number of principal components. We note that there exist several heuristic sampling algorithms for the SVM [29] and, more generally, for finding eigenvalues or vectors of low-rank matrices [30,31]. Information-theoretic arguments show that classically finding a low-rank matrix approximation is lower bounded by $\Omega(M)$ in the absence of prior knowledge [32], suggesting a similar lower bound for the least-squares SVM. Aside from the speedup, another timely benefit of quantum machine learning is data privacy [14]. The quantum algorithm never requires the explicit $O(MN)$ representation of all the features of each of the training examples, but it generates the necessary data structure, the kernel matrix of inner products, in quantum parallel. Once the kernel matrix is generated, the individual features of the training data are fully hidden from the user. Recently, quantum machine learning was discussed in [33,34]. In summary, the quantum support vector machine is an efficient implementation of an important machine learning algorithm. It also provides advantages in terms of data privacy and could be used as a component in a larger quantum neural network.

This work was supported by DARPA, NSF, ENI, AFOSR, the Google-NASA Quantum Artificial Intelligence Laboratory, and Jeffrey Epstein. The authors acknowledge helpful discussions with Scott Aaronson and Nan Ding.

*rebentr@mit.edu
†slloyd@mit.edu

[1] D. J. C. MacKay, Information Theory, Inference and Learning Algorithms (Cambridge University Press, Cambridge, England, 2003).
[2] E. Alpaydin, Introduction to Machine Learning (Adaptive Computation and Machine Learning) (MIT Press, Cambridge, MA, 2004).
[3] C. M. Bishop, Pattern Recognition and Machine Learning (Springer, New York, 2007).
[4] K. P. Murphy, Machine Learning: A Probabilistic Perspective (MIT Press, Cambridge, MA, 2012).
[5] C. Cortes and V. Vapnik, Mach. Learn. 20, 273 (1995).
[6] S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge University Press, Cambridge, England, 2004).
[7] D. Anguita, S. Ridella, F. Rivieccio, and R. Zunino, Neural Netw. 16, 763 (2003).
[8] H. Neven, V. S. Denchev, G. Rose, and W. G. Macready, arXiv:0811.0416.


[9] H. Neven, V. S. Denchev, G. Rose, and W. G. Macready, arXiv:0912.0779.
[10] K. L. Pudenz and D. A. Lidar, Quantum Inf. Process. 12, 2027 (2013).
[11] V. S. Denchev, N. Ding, S. V. N. Vishwanathan, and H. Neven, in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012 (Omnipress, Wisconsin, 2012).
[12] M. Sasaki and A. Carlini, Phys. Rev. A 66, 022303 (2002).
[13] R. A. Servedio and S. J. Gortler, SIAM J. Comput. 33, 1067 (2004).
[14] S. Lloyd, M. Mohseni, and P. Rebentrost, arXiv:1307.0411.
[15] J. A. K. Suykens and J. Vandewalle, Neural Processing Letters 9, 293 (1999).
[16] A. W. Harrow, A. Hassidim, and S. Lloyd, Phys. Rev. Lett. 103, 150502 (2009).
[17] N. Wiebe, D. Braun, and S. Lloyd, Phys. Rev. Lett. 109, 050505 (2012).
[18] S. Lloyd, M. Mohseni, and P. Rebentrost, Nat. Phys. 10, 631 (2014).
[19] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, IEEE Trans. Neural Networks 12, 181 (2001).
[20] L. Hoegaerts, J. A. K. Suykens, J. Vandewalle, and B. De Moor, in Proceedings of the IEEE International Joint Conference on Neural Networks, Budapest, Hungary, 2004 (IEEE, New York, 2004).
[21] T. Hofmann, B. Schölkopf, and A. J. Smola, Ann. Stat. 36, 1171 (2008).
[22] The exponent 3 can be improved considerably; see D. Coppersmith and S. Winograd, J. Symb. Comput. 9, 251 (1990).
[23] S. Aaronson, in Proceedings of the 42nd ACM Symposium on Theory of Computing, Cambridge, MA, 2010 (ACM, New York, 2010).
[24] V. Giovannetti, S. Lloyd, and L. Maccone, Phys. Rev. Lett. 100, 160501 (2008).
[25] See Supplemental Material at http://link.aps.org/supplemental/10.1103/PhysRevLett.113.130503, which includes Ref. [26], for the estimation of the trace of the kernel matrix and details on the kernel matrix low-rank approximation.
[26] C. Ding and X. He, in Proceedings of the 21st International Conference on Machine Learning, Banff, AB, Canada, 2004 (ACM, New York, 2004).
[27] A. Childs, Commun. Math. Phys. 294, 581 (2010).
[28] D. W. Berry, G. Ahokas, R. Cleve, and B. C. Sanders, Commun. Math. Phys. 270, 359 (2007).
[29] S. Shalev-Shwartz, Y. Singer, and N. Srebro, in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007 (ACM, New York, 2007).
[30] E. Liberty, F. Woolfe, P.-G. Martinsson, V. Rokhlin, and M. Tygert, Proc. Natl. Acad. Sci. U.S.A. 104, 20167 (2007).
[31] P. Drineas, R. Kannan, and M. W. Mahoney, SIAM J. Comput. 36, 158 (2006).
[32] Z. Bar-Yossef, in Proceedings of the 35th Annual ACM Symposium on Theory of Computing, San Diego, 2003 (ACM, New York, 2003).
[33] N. Wiebe, A. Kapoor, and K. Svore, arXiv:1401.2142.
[34] G. D. Paparo, V. Dunjko, A. Makmal, M. A. Martin-Delgado, and H. J. Briegel, Phys. Rev. X 4, 031002 (2014).
