Recent Advances in Large-Scale Linear Classification
Chih-Jen Lin
Department of Computer Science
National Taiwan University
Outline
Introduction
Optimization Methods
Extension of Linear Classification
Discussion and Conclusions
Kernel methods map each instance through x ⇒ φ(x) to a higher-dimensional space; linear classification works directly with x, without this mapping.
Loss Functions

Given a model w and an instance (x, y) with y ∈ {−1, +1}, three common losses, all functions of −y wᵀx:

ξ_L1(w; x, y) = max(0, 1 − y wᵀx)        (L1-loss SVM)
ξ_L2(w; x, y) = max(0, 1 − y wᵀx)²       (L2-loss SVM)
ξ_LR(w; x, y) = log(1 + e^(−y wᵀx))      (logistic regression)
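For concreteness, the three losses can be written as functions of the margin y wᵀx. A minimal NumPy sketch (the function names are mine, chosen for illustration):

```python
import numpy as np

def l1_loss(margin):
    # xi_L1: hinge loss, zero once the margin y * w.x reaches 1
    return np.maximum(0.0, 1.0 - margin)

def l2_loss(margin):
    # xi_L2: squared hinge loss, differentiable at margin = 1
    return np.maximum(0.0, 1.0 - margin) ** 2

def lr_loss(margin):
    # xi_LR: logistic loss, strictly positive and smooth everywhere
    return np.log1p(np.exp(-margin))

# At margin 0 all three losses agree up to a constant scale:
# l1_loss -> 1, l2_loss -> 1, lr_loss -> log 2
vals = [f(np.array([0.0]))[0] for f in (l1_loss, l2_loss, lr_loss)]
```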
Optimization Methods
If ᾱi is the old value of the variable and αi the new one, then the vector u is maintained by

u ← u + (αi − ᾱi) yi xi.

This update also costs O(n).
References: the first use for SVM is probably by Mangasarian and Musicant (1999) and Friess et al. (1998), but the approach was popularized for linear SVM by Hsieh et al. (2008).
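As a sketch, the O(n) update above is the heart of the dual coordinate descent method of Hsieh et al. (2008) for the L1-loss linear SVM dual. The minimal NumPy implementation below is illustrative only (the function name, loop schedule, and toy data are mine, not LIBLINEAR's); it uses the standard single-variable update αi ← min(max(αi − G/Qii, 0), C) with G = yi wᵀxi − 1, and maintains w (the u above) with the O(n) rule:

```python
import numpy as np

def dcd_linear_svm(X, y, C=1.0, n_epochs=50):
    """Dual coordinate descent sketch for the L1-loss linear SVM.

    Maintains w = sum_i alpha_i * y_i * x_i while updating one alpha_i at a time.
    """
    l, n = X.shape
    alpha = np.zeros(l)
    w = np.zeros(n)
    Qii = np.einsum('ij,ij->i', X, X)  # diagonal entries x_i^T x_i, computed once
    for _ in range(n_epochs):
        for i in range(l):
            if Qii[i] == 0.0:
                continue
            G = y[i] * w.dot(X[i]) - 1.0  # partial derivative of the dual objective
            alpha_old = alpha[i]
            # single-variable Newton step, projected onto the box [0, C]
            alpha[i] = min(max(alpha[i] - G / Qii[i], 0.0), C)
            # the O(n) maintenance step: u <- u + (alpha_i - alpha_bar_i) y_i x_i
            w += (alpha[i] - alpha_old) * y[i] * X[i]
    return w

# Toy usage on a linearly separable problem (data made up for illustration):
X = np.array([[2.0, 2.0], [1.5, 1.8], [-2.0, -1.0], [-1.2, -2.2]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = dcd_linear_svm(X, y)
```

Because each inner step touches only one instance and updates w in O(n), a full pass over the data costs O(ln) for l instances with n features, which is what makes the method attractive at large scale.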
Chih-Jen Lin (National Taiwan Univ.) 22 / 42
Optimization Methods
Comparisons
[Comparison figures on four datasets: news20, rcv1, yahoo-japan, yahoo-korea]
Optimization Methods
Analysis
Extension of Linear Classification
                  LIBSVM                LIBLINEAR
                  RBF       Poly        Linear    Poly
  Training time   3h34m53s  3h21m51s    3m36s     3m43s
  Parsing speed   0.7x      1x          1652x     103x
  UAS             89.92     91.67       89.11     91.71
  LAS             88.55     90.60       88.07     90.71

(UAS/LAS: unlabeled/labeled attachment scores.)
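The large speedup of LIBLINEAR with a polynomial mapping comes from expanding a low-degree polynomial kernel into an explicit finite-dimensional feature vector and training a linear model on the mapped data, the approach of Chang et al. (2010). A sketch of such a mapping for the degree-2 kernel (γ xᵀz + r)²; the function name and feature layout are mine:

```python
import numpy as np

def degree2_mapping(x, gamma=1.0, r=1.0):
    """Explicit feature map phi such that phi(x).phi(z) = (gamma * x.z + r)^2."""
    g = np.sqrt(gamma)
    n = len(x)
    feats = [
        np.array([r]),                 # constant term -> r^2 in the product
        np.sqrt(2.0 * r) * g * x,      # linear terms  -> 2*r*gamma * x.z
        gamma * x * x,                 # squared terms -> gamma^2 * sum x_i^2 z_i^2
    ]
    # cross terms x_i x_j for i < j -> 2*gamma^2 * sum_{i<j} x_i x_j z_i z_j
    cross = [np.sqrt(2.0) * gamma * x[i] * x[j]
             for i in range(n) for j in range(i + 1, n)]
    feats.append(np.array(cross))
    return np.concatenate(feats)
```

Since φ(x)ᵀφ(z) = (γ xᵀz + r)² exactly, a linear solver on the mapped data reproduces the degree-2 polynomial kernel while keeping linear-classification training costs; the mapped dimension is O(n²), which is why this is practical only for low degrees.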
Discussion
Conclusions
References
B. E. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In
Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages
144–152. ACM Press, 1992.
Y.-W. Chang, C.-J. Hsieh, K.-W. Chang, M. Ringgaard, and C.-J. Lin. Training and testing
low-degree polynomial data mappings via linear SVM. Journal of Machine Learning
Research, 11:1471–1490, 2010. URL
http://www.csie.ntu.edu.tw/~cjlin/papers/lowpoly_journal.pdf.
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
T.-T. Friess, N. Cristianini, and C. Campbell. The kernel adatron algorithm: a fast and simple
learning procedure for support vector machines. In Proceedings of the 15th International
Conference on Machine Learning (ICML). Morgan Kaufmann Publishers, 1998.
C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate
descent method for large-scale linear SVM. In Proceedings of the Twenty Fifth
International Conference on Machine Learning (ICML), 2008. URL
http://www.csie.ntu.edu.tw/~cjlin/papers/cddual.pdf.
P. Kar and H. Karnick. Random feature maps for dot product kernels. In Proceedings of the
15th International Conference on Artificial Intelligence and Statistics (AISTATS), pages
583–591, 2012.
C.-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region Newton method for large-scale logistic
regression. Journal of Machine Learning Research, 9:627–650, 2008. URL
http://www.csie.ntu.edu.tw/~cjlin/papers/logistic.pdf.
O. L. Mangasarian. A finite Newton method for classification. Optimization Methods and
Software, 17(5):913–929, 2002.
O. L. Mangasarian and D. R. Musicant. Successive overrelaxation for support vector machines.
IEEE Transactions on Neural Networks, 10(5):1032–1037, 1999.
N. Pham and R. Pagh. Fast and scalable polynomial kernels via explicit feature maps. In
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery
and data mining, pages 239–247, 2013.
T. Steihaug. The conjugate gradient method and trust regions in large scale optimization.
SIAM Journal on Numerical Analysis, 20:626–637, 1983.
T. Yu, D. Wang, M.-C. Yu, C.-J. Lin, and E. Y. Chang. Careful use of machine learning
methods is needed for mobile applications: A case study on transportation-mode
detection. Technical report, Studio Engineering, HTC, 2013. URL
http://www.csie.ntu.edu.tw/~cjlin/papers/transportation-mode/casestudy.pdf.
G.-X. Yuan, C.-H. Ho, and C.-J. Lin. Recent advances of large-scale linear classification.
Proceedings of the IEEE, 100(9):2584–2603, 2012. URL
http://www.csie.ntu.edu.tw/~cjlin/papers/survey-linear.pdf.