Computation of Recommender System using Localized Regularization
Kourosh Modarresi
kouroshm@alumni.stanford.edu
Abstract
Online and offline targeting and recommendation are major topics in e-commerce. In statistics, the same problem is treated under the names "matrix completion", "missing values", and "matrix imputation". The main goal in all of these fields is to compute the unknown (missing) values in the data matrix. When recovering the unknown entries of the matrix, overfitting may occur due to the lack of sufficient information, so penalizing the objective function through regularization becomes necessary. This work is based on a different view of regularization, namely a localized regularization technique, which leads to an improvement in the estimation of the missing values.
Keywords: Feature Reduction, Singular Value Decomposition, Localized Regularization, Lower Dimensional
Space Projection
1 Introduction
The main task of machine learning, data mining, and related areas has been to seek patterns and insights in a given data set and to use those patterns and insights for system analysis and predictive forecasting.
The introduction of the internet has brought a new dimension to the ways businesses sell their products and interact with their customers. The web is ubiquitous and web applications are soaring, so much of commerce and of the customer experience now takes place online. Many companies offer their products exclusively or predominantly online. At the same time, many present and potential customers spend much of their time online, so businesses try to use efficient models to interact with online users and engage them in various desired initiatives. This interaction with online users is crucial for businesses that hope to see a desired outcome such as a purchase, a conversion of any type, page views, or longer time spent on the business's pages.
Selection and peer-review under responsibility of the Scientific Programme Committee of ICCS 2015
© The Authors. Published by Elsevier B.V.
doi:10.1016/j.procs.2015.05.422
Computation of Recommender System using Localized Regularization Kourosh Modarresi
Recommender systems are one of the main tools for achieving these outcomes. The basic idea of a recommender system is to estimate the probability of a desired action by a specific user. Knowing this probability, one can decide which initiatives to take to maximize the desirable outcomes of the online user's actions. These initiatives could include promotions (sending coupons, cash, …) or communication with the customer through any available channel such as mail, email, online ads, and so on.
The application of recommender systems has a significant impact on the success and bottom lines of many companies. As an example, more than 60% of the movies seen by Netflix customers are selected through recommendations produced by recommender systems [62].
There are many other direct or indirect metrics influenced by recommender systems. Examples include increased sales of other products that were not the direct goal of the recommendations, an increased chance of the customer returning to the site, greater brand awareness, and a better chance of retargeting the same user at a later time.
An example of a recommender-system application is the Netflix movie-rating prediction contest. The goal was to predict ratings for unseen movies from a training set (ratings on a scale of 1 to 5) of 18,000 movies rated by 400,000 Netflix customers, with 99% of the entries in the data matrix missing.
There are three approaches to the modeling and computation of recommender systems [63]. The first is "content-based" models, where the features of the items and item-item similarities are used to find the unknown values in the data matrix. The second is "collaborative filtering", where the history of user activities and user-user similarities are the basis for computing the missing entries in the matrix. The third is "hybrid models", where some combination of the previous two approaches is used. In this work we use a specific type of collaborative filtering model, the latent factor model.
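The latent factor idea can be sketched as follows: each user and each item is represented by a k-dimensional vector, and a predicted entry of the data matrix is the inner product of the corresponding pair. This is a minimal illustration with made-up sizes, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 4, 5, 2                # users, items, latent dimensions (illustrative)
R = rng.normal(size=(m, k))      # one latent vector per user (rows of R)
Q = rng.normal(size=(n, k))      # one latent vector per item (rows of Q)

X_hat = R @ Q.T                  # predicted rating matrix: entry (i, j) is R[i] . Q[j]
```

Every entry of X_hat is defined, including positions that were missing in the original data, which is what makes the factorization a matrix-completion device.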
The columns of the data matrix are called features (variables, covariates, predictors, dimensions, attributes, factors, regressors, inputs, fields, and so on). In Fig. 1, an example of a data matrix (from online commercial data) describes the conversions or purchases of different users (rows) as a function of different variables (ad campaigns, the columns):
Only a fraction of the data matrix is shown here. The unknown entries are marked with a question mark, "?".
The following model is based on a few major assumptions. The first is that the data are missing completely at random (MCAR), meaning that the probability that a data point is missing does not depend on the data values. The second is the "low rank" assumption, which implies that the information in the data is concentrated in (lies in) a lower-dimensional space; that is, the rank of the data matrix is some k that is very small compared to min(m, n). This suggests the SVD (Singular Value Decomposition) as the solution: the truncated SVD gives the best low-rank approximation, a result referred to as the matrix approximation lemma or the Eckart–Young–Mirsky theorem [36].
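The Eckart–Young–Mirsky result can be checked numerically: truncating the SVD to k components gives the best rank-k approximation in the Frobenius norm, and the approximation error equals the norm of the discarded singular values. A minimal sketch with illustrative sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(8, 6))                  # illustrative dense data matrix
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2
X_k = U[:, :k] * s[:k] @ Vt[:k, :]           # best rank-k approximation of X

# Eckart-Young-Mirsky: the Frobenius error is exactly the norm of the
# singular values that were dropped.
err = np.linalg.norm(X - X_k)
```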
Also, with respect to the sparseness of the data matrix, we assume that

a ≥ C n^1.2 r log(n)

where r = rank(X), a is the number of available entries in X, and C is some positive numerical constant. Under these circumstances, we can accurately recover the missing entries in the data matrix [105].
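To get a feel for the bound, one can plug in illustrative numbers; the constant C is left unspecified by the theory, so C = 1 below is purely an assumption, as are the chosen n and r:

```python
import math

def min_entries(n, r, C=1.0):
    """Right-hand side of the sampling bound a >= C * n**1.2 * r * log(n)."""
    return C * n ** 1.2 * r * math.log(n)

n, r = 10_000, 10                 # illustrative matrix size and rank
needed = min_entries(n, r)
fraction = needed / (n * n)       # share of an n x n matrix that must be observed
```

For a low-rank matrix, the required fraction of observed entries is a few percent, which is why recovery remains possible even with 99% of the entries missing.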
2409
Computation of Recommender System using Localized Regularization Kourosh Modarresi
But we have missing entries, and thus to prevent overfitting we have to add regularization:

min_{R,Q}  Σ_{(i,j) observed} (x_ij − r_i q_j^T)^2  +  Σ_{i=1}^{m} λ_i ||r_i||^2  +  Σ_{j=1}^{n} λ_j ||q_j||^2

where r_i and q_j are the rows of the factor matrices R and Q, and each λ_i (λ_j) is a localized regularization parameter attached to a single row (column). The implementation of the algorithm is based on stochastic gradient descent (SGD), where we fix all of the λ's except λ_i and optimize λ_i alone. We repeat this iteratively until all of the λ's converge. In the main loop, we initialize R and Q.
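The SGD factor updates can be sketched as follows. This is a minimal illustration on synthetic data, assuming per-row and per-column regularization weights lam_r and lam_q that are held fixed during the factor updates (the paper's alternating optimization of the λ's themselves is omitted); all sizes and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, k, lr, epochs = 30, 20, 3, 0.02, 200

# Synthetic low-rank ground truth with roughly half the entries observed.
X_true = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))
mask = rng.random((m, n)) < 0.5
obs = np.argwhere(mask)                  # indices (i, j) of observed entries

R = 0.1 * rng.normal(size=(m, k))        # user (row) factors
Q = 0.1 * rng.normal(size=(n, k))        # item (column) factors
lam_r = np.full(m, 0.05)                 # one regularization weight per row
lam_q = np.full(n, 0.05)                 # one regularization weight per column

for _ in range(epochs):
    rng.shuffle(obs)                     # visit observed entries in random order
    for i, j in obs:
        e = X_true[i, j] - R[i] @ Q[j]              # prediction error at (i, j)
        R[i] += lr * (e * Q[j] - lam_r[i] * R[i])   # localized penalty on row i
        Q[j] += lr * (e * R[i] - lam_q[j] * Q[j])   # localized penalty on column j

# Held-out error on the entries that were never seen during training.
rmse = np.sqrt(((X_true - R @ Q.T)[~mask] ** 2).mean())
```

Because each row and column carries its own weight, rows with many observations can tolerate a smaller penalty than sparse rows, which is the intuition behind localizing the regularization.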
4 Results
The model was applied to two data sets. The first data matrix is of dimension 75715×12, containing the conversions of different ad campaigns for a variety of regions. Fig. 1 shows a sample of the data matrix, and Fig. 2 shows the same portion of the data after the application of the model and computation of the unknown entries. Similarly, Fig. 3 and Fig. 4 show the original and completed data matrices for example 2. Example 2 is a 2722×122 matrix containing the length of time (in seconds) spent by users on different sites. The results show improvements of 21% and 18% over comparable results without localized regularization.
Figure 3. The data matrix in example 2 containing the time users spent on different web sites.
Figure 4. The data matrix in example 2 with computed values for the unknown entries.
References
[1] G. Adomavicius and A. Tuzhilin, “Towards the next generation of recommender systems: a
survey of the state-of-the-art and possible extensions,” IEEE Trans. on Data and Knowledge
Engineering 17:6, pp. 734–749, 2005.
[2] C. Anderson, “The Long Tail: Why the Future of Business is Selling Less of More,”
Hyperion Books, New York, 2006.
[3] L. Backstrom, J. Leskovec, “Supervised Random Walks: Predicting and Recommending
Links in Social Networks,” ACM International Conference on Web Search and Data Mining
(WSDM), 2011.
[4] J. Baumeister, “Stable Solution of Inverse Problems”, Vieweg, Braunschweig, Germany,
1987.
[5] S. Becker, J. Bobin, and E. J. Candès, "NESTA: a fast and accurate first-order method for
sparse recovery," SIAM J. on Imaging Sciences 4(1), pp. 1-39, 2011.
[6] A. Bjorck, "Numerical Methods for Least Squares Problems", SIAM, Philadelphia, 1996.
[7] S. Boyd and L. Vandenberghe, "Convex Optimization", Cambridge University Press, 2004.
[8] J.S. Breese, D. Heckerman, and C. Kadie, "Empirical analysis of predictive algorithms for
collaborative filtering," Proceedings of the Fourteenth Conference on Uncertainty in Artificial
Intelligence, Morgan Kaufmann, 1998.
[31] H. W. Engl, K. Kunisch, and A. Neubauer, "Convergence rates for Tikhonov regularisation
of non-linear ill-posed problems", Inverse Problems, 5, pp. 523-540, 1989.
[32] H. W. Engl , C. W. Groetsch (Eds), “Inverse and Ill-Posed Problems”, Academic Press,
London, 1987.
[33] M. Fazel, H. Hindi, and S. Boyd, "A rank minimization heuristic with application to
minimum order system approximation", Proceedings American Control Conference, 6:4734-
4739, 2001.
[34] W. Gander, “On the linear least squares problem with a quadratic Constraint”, Technical
report STAN-CS-78-697, Stanford University, 1978.
[35] G. H. Golub, C. F. Van Loan, “Matrix Computations”, 4th Ed., Computer Assisted
Mechanics and Engineering Sciences, Johns Hopkins University Press, US, 2013.
[36] Gene H. Golub, Charles F. Van Loan, "An Analysis of the Total Least Squares Problem",
SIAM J. Numer. Anal., 17, pp. 883-893, 1980.
[37] Gene H. Golub, W. Kahan, “Calculating the Singular Values and Pseudo-Inverse of a
Matrix”, SIAM J. Numer. Anal. Ser. B 2, pp. 205-224, 1965.
[38] Gene H. Golub, Michael Heath, Grace Wahba, “Generalized Cross-Validation as a Method
for Choosing a Good Ridge Parameter”, Technometrics 21, pp. 215-223, 1979.
[39] S. Guo, M. Wang, J. Leskovec, “The Role of Social Networks in Online Shopping:
Information Passing, Price of Trust, and Consumer Choice,” ACM Conference on Electronic
Commerce (EC), 2011.
[40] Häubl, G., Trifts, V., "Consumer decision making in online shopping environments: The
effects of interactive decision aids," Marketing Science, pp. 4–21, 2000.
[41] Hastie, T., Tibshirani, R., and Friedman, J., "The Elements of Statistical Learning: Data
Mining, Inference and Prediction", New York: Springer Verlag, 2001.
[42] Hastie, T.J and Tibshirani, R. "Handwritten Digit Recognition via Deformable Prototypes",
AT&T Bell Laboratories Technical Report, 1994.
[43] Hastie, T., Tibshirani, R., Eisen, M., Brown, P., Ross, D., Scherf, U., Weinstein, J., Alizadeh,
A., Staudt, L., and Botstein, D., “ ‘Gene Shaving’ as a Method for Identifying Distinct Sets
of Genes With Similar Expression Patterns,” Genome Biology, 1, 1–21, 2000.
[44] David Heckerman, David Maxwell Chickering, Christopher Meek, Robert Rounthwaite, and
Carl Kadie, “Dependency networks for inference, collaborative filtering, and data
visualization,” Journal of Machine Learning Research, 1:49–75, 2000.
[45] T. Hein, "Some analysis of Tikhonov regularization for the inverse problem of option pricing
in the price-dependent case," SIAM Review, 21(1), pp. 100-111, 1979.
[46] T. Hein and B. Hofmann, "On the nature of ill-posedness of an inverse problem in option
pricing," Inverse Problems, 19, pp. 1319-1338, 2003.
[47] Herlocker, J.L., and Konstan, J.A., "Content-Independent Task-Focused Recommendation,"
IEEE Internet Computing, Vol. 5, pp. 40-47, 2001.
[48] Herlocker, J., Konstan, J., Riedl, J., “Explaining collaborative filtering recommendations,”
Proceedings of the 2000 ACM Conference on Computer Supported Cooperative Work, pp.
241–250. ACM, 2000.
[49] B. Hofmann, "Regularization for Applied Inverse and Ill-Posed Problems," Teubner,
Stuttgart, Germany, 1986.
[50] B. Hofmann, "Regularization of nonlinear problems and the degree of ill-posedness," in G.
Anger, R. Gorenflo, H. Jochmann, H. Moritz, and W. Webers (Eds.), Inverse Problems:
Principles and Applications in Geophysics, Technology, and Medicine, Akademie Verlag,
Berlin, 1993.
[51] T. A. Hua and R. F. Gunst, “Generalized ridge regression: A note on negative ridge
parameters,” Comm. Statist. Theory Methods, 12, pp. 37-45, 1983.
[52] V.S. Iyengar and T. Zhang, “Empirical study of recommender systems using linear
[74] K. Miller, "Least Squares Methods for Ill-Posed Problems with a Prescribed Bound," SIAM J.
Math. Anal., 1, pp. 52-74, 1970.
[75] B. Moghaddam, Y. Weiss, and S. Avidan, “Spectral bounds for sparse PCA: exact and greedy
algorithms,” Advances in Neural Information Processing Systems, 18, 2006.
[76] V. A. Morozov, "On the solution of functional equations by the method of
regularization," Sov. Math. Dokl., 7, pp. 414-417, 1966.
[77] V. A. Morozov, "Methods for Solving Incorrectly Posed Problems," Springer-Verlag, New
York, 1984.
[78] A. Narayanan, V. Shmatikov, "Robust de-anonymization of large sparse datasets," IEEE
Symposium on Security and Privacy, pp. 111-125, 2008.
[79] B. K. Natarajan, “Sparse approximate solutions to linear systems,” SIAM J. Comput.,
24(2):227–234, 1995.
[80] R. Otazo, E. J. Candès and D. Sodickson, “Low-rank and sparse matrix decomposition for
accelerated dynamic MRI with separation of background and dynamic components,” To
appear in Magnetic Resonance in Medicine, 2013.
[81] R. L. Parker, "Understanding inverse theory," Ann. Rev. Earth Planet. Sci., 5, pp. 35-64,
1977.
[82] T. Raus, “The principle of the residual in the solution of ill-posed problems with
nonselfadjoint operator,” Uchen. Zap. Tartu Gos. Univ., 75, pp. 12-20, 1985.
[83] T. Reginska, “A Regularization Parameter in Discrete Ill-Posed Problems,” SIAM J. Sci.
Comput., No. 17, pp. 740-749, 1996.
[84] F. Ricci, L. Rokach, B. Shapira, P.B. Kantor (Eds.), "Recommender Systems Handbook,"
Springer, New York, NY, USA, 2011.
[85] E. Sadikov, M. Medina, J. Leskovec, H. Garcia-Molina.,“Correcting for Missing Data in
Information Cascades,” ACM International Conference on Web Search and Data Mining
(WSDM), 2011.
[86] Sarwar, B., "Sparsity, scalability, and distribution in recommender systems," PhD thesis,
University of Minnesota, 2001.
[87] Sarwar, B., Karypis, G., Konstan, J. A., and Riedl, J. "Application of dimensionality
reduction in recommender system—A case study", Proceedings of the ACM WebKDD-2000
Workshop, 2000.
[88] Sarwar, B., Karypis, G., Konstan, J. A., and Riedl, J., "Analysis of recommendation
algorithms for e-commerce", Proceedings of the ACM E-Commerce, pp. 158–167, 2000.
[89] Shardanand, U., Maes, P., "Social Information Filtering: Algorithms for Automating 'Word of
Mouth'," Conf. Human Factors in Computing Systems, 1995.
[90] Sinha, R., Swearingen, K., "The role of transparency in recommender systems," CHI 2002
Extended Abstracts on Human Factors in Computing Systems, pp. 830–831, ACM, 2002.
[91] A. Tarantola and B. Valette, "Generalized nonlinear inverse problems solved using the least
squares criterion," Reviews of Geophysics and Space Physics, 20, pp. 219-232, 1982.
[92] A. Tarantola, "Inverse Problem Theory," Elsevier, Amsterdam, 1987.
[93] Tibshirani, R., “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal
Statistical Society, Series B, 58, 267–288, 1996.
[94] R. Tibshirani, “Regression shrinkage and selection via the LASSO,” Journal of the Royal
statistical society, series B, 58(1):267–288, 1996.
[96] A. N. Tikhonov, "Regularization of incorrectly posed problems," Dokl. Akad. Nauk SSSR,
153, pp. 49-52, 1963 = Soviet Math. Dokl., 4, 1963.
[105] R. Witten and E. J. Candès, "Randomized algorithms for low-rank matrix factorizations:
sharp performance bounds," To appear in Algorithmica, 2013.