Professional Documents
Culture Documents
User-to-User Recommendation Using The Concept of Movement Patterns: A Study Using A Dating Social Network
User-to-User Recommendation Using The Concept of Movement Patterns: A Study Using A Dating Social Network
User-to-User Recommendation Using The Concept of Movement Patterns: A Study Using A Dating Social Network
net/publication/320623215
CITATIONS READS
0 347
3 authors:
Alexei Lisitsa
University of Liverpool
107 PUBLICATIONS 565 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
Learning and Evolving Knowledge from User Generated Social Media Data View project
All content following this page was uploaded by Mohammed al-zeyadi on 26 October 2017.
Abstract: Dating Social Networks (DSN) have become a popular platform for people to look for potential romantic
partners. However, the main challenge is the size of the dating network in terms of the number of registered
users, which makes it impossible for users to conduct extensive searches. DSN systems thus make recommen-
dations, typically based on user profiles, preferences and behaviours. The provision of effective User-to-User
recommendation systems have thus become an essential part of successful dating networks. To date the most
commonly used recommendation technique is founded on the concept of collaborative filtering. In this paper
an alternative approach, founded on the concept of Movement Patterns, is presented. A movement pattern
is a three-part pattern that captures the “traffic” (messaging) between vertices (users) in a DSN. The idea is
that these capture the behaviour of users within a DSN while at the same time capturing the associated profile
and preference data. The idea has been built into a User-to-User recommender system, the RecoMP system.
The system has been evaluated, by comparing its operation with a collaborative filtering systems (the RecoCF
system), using a data set from the Chinese Jiayuan.com DSN comprising 548, 395 vertices. The reported
evaluation demonstrates that very successful results can be produced, a best average F-score value of 0.961.
An overview of the proposed MP based DSN rec- 3.2 Movement pattern formalism
ommendation systems is presented in this section.
The section commences, Sub-section 3.1, with a re- From the foregoing we are interested in building a rec-
view of the basic operation of DSN systems. A for- ommender system for a DSN system founded on the
concept of MPs. In the introduction to this paper it ported on later in this paper, 25 different attribute val-
was noted that a MP is a three-part pattern. More for- ues were used to describe users profiles. Each edge
mally a MP comprises a tuple of the form: ei ∈ E is then defined by a another set of attribute val-
hF, E, T i (1) ues describing the nature of the communication. For
the evaluation considered later in this paper only two
where F, E and T are sets of attribute values. More edge attribute was considered, “communication type”
specifically the attribute value set F represents a and “number of messages sent”, the first had two po-
“From” (sender) vertex, T a “To” (receiver) vertex, tential values: Reciprocal and Non reciprocal. The
and E an “Edge” connecting the two vertices describ- second had a range of values.
ing the nature of the traffic (details of movement) be-
tween them. We refer to a tuple of this type using the 4 Recommendation System Based on
acronym FET. The minimum number of attribute val-
ues in each part (set) must be at least one. The maxi- Movement Patterns (RecoMP)
mum number of values depends on the size of the at- In this section the proposed MP based DSN rec-
tribute sets to which F, E and T subscribe, an MP can ommendation algorithm is presented, the RecoMP al-
only feature a maximum of one value per subscribed gorithm. Recall that the idea is to use knowledge
attribute. The attribute set to which F and T subscribe of existing frequently occurring MPs in the DSN to
is given by AV = {φ1 , φ2 , . . . }, whilst the attribute set make recommendations. A particular challenge of
for E is given by AE = {ε1 , ε2 , . . . }. Note that F and finding frequently occurring MPs in DSNs is the size
T subscribe to the same attribute set because they are of the networks to be considered. The exemplar
both movement network vertices, and every vertex (at dataset used for the evaluation reported on later in this
least potentially) can be a “from” or a “to” vertex in paper comprised 548,395 vertices and some 3.5 mil-
the context of MPM. Each attribute in AV and AE also lion edges. In other words we cannot mine and main-
has a value domain associated with it. tain all the MPs that might feature in the data set. Note
Any given network can also be represented as a that although the number of MPs generated can be re-
set of tuples of the form hF, E, T i (Equation 1). In duced by using a high σ threshold this is undesirable
other words a given network can be encapsulated in as we need to use a low σ threshold so as to ensure
the form of a dataset D = {F1 , F2 , . . . }, where each no significant MPs are missed (the most appropriate
Fi ∈ D is a FET. An MP is then a FET that occurs value for σ will be considered in Section 6). The so-
frequently in D, where frequency is defined in terms lution is to mine MPs as required with respect to a
of a frequency threshold σ, a percentage value be- specific user and to consequently generate recommen-
tween 0.0 and 100.0 indicating the proportion of the dations with respect to that specific user. Users would
number of occurrences of a particular MP with re- be considered in turn, but recommendations would be
spect to the total number of records (edges) in the made periodically. It would therefore not be neces-
data set, or data set segment, under consideration. sary to consider all DSN users in one processing run.
In the context of DSNs the sets F and T represent In addition, by mining MPs on a required basis, the
DSN user profiles, while the set E represents the na- continuously evolving (dynamic) nature of DSNs can
ture of the reciprocal messaging between users. A MP be taken into account.
is then a frequently occurring FET that encompasses The pseudo code for the RecoMP process is pre-
a pair of user profiles and reciprocal messaging be- sented in Algorithm 1. The inputs are: (i) a given user
haviour. Further details concerning MPs and FETs profile unew , (ii) the set of all user profiles U, (iii) the
can be found in found in (Al-Zeyadi et al., 2016) and DSN represented as a dataset D comprised of a set
(Al-Zeyadi et al., 2017). of FETs (as described above), and (iv) a desired sup-
port threshold σ. Note that for illustrative purposes,
3.3 Problem Statement in Algorithm 1, we have assumed a new user, but this
could equally well be an existing user for whom a new
In the context of the work presented in this paper a set of recommendations is to be generated. The out-
dating network G is defined in terms of a tuple of the put is a set R of recommended users (matches). In-
form hV, Ei, where V is a set of vertices representing spection of the algorithm indicates that it comprises
the users of the DSN and E is the set of edges rep- two sub-processes: (i) Mining (lines 7 to 21) and (i)
resenting reciprocal communication between users. Recommendation (lines 22 to 28).
Each vertex vi ∈ V is defined by a set of attribute val- The mining sub-process is where the relevant
ues representing the profile of the user. In the case MPs are generated. MPs are stored in a set M =
of the dataset used for the evaluation purposes, as re- {hMP1 , count1 i, hMP1 , count1 i, . . . }. On start up (line
Input: port MP extraction. A shape is a MP template (proto-
1 unew = new joined user profile vector type) with a particular configuration of attributes tak-
2 U = Collection of all user profile vectors ing from the attribute sets AV and AE without con-
3 D = Collection of FETs {r1, r2, ...} sidering the associated attribute values. Once gener-
describing network G ated shapes can be populated with attribute values to
4 σ = Support threshold give candidate MPs. The idea is to enhance the ef-
Output: ficiency of calculating MPs by considering potential
5 R = Set of recommended users MPs in terms of the attributes they might contain, as
6 Start: oppose to the individual attribute values they might
7 Mining Part: contain, given that size of the set of attributes will be
8 M = 0/ less than the size of the concatenated set of attribute
9 Dnew = Pruning D by looping through D and values. The maximum number of shapes that can ex-
considering only FETi where F or T similar to ist in Dnew is given by Equation 3, where |Av | and |AE |
unew are the number of vertex and edge attributes that fea-
10 ShapeSet = the set of possible shapes ture in Dnew .
{shape1 , shape2 , . . . } (2|Av | − 1) × (2|AE | − 1) × (2|Av | − 1) (3)
11 forall the shapei ∈ ShapeSet do
Returning to algorithm 1 the next step is to popu-
12 forall the r j ∈ Dnew do
late the set of generated shapes (lines 11 to 18). For
13 if r j matches shapei then
each shape shapei in the shape set, and for each FET
14 MPk = MP extracted from r j
(record) r j in Dnew , if r j matches shapei r j it is tem-
15 if MPk in M S then increment support
porarily stored in a variable MPk . Note that a record
16 else M = M hMPk , 1i
r j matches a shapei if the attributes featured in the
17 end
shape also feature in r j . If MPk is already contained in
18 end M we increment the associated count (line 15), other
19 forall the MPi ∈ M do wise we add MPk to M with a count of 1. Once all
20 if count for MPi < σ then remove MPi shapes have been processed we loop through M (lines
from M 19 to 20) and remove all MPs whose support count is
21 end less than σ.
22 Recommendation Part: When the set of frequently occurring MPs has
23 forall the ui ∈ U do been generated the recommender sub-process is com-
24 forall the MPj ∈ M do menced (line 22). For each MP MPj in M, and each
25 if ui ⊆SMPj and ui * R then user profile (vertex) ui in U, if ui is a subset of either
26 R = R ui the From or To part of MPj , and has not previously
27 end been recorded in R, ui is appended to R (line 26). In
28 end this manner a set of recommended users is generated.
Algorithm 1: The RecoMPA Algorithm Note that shape based approach to MP min-
ing described above lends itself to parallelisation.
Each shape can be populated and the various re-
8) M is set to the empty set 0. / The sub-process com- sulting MP instances counted on a separate process-
mences (line 8) by pruning D to create Dnew (Dnew ⊂ ing unit without requiring any messaging between
D) so that we are left with a set of FETs where either units. Technologies such as Map Reduce (MR) on
the From and/or the To part correspond (are similar) a top of Hadoop (Dean and Ghemawat, 2008) or the
to unew . This benefit of this pruning is that it results in well known Massage Passing Interface (MPI) (Gropp
a significantly reduced search space. Similarity mea- et al., 1999) would be appropriate here as discussed
surement was conducted using the well known Co- in (Al-Zeyadi et al., 2017).
sine similarity metric calculated as shown in Equation
2 where A and B are the set of attribute values of a
newly joined user, and a selected user in the network, 5 Recommendation System Based on
respectively. Collaborative Filtering (RecoCF)
∑ni=1 A.B
Similarity = cos(Θ) = p 2p 2 (2) To evaluate the proposed RecoMPA algorithm de-
∑ni=1 Ai ∑ni=1 Bi scribed above a benchmark algorithm was required.
As noted in Section 3, the majority of User-to-User
Next (line 9) a “shape set” is generated to sup- DSN recommendation systems are founded on (graph
based) Collaborative Filtering (CF) approaches (Tu 6 Evaluation
et al., 2014; Krzywicki et al., 2014). A benchmark This section reports on the evaluation conducted
CF based DSN recommendation algorithm was there- with respect to the proposed RecoMP algorithm. The
fore developed, the RecoCF algorithm. The general evaluation was conducted using a FET database ex-
methodology of Collaborative Filtering, for any sys- tracted from a dataset obtained from the Jiayuan.com
tem, can be described in two steps: DSN. The objectives of the evaluation were to com-
1. Identify users who share the same vector pattern pare the operation of the proposed MP based RecoMP
with the service user (the user whom the predic- algorithm in comparison with standard Collaborative
tion is for). Filtering (the RecoCF algorithm from Section 5). The
metrics used for the evaluation were: (i) Recall (R),
2. Use the preferences of those users founded in step
(ii) Precision and (iii) F-score (F).
1 to create a prediction (recommendation) for the
service user. Table 1: TCV results using the RecoMP algorithm
The same methodology was adopted with respect Tenth
RecoMP
to the purpose built CF based DSN recommendation P R F
RecoCF algorithm. The pseudo code for the RecoCF #1 0.938 1.000 0.978
algorithm is presented in Algorithm 2. As in the case #2 0.908 1.000 0.949
of RecoMP algorithm, the RecoCF algorithm takes #3 0.917 1.000 0.956
the same input except there is no need for a σ thresh- #4 0.948 1.000 0.972
#5 0.948 1.000 0.972
old. The output, as before, is a set of recommended
#6 0.948 1.000 0.972
users R. The algorithm commences (line 6), as in #7 0.928 1.000 0.961
the case of the RecoMP algorithm, by pruning the #8 0.928 1.000 0.961
dataset D to give Dnew . Then for all records (FETs) #9 0.952 1.000 0.974
in Dnew the From and To attribute value sets are ex- # 10 0.867 1.000 0.917
tracted (lines 8 and 9), the sets Fromi and Toi . If Avarage 0.928 1.000 0.961
Fromi is a subset of unew (the new user profile) the SD 0.02 0.00 0.02
user profile associated with Fromi is added to R if it
has no already been included. Similarly if Toi is a Table 2: TCV results using the RecoCF algorithm
subset of unew the user profile associated with Toi is Tenth
RecoCF
added to R, again provided if has not already been in- P R F
cluded. The result is a set R of recommended users #1 0.217 0.764 0.298
(matches). #2 0.369 0.831 0.470
#3 0.325 0.760 0.416
#4 0.305 0.722 0.364
Input: #5 0.305 0.722 0.364
1 unew = new joined user profile vector #6 0.305 0.722 0.364
2 U = Collection of all user profile vectors #7 0.333 0.756 0.424
3 D = Collection of FETs {r1, r2, ...} #8 0.354 0.763 0.416
describing network G #9 0.265 0.683 0.361
Output: # 10 0.446 0.717 0.439
4 R = Set of recommended users Avarage 0.322 0.744 0.392
5 Start: SD 0.058 0.038 0.048
6 Dnew = Pruning D by looping through D and
considering only FETi where F or T similar to 6.1 Data Sets
unew
7 forall the Di ∈ Dnew do For the conducted evaluation reported on in this pa-
8 Fromi = return From part from Di per a dataset was obtained from Jiayuan.com3 . Ji-
9 Toi = return To part from Di ayuan.com is the most popular DSN in China; in 2011
10 if From Si ⊆ unew and Fromi * R then it was reported t have 40.2 million subscribers (users),
11 R = R Toi and 4.7 million active monthly subscribers. The data
12 else if SToi ⊆ unew and Toi * R then obtained comprised 548, 395 users (344, 552 men and
13 R = R Fromi 203, 843 women) and details concerning whether a
14 end user had messaged another (no information quantify-
Algorithm 2: The RecoCF Algorithm ing the messaging activity was available). Each user
3 http://www.jiayuan.com
2000 sets of experiments were conducted. The comparison
1900
1800
Male was conducted using a variation of Ten Cross Vali-
Female
1700 dation (TCV) whereby the entire Jiayuan.com FET
1600
1500 database was divided into tenths and the process run
1400 ten times with a different tenth used for testing. More
1300
1200 specifically for each run a random sample of ten users
Number of users
1100
1000
was extracted from the testing tenth and used for the
900 evaluation. In this manner the process of TCV could
800
700
be conducted without processing all 548, 395 vertices
600 represented in the database. For both sets of experi-
500
400 ments a threshold value of σ = 1.0 was used.
300
200
The results are given in Tables 1 and 2, Table 1
100 gives the results using the RecoMP algorithm while
0
Table 2 gives the results using the RecoCF algorithm.
~100 ~600 ~1200 ~1800 ~2400 ~3100 ~3700 ~4300 ~5100 The tables give the average Precision (P), Recall (R)
Number of messages sent bins and F-score (F) for each tenth, and a total average and
Standard Deviation (SD).
Figure 2: Male and Female Normal Distribution.
had a profile and a set of preferences associated with From the above it can clearly be seen that the
it. Unlike European or US DSNs, Jiayuan.com, in line recommendations made using the RecoMP algorithm
with other Chinese DSNs, is directed at the (hetero- are better than those generated using Collaborative
sexual) marriage market rather than the shorter term Filtering (the RecoCF algorithm). The total aver-
relationship market, and thus user profiles tend to re- age recall, precision and F-score using RecoMP were
flect this; profiles comprise: age, height, education, 0.928, 1.000 and 0.961; compared to total average
location, occupation, place of work, income, home recall, precision and F-score values of 0.322, 0.744
ownership, car ownership and so on. Preferences in- and 0.392 using RecoCF with small SD values were
clude things like: age range, height range, education recorded. It is also interesting to note that the total
and location. The data set was processed firstly so average precision using RecoMP, as before, was fre-
that each user was defined by a set of 25 (profile and quently 1.000; meaning we often make all the correct
preference) attributes, thus |Av | = 25. It was then recommendations and no incorrect recommendations.
processed again so as to generate a network where
the vertices represented users. Edges where included
wherever two users had messaged each other, in other
7 Conclusion
words the messaging was reciprocal, thus |Ae | = 1 In this paper, the authors have proposed a rec-
with only a single value. Unfortunately the nature of ommendation system, directed at Dating Social Net-
the data set was such that we could not extract a more works (DSN), founded on the concept of Movement
comprehensive edge attribute set. Converting this net- Patterns (MP), patterns that capture the nature of traf-
work into a FET database resulted in a database com- fic movement between vertices in networks. The idea
prising 3, 311, 076 records. The normal distribution is to extract frequently occurring MPs from a current
of the users’ activity, in terms of the number of mes- network and use these to inform a User-to-User rec-
sages sent, is presented in Figure 2. From the figure ommender DSN system. The idea was built into an
it can be seen that the majority of users sent 100 mes- algorithm, the RecoMP algorithm, and tested by com-
sages over the considered time frame. Given a new paring the operation of this algorithm with a Collab-
user (or an existing user for whom we wish to make orative Filtering approach, RecoCF algorithm. For
a recommendation), if we find a frequent MP within the evaluation a large network, extracted from Ji-
the existing network where either the From or To part ayuan.com DSN system, comprising 3,311,076 ver-
matches the description (profile) of the new user we tices (users) was used. Excellent results were pro-
recommend the associated existing users to the new duced, a best total average F-score value of 0.961 was
user. obtained using the RecoMP algorithm compared to a
value of 0.392 using the RecCF algorithm. However,
6.2 Performance Effectiveness of for general applicability to large DSN, the efficiency
RecoMP with respect to RecoCF of the approach needs to be improved. A potential
avenue for future work is thus to investigate the po-
To determine the effectiveness of the proposed Re- tential for using some form of parallel processing, for
coMP algorithm, in comparison with RecoCF, two example using the well known Massage Pass Inter-
face (MPI) or Hadoop/MapReduce. One of the advan- Kutty, S., Nayak, R., and Chen, L. (2014). A people-
tages offered by the “Shape” based approach to min- to-people matching system using graph mining tech-
ing MPs, as proposed in this paper, is that it lends it- niques. World Wide Web, 17(3):311–349.
self to parallelisation, potentially each possible shape Li, J., Peng, W., Li, T., Sun, T., Li, Q., and Xu, J. (2014).
can be processed using a separate processing unit. Social network user influence sense-making and dy-
namics prediction. Expert Systems with Applications,
41(11):5115–5124.
ACKNOWLEDGEMENTS Lin, W., Alvarez, S. A., and Ruiz, C. (2002). Efficient
The authors would like to thank the China Uni- adaptive-support association rule mining for recom-
mender systems. Data Mining and Knowledge Dis-
versity of Science and Technology, and the School
covery, 6(1):83–105.
of Statistics at the Renmin University of China Sta-
tistical Centre, for providing the jiayuan.com dataset Oh, H. J., Ozkaya, E., and LaRose, R. (2014). How does
online social networking enhance life satisfaction? the
used for evaluation purposes in this paper. Also, the relationships among online supportive interaction, af-
first author would like to thank the Iraqi Ministry of fect, perceived social support, sense of community,
Higher Education and Scientific Research, and Uni- and life satisfaction. Computers in Human Behavior,
versity of Al-Qadisiyah, for funding this research. 30:69–78.
Pizzato, L., Rej, T., Akehurst, J., Koprinska, I., Yacef, K.,
and Kay, J. (2013). Recommending people to peo-
ple: the nature of reciprocal recommenders with a
REFERENCES case study in online dating. User Modeling and User-
Adapted Interaction, 23(5):447–488.
Agrawal, R., Srikant, R., et al. (1994). Fast algorithms for
Pizzato, L., Rej, T., Chung, T., Koprinska, I., and Kay, J.
mining association rules. In Proc. 20th int. conf. very
(2010). Recon: a reciprocal recommender for online
large data bases, VLDB, volume 1215, pages 487–
dating. In Proceedings of the fourth ACM conference
499.
on Recommender systems, pages 207–214. ACM.
Al-Zeyadi, M., Coenen, F., and Lisitsa, A. (2016). Min- Resnick, P. and Varian, H. R. (1997). Recommender sys-
ing frequent movement patterns in large networks: A tems. Communications of the ACM, 40(3):56–58.
parallel approach using shapes. In Research and De-
velopment in Intelligent Systems XXXIII: Incorporat- Sandvig, J. J., Mobasher, B., and Burke, R. (2007). Ro-
ing Applications and Innovations in Intelligent Sys- bustness of collaborative recommendation based on
tems XXIV 33, pages 53–67. Springer. association rule mining. In Proceedings of the 2007
ACM Conference on Recommender Systems, RecSys
Al-Zeyadi, M., Coenen, F., and Lisitsa, A. (2017). On ’07, pages 105–112, New York, NY, USA. ACM.
the mining and usage of movement patterns in large
traffic networks. In Big Data and Smart Computing Schafer, J., Frankowski, D., Herlocker, J., and Sen, S.
(BigComp), 2017 IEEE International Conference on, (2007). Collaborative filtering recommender systems.
pages 135–142. IEEE. The adaptive web, pages 291–324.
Cai, X., Bain, M., Krzywicki, A., Wobcke, W., Kim, Y. S., Sohail, S. S., Siddiqui, J., and Ali, R. (2013). Book recom-
Compton, P., and Mahidadia, A. (2010). Learning mendation system using opinion mining technique. In
collaborative filtering and its application to people to Advances in Computing, Communications and Infor-
people recommendation in social networks. In Data matics (ICACCI), 2013 International Conference on,
Mining (ICDM), 2010 IEEE 10th International Con- pages 1609–1614. IEEE.
ference on, pages 743–748. IEEE. Tu, K., Ribeiro, B., Jensen, D., Towsley, D., Liu, B., Jiang,
Dean, J. and Ghemawat, S. (2008). Mapreduce: simplified H., and Wang, X. (2014). Online dating recommenda-
data processing on large clusters. Communications of tions: matching markets and learning preferences. In
the ACM, 51(1):107–113. Proceedings of the 23rd International Conference on
World Wide Web, pages 787–792. ACM.
Gropp, W., Lusk, E., and Skjellum, A. (1999). Using MPI:
Wang, X. and Wang, Y. (2014). Improving content-based
portable parallel programming with the message-
and hybrid music recommendation using deep learn-
passing interface, volume 1. MIT press.
ing. In Proceedings of the 22nd ACM international
Krzywicki, A., Wobcke, W., Kim, Y. S., Cai, X., Bain, M., conference on Multimedia, pages 627–636. ACM.
Compton, P., and Mahidadia, A. (2014). Evaluation Xia, P., Zhai, S., Liu, B., Sun, Y., and Chen, C. (2016). De-
and deployment of a people-to-people recommender sign of reciprocal recommendation systems for online
in online dating. In AAAI, pages 2914–2921. dating. Social Network Analysis and Mining, 6(1):1–
Kunegis, J., Gröner, G., and Gottron, T. (2012). Online dat- 16.
ing recommender systems: The split-complex num-
ber approach. In Proceedings of the 4th ACM Rec-
Sys workshop on Recommender systems and the social
web, pages 37–44. ACM.