Simplifying Mashup Component Selection With A Combined Similarity-And Social-Based Technique
$$\begin{cases} \dfrac{n_{S,A_i}}{n_S}, & n_S \neq 0; \\ 0, & n_S = 0, \end{cases} \qquad (4)$$
where $n_{S,A_i}$ is the number of mashups in which $S$ and $A_i$ are used together, and $n_S$ is the number of mashups in which $S$ appears.
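The two cases of equation (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each mashup is represented as a set of API names, "S appears" is read as "every API in S is used by the mashup", and the function name `car` and the sample data are hypothetical rather than taken from MashupReco.

```python
def car(selected, candidate, mashups):
    """Co-utilization rank of `candidate` with respect to the selected set S.

    `mashups` is a list of sets of API names (an assumed representation).
    """
    selected = set(selected)
    # n_S: mashups in which every API of S appears (assumed reading of "S appears")
    with_s = [m for m in mashups if selected <= m]
    if not with_s:
        return 0.0  # second case of equation (4): n_S = 0
    # n_{S,A_i}: mashups in which S and the candidate are used together
    together = sum(1 for m in with_s if candidate in m)
    return together / len(with_s)


# Hypothetical toy repository of four mashups
mashups = [
    {"YouTube", "Google Maps"},
    {"YouTube", "Google Maps", "Flickr"},
    {"YouTube", "Twitter"},
    {"Flickr"},
]
print(car({"YouTube"}, "Google Maps", mashups))
print(car({"Glosk"}, "Flickr", mashups))
```

In this toy data, Google Maps is co-used in two of the three mashups that contain YouTube, while a selection that never appears in any mashup falls into the zero case.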
3.4 Iterative Web API Discovery
In this section we describe how we exploit the social information to improve the Web API discovery process. First, we assume that mashup composers follow good practices when building composite applications; that is, they have divided their problem into a set of subproblems or functionalities that can be satisfied by different Web APIs. Because discovering and selecting APIs are iterative processes, at each step the composer can constrain the current search with the decisions already made. Although this approach is intended to support the mashup building process, it can also be used to discover specific APIs and/or mashups. We can distinguish the composer's intention by the information supplied:
CASE 1: If the composer only provides mashup keywords, we interpret this as an attempt to find mashups that provide the specified capabilities.
CASE 2: If the composer only provides API keywords, we interpret this as an attempt to find APIs that provide the specified functionalities.
CASE 3: If the composer provides both mashup and API keywords, we interpret this as an attempt to find APIs that provide the specified functionalities and that have been used in a specific type of mashup. In addition, in later stages, the composer may have already selected a subset of APIs, which has to be considered as a constraint on the new discovery process.
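The three cases above amount to a small dispatch on which inputs are present. The following sketch shows one way to express it; the function name `composer_intent` and the returned labels are hypothetical and only serve to illustrate the decision.

```python
def composer_intent(mashup_keywords, api_keywords, selected_apis=()):
    """Map the supplied inputs to one of the three cases (labels are illustrative)."""
    if mashup_keywords and not api_keywords:
        return "find mashups"                      # CASE 1
    if api_keywords and not mashup_keywords:
        return "find APIs"                         # CASE 2
    if mashup_keywords and api_keywords:
        # CASE 3; APIs selected in earlier steps act as an extra constraint
        if selected_apis:
            return "find APIs in context, constrained by selection"
        return "find APIs in context"
    return "no query"                              # nothing supplied (assumed fallback)


print(composer_intent(["map", "real estate"], ["photo", "location"]))
```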
This process is enriched at each step with the social information about which APIs have been previously used by the community.
In Algorithm 1, we present how the discovery process is driven according to the inputs of the composer. The semantic rank is always calculated in the same way, but the social rank differs slightly depending on (1) the APIs already selected and (2) whether the composer defines a mashup context (a set of keywords $K_M$) for which the API is needed. If a context is specified, both WAR and CAR are calculated not over the entire set of mashups, but only over the extent of the concept $C_M$ (obtained from the set of keywords $K_M$ as explained in Section 2.1). These are the local versions.
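The difference between the global and local versions is only the set of mashups that is counted. A minimal sketch, following the WAR definition used in Algorithm 1 ($n_k / n_{max}$) and assuming mashups are sets of API names: the function name `war`, the toy data, and the choice to return 0 when no candidate has ever been used are all assumptions made for illustration.

```python
def war(candidates, mashups):
    """Return {api: n_k / n_max} for the candidate APIs over `mashups`."""
    counts = {a: sum(1 for m in mashups if a in m) for a in candidates}
    n_max = max(counts.values(), default=0)
    if n_max == 0:
        # Assumption: if no candidate was ever used, every WAR is 0
        return {a: 0.0 for a in candidates}
    return {a: n / n_max for a, n in counts.items()}


# Hypothetical repository M
all_mashups = [
    {"Flickr", "Google Maps"},
    {"Flickr", "Twitter"},
    {"Flickr"},
    {"Glosk", "Google Maps"},
]
# Hypothetical extent of the mashup concept C_M (the mashups matching K_M)
context_extent = [all_mashups[0], all_mashups[3]]

candidates = ["Flickr", "Glosk"]
global_war = war(candidates, all_mashups)     # counted over all of M
local_war = war(candidates, context_extent)   # counted only over the extent of C_M
print(global_war, local_war)
```

In this toy data Flickr dominates globally, but within the context both candidates are used equally often, so their local WARs coincide.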
Algorithm 1 Iterative Web API discovery
Require: Let $M$ be the set of all mashups.
Require: Let $K_M = \{t_1, \ldots, t_n\}$ be the set of keywords that define the type of mashup the composer wants to build.
Require: Let $K_{A_i} = \{t_1, \ldots, t_m\}$ be the set of keywords that define the type of API the composer searches for at step $i$.
Require: Let $I$ be the number of APIs that will comprise the mashup.
Require: Let $S$ be the initially empty set of selected APIs.
for $i = 1$ to $I$ do
    Remove stop words from $K_{A_i}$
    Stem $K_{A_i}$
    Using $K_{A_i}$, obtain the API category $C_A$ whose intent is closest to $K_{A_i}$, as explained in Section 2.1
    Get the APIs in $C_A = \{a_1, \ldots, a_K\}$
    for $k = 1$ to $K$ do
        Calculate the semantic rank $R_k$ given the API-term frequency matrix
        if $K_M = \emptyset$ then
            Let $n_k$ be the number of mashups $m \in M$ in which $API_k$ is used
            Let $n_{max} = \max_{1 \le k \le K} n_k$
            Calculate the global WAR of $API_k$ as $WAR^G_k = n_k / n_{max}$
            if $S \neq \emptyset$ then
                Let $n_{S,k}$ be the number of mashups $m \in M$ in which $S$ and $API_k$ are used together
                Let $n_S$ be the number of mashups $m \in M$ in which $S$ appears
                Calculate the global CAR of $API_k$ as in (4): $CAR^G_{S,k} = n_{S,k} / n_S$ if $n_S \neq 0$; $0$ if $n_S = 0$
                Calculate the social rank of $API_k$ as $SR_k = (WAR^G_k + CAR^G_{S,k}) / 2$
            else
                Calculate the social rank of $API_k$ as $SR_k = WAR^G_k$
            end if
        else
            Using $K_M$, obtain the mashup concept $C_M$ whose intent is closest to $K_M$, as explained in Section 2.1
            Let $n_k$ be the number of mashups $m \in C_M$ in which $API_k$ is used
            Let $n_{max} = \max_{1 \le k \le K} n_k$
            Calculate the local WAR of $API_k$ as $WAR^L_k = n_k / n_{max}$
            if $S \neq \emptyset$ then
                Let $n_{S,k}$ be the number of mashups $m \in C_M$ in which $S$ and $API_k$ are used together
                Let $n_S$ be the number of mashups $m \in C_M$ in which $S$ appears
                Calculate the local CAR of $API_k$ as in (4): $CAR^L_{S,k} = n_{S,k} / n_S$ if $n_S \neq 0$; $0$ if $n_S = 0$
                Calculate the social rank of $API_k$ as $SR_k = (WAR^L_k + CAR^L_{S,k}) / 2$
            else
                Calculate the social rank of $API_k$ as $SR_k = WAR^L_k$
            end if
        end if
        Calculate the final rank $FR_k$ of $API_k$ as $FR_k = \alpha \, SR_k + (1 - \alpha) \, R_k$
    end for
    The user selects one API, adding it to $S$, probably the one with the highest final rank $FR_k$
    Suggest the set of APIs that have been co-utilized with $S$. One of these APIs can also be selected at this step, in which case $i$ must be incremented by the number of APIs selected in this way.
end for
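The ranking part of one iteration of Algorithm 1 can be sketched as follows. This is a simplified illustration, not the MashupReco implementation: mashups are assumed to be sets of API names, the semantic ranks $R_k$ are assumed to be computed elsewhere and passed in as a dictionary, and the names `count_with` and `rank_candidates` are hypothetical. Passing all of $M$ yields the global ranks; passing only the extent of $C_M$ yields the local ones.

```python
def count_with(apis, mashups):
    """Number of mashups that contain every API in `apis`."""
    apis = set(apis)
    return sum(1 for m in mashups if apis <= m)


def rank_candidates(candidates, semantic_rank, mashups, selected, alpha):
    """Final ranks FR_k for the candidate APIs, highest first.

    candidates    -- API names in the matched category C_A
    semantic_rank -- dict mapping API name to R_k (assumed given)
    mashups       -- list of sets of API names (all of M, or the extent of C_M)
    selected      -- set S of already selected APIs
    alpha         -- weight of the social rank, between 0 and 1
    """
    usage = {a: count_with({a}, mashups) for a in candidates}
    n_max = max(usage.values(), default=0)
    n_s = count_with(selected, mashups) if selected else 0

    final = {}
    for a in candidates:
        # WAR: usage normalized by the most used candidate (0 if none is used)
        war = usage[a] / n_max if n_max else 0.0
        if selected:
            # CAR as in equation (4)
            n_sk = count_with(set(selected) | {a}, mashups)
            car = n_sk / n_s if n_s else 0.0
            social = (war + car) / 2
        else:
            social = war
        final[a] = alpha * social + (1 - alpha) * semantic_rank[a]
    return sorted(final.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: Glosk matches the query better but has never been used
mashups = [{"Flickr", "Google Maps"}, {"Flickr", "YouTube"}, {"YouTube", "Google Maps"}]
semantic = {"Flickr": 0.6, "Glosk": 0.9}
print(rank_candidates(["Flickr", "Glosk"], semantic, mashups, set(), alpha=0.3))
print(rank_candidates(["Flickr", "Glosk"], semantic, mashups, {"YouTube"}, alpha=0.3))
```

Keeping the mashup collection as a parameter means the same function serves both branches of the algorithm; only the caller decides whether a context restricts the counts.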
Figure 5: MashupReco architecture
3.5 Implementation
In order to show our results empirically, we have built MashupReco, a prototype web tool that allows composers to perform an iterative API discovery process. Its architecture is depicted in Figure 5. The crawler component is designed to gather data from multiple catalogs; currently, it only supports the ProgrammableWeb catalog. The data is stored in a MySQL database. Using the Social Engine and the Taxonomy Builder, we perform the social network analysis and generate the taxonomies. The Taxonomy Builder is built over Coron and the TreeTagger system. The most important module is the Mashup Discovery Engine, which implements the iterative Web API discovery algorithm. Because the functionalities of MashupReco are exposed as web services, they can be consumed by different applications, which can build different presentations for them (see footnote 5). In Figure 5 we show a basic interface to support composers in the discovery process. The parameter $\alpha$ allows the composer to calibrate how much weight to assign to the social influence. Values of $\alpha$ closer to 1 give more importance to the social influence, whereas values closer to 0 make the social influence less important.
4. CASE STUDY
In this section, we illustrate MashupReco with an experiment. Here, the composer, a real estate broker, needs to build a web site that mashes up different sources of information regarding houses on sale given a specific location and its perimeter. He is interested in displaying on a map the different housing options, their photos and videos (if they exist), and photos and/or videos of nearby places of interest such as schools, restaurants and fitness centers, to name a few. Assuming that our composer follows good practices, he will be able to identify which kinds of APIs he needs. In fact, he has already identified that he needs a map to display and mash up the different sources of information. He also needs APIs capable of searching videos and photos at a specific location. He probably needs an API to convert an address into a latitude/longitude pair to obtain the photos and videos of interesting points as well as the housing options. He wants to help his potential customers get an impression of the neighborhood where the house is located, so he also needs an API that can extract information about what people are saying about this place (probably comments from a social network).
Footnote 5: http://dev.toeska.cl/mashup-reco
Now, the composer needs to find APIs to build the mashup. Using MashupReco, he first specifies the mashup context: this is a mashup about "map" and "real estate". Then,
• He searches for an API to find geo-located photos using the keywords "photo" and "location". The results are immediate:
  - Ranked by the similarity technique, Glosk, Instagram Real-time and Steply are highly ranked (0.97, 0.80 and 0.76, respectively). Using only the social influence, he obtains Microsoft Virtual Earth, Flickr and Yahoo Maps (with global WARs of 1.0, 0.8 and 0.62, respectively). Using the combined ranking with an $\alpha$ of 0.3, Glosk and Flickr are the highest ranked APIs (0.679 and 0.677, respectively). Their ranks are almost the same, so using one or the other seems to be a good option. However, the global WAR of Glosk is 0, which means that it has never been used in a mashup, whereas the global WAR of Flickr is 0.8.
• Based on the previous results, he decides to use Glosk. Glosk does not have any Co-APIs, so MashupReco cannot suggest APIs according to this criterion.
• Then, he searches for video APIs using the keyword "video".
  - According to the combined rank, the APIs with the highest ranking are YouTube (0.78), Yahoo Video Search (0.70) and Patrick's Aviation (0.66).
  - The user then selects YouTube over Yahoo Video Search (WARs of 0.84 and 0.01, respectively).
  - Based on the previous selection, MashupReco computes the list of Co-APIs along with their CARs, containing Google Maps (0.86), Flickr (0.33), Twitter (0.12), Weather Channel (0.10), Wikipedia (0.10), Foursquare (0.07) and Yahoo Geocoding (0.03), to name a few.
• From the Co-APIs list, the composer selects Google Maps as the mapping visualization, Twitter as the source of what people are saying about the neighborhood, and Yahoo Geocoding as the API to obtain the geo-location given an address. Each time the composer selects an API, the Co-APIs list is recalculated.
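The combined ranks reported for the first search can be checked against the final-rank formula of Algorithm 1, $FR_k = \alpha \, SR_k + (1 - \alpha) \, R_k$. The check below assumes that, since no API has been selected yet, the social rank equals the reported global WAR. Under that assumption the Glosk figure is reproduced exactly, and the Flickr figure implies a semantic rank of roughly 0.62, a value the text does not report directly.

```latex
\begin{align*}
FR_{\text{Glosk}} &= 0.3 \cdot 0 + 0.7 \cdot 0.97 = 0.679 \\
R_{\text{Flickr}} &= \frac{FR_{\text{Flickr}} - \alpha \, SR_{\text{Flickr}}}{1 - \alpha}
                   = \frac{0.677 - 0.3 \cdot 0.8}{0.7} \approx 0.624
\end{align*}
```

With $\alpha = 0.3$, a strong description match with no usage history (Glosk) can therefore edge out a popular API with a weaker match (Flickr).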
Using MashupReco, we can balance results that match descriptions against community usage of APIs. The majority of APIs do not have social information to exploit, because only a fraction of them have been utilized in a mashup. This lack of social data can be a problem, because there is no way to rate an API based on its usage. On the other hand, APIs that have been extensively used in mashups risk being rated too high and given too much exposure, leaving the rest at the bottom of the list. That is the reason behind the idea of influencing the discovery process with social data, rather than basing it on social data. This is controlled by the $\alpha$ factor. For example, for the query "photo location", a highly used API such as Flickr appears below Glosk, an API that has not been used in any mashup but is more specific to the query.
The Co-APIs list shows APIs that have been used together with the selected ones in mashups of the context. In the case of selecting YouTube, APIs with different functionalities are suggested: Google Maps, Geonames and Twitter are some of them. Given the context, these APIs could be useful for the mashup under construction, based on previous compositions made by other users.
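A Co-APIs list of this kind can be sketched as a ranking of every API co-used with the current selection, ordered by CAR. The sketch assumes mashups are sets of API names restricted to the context; the function name `suggest_co_apis` and the toy data are hypothetical.

```python
def suggest_co_apis(selected, mashups):
    """Rank APIs co-used with the selected set S by their CAR value."""
    selected = set(selected)
    with_s = [m for m in mashups if selected <= m]
    if not with_s:
        return []  # S never appears, so there are no Co-APIs to suggest
    counts = {}
    for m in with_s:
        for api in m - selected:
            counts[api] = counts.get(api, 0) + 1
    ranked = [(api, n / len(with_s)) for api, n in counts.items()]
    # Highest CAR first; ties broken alphabetically for a stable order
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical mashups of the context
mashups = [
    {"YouTube", "Google Maps"},
    {"YouTube", "Google Maps", "Flickr"},
    {"YouTube", "Twitter"},
    {"Flickr"},
]
print(suggest_co_apis({"YouTube"}, mashups))
print(suggest_co_apis({"Glosk"}, mashups))
```

Calling the function again after each selection mirrors the recalculation of the list described in the case study, and an API that has never been used in a mashup, as Glosk is in the case study, yields an empty list.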
5. RELATED WORK
Given the increasing trend of major firms providing APIs for public use, the mashup community is rapidly expanding. There are studies that characterize the mashup ecosystem as an API-Mashup network [10], with the intention of exploiting this information.
In [8], the authors proposed the serviut score to rank APIs based on their utilization and popularity. To calculate the serviut score, they considered the number of mashups that use the given API, but also other aspects that we believe are too ambiguous to be considered, such as classifying mashups in the same category as the API. Even according to ProgrammableWeb, mashups are not classified in categories because, by definition, a mashup is a mix of different Web APIs; it is therefore quite difficult to classify them in functional categories. According to our experiments, the taxonomies of APIs and mashups are quite different.
In [3], the authors proposed a social technique to mine annotated tags of mashups and APIs in order to recommend mashup candidates, managing the cold start problem for new competitors. However, tags are not reusable between different catalogs. Thus, by using tags we obtain specific taxonomies that are not generic enough to be used across sites. Web API authors do not necessarily use the same tags to describe their APIs; they typically adapt them according to the tags that are used in each catalog.
In [2], the authors proposed MashupAdvisor, which also assists mashup creators in building mashups. Similar to our approach, MashupAdvisor suggests APIs that could be part of the mashup under construction, using a probabilistic approach based on popularity in the mashup repository. However, because MashupAdvisor assists the mashup building process instead of only the selection, this approach is based on specific inputs and outputs, and typically only Web service APIs have this data. Mostly because of their complexity and lack of standards, general APIs do not have interface information for each operation. Thus, this approach performs well over Web services but not over general Web APIs. Although the results are encouraging, the authors actually simulate the data of ProgrammableWeb to conduct the experiments.
In [13], the authors proposed ServiceRank to differentiate services from a set of functionally equivalent services based on their quality and social aspects. The problem is that it needs access to data that providers may not be willing to give, such as response time and availability measurements. Also, because providers publish their own measurements, this process may not be completely reliable.
In [4], the authors proposed MatchUp, a tool that helps mashup creators locate components to mash up, based on the current component selection and a complete database that describes which components have been used in the different mashups (at the level of inputs/outputs). The algorithm performs well but is only feasible within an organization because, in general, this information is not shared or public.
6. FUTURE WORK
Every day, at least two new APIs are created. The market is also changing according to the needs of customers. Therefore, it is expected that the communities already identified will change their structure as new APIs (or mashups) join or leave them. This evolution is imminent, so over time we expect that some of these communities will merge or split into more specialized sub-communities. We are currently working on determining evolution patterns using this community abstraction, as well as on modeling this evolution. Also, the social ranks (WAR and CAR) are affected by this dynamism and have to reflect variations in the usage of APIs, e.g. APIs with intense use over a short period of time followed by a decrease. In addition, we are researching techniques that will allow us to incrementally update the taxonomy each time the communities change enough to trigger a taxonomy adaptation.
7. CONCLUSIONS
In this work, we have presented an approach that combines both the semantic and social networks to enrich and improve the Web API discovery process in order to build a mashup. We have shown empirically that natural language descriptions of the objects can be used effectively to build taxonomies of functionalities, presented as communities. Also, we have shown how we can build a collaborative social network of APIs by using the analogy of agents that collaborate to create applications, and also how we can exploit and leverage the semantic ratings.
One of our main contributions was to reinforce this methodology with a Web tool that allows us to empirically show this iterative process. This approach shows how we can effectively mitigate the cold start problem and the preferential attachment trend of social approaches when recommending APIs or mashups, and also how we effectively discover better description-based candidates by leveraging social information, making a trade-off between both worlds.
8. ACKNOWLEDGMENTS
This work was partially funded by FONDEF (grant D08i1155),
UTFSM DGIP (grant DGIP 241167 and PIIC) and CCTVal
(FB/22HA/10).
9. REFERENCES
[1] C. Carpineto and G. Romano. Exploiting the potential of concept lattices for information retrieval with credo. J. UCS, pages 985–1013, 2004.
[2] H. Elmeleegy, A. Ivan, R. Akkiraju, and R. Goodwin. Mashup advisor: A recommendation tool for mashup development. In Web Services, 2008. ICWS '08. IEEE International Conference on, pages 337–344, Sept. 2008.
[3] K. Goarany, G. Kulczycki, and M. B. Blake. Mining social tags to predict mashup patterns. In Proceedings of the 2nd international workshop on Search and mining user-generated contents, SMUC '10, pages 71–78, New York, NY, USA, 2010. ACM.
[4] O. Greenshpan, T. Milo, and N. Polyzotis. Autocompletion for mashups. Proc. VLDB Endow., 2:538–549, August 2009.
[5] C. Lindig. Fast concept analysis. In Working with Conceptual Structures - Contributions to ICCS 2000, pages 152–161. Shaker Verlag, 2000.
[6] M. Weiss and G. R. Gangadharan. Modeling the mashup ecosystem: Structure and growth. In R & D Management, pages 40–49, 2010.
[7] M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130–137, 1980.
[8] A. Ranabahu, M. Nagarajan, A. P. Sheth, and K. Verma. A faceted classification based approach to search and rank web apis. In Proceedings of the 2008 IEEE International Conference on Web Services, pages 177–184, Washington, DC, USA, 2008. IEEE Computer Society.
[9] C. Roth and P. Bourgine. Lattice-based dynamic and overlapping taxonomies: The case of epistemic communities. Scientometrics, 69(2):429–447, 2006.
[10] S. Yu and C. J. Woodard. Innovation in the programmable web: Characterizing the mashup ecosystem. In ICSOC 2008, LNCS 5472, pages 136–147. Springer-Verlag, 2009.
[11] R. Torres, B. Tapia, and H. Astudillo. Improving web
api discovery by leveraging social information. to
appear in the proceedings of the 9th IEEE
International Conference on Web Services, 2011.
[12] M. Weiss and S. Sari. Evolution of the mashup ecosystem by copying. In Proceedings of the 3rd and 4th International Workshop on Web APIs and Services Mashups, Mashups '09/'10, pages 11:1–11:7, New York, NY, USA, 2010. ACM.
[13] Q. Wu, A. Iyengar, R. Subramanian, I. Rouvellou, I. Silva-Lepe, and T. Mikalsen. Combining quality of service and social information for ranking services. In Proceedings of the 7th International Joint Conference on Service-Oriented Computing, ICSOC-ServiceWave '09, pages 561–575, Berlin, Heidelberg, 2009. Springer-Verlag.