Paper 338


Summary: The authors propose a novel relationship expression language that is
generic and robust enough to represent exactly the same semantic relationship
between a pair of nodes across diverse data representation strategies for the same
database. They then propose an algorithm called RelSim, which extends a
state-of-the-art metapath-based similarity search algorithm (PathSim) and is
designed to return the same similarity score for every entity pair across diverse
representations of the same database. Experiments are run on large-scale datasets,
and the empirical results show that RelSim outperforms the state-of-the-art
baselines it is compared against.
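
For readers unfamiliar with the PathSim baseline named in the summary, the sketch
below shows how a PathSim score is usually computed for a symmetric metapath: count
the path instances between two nodes and normalize by each node's count of paths
back to itself. This illustrates the baseline only, not RelSim. The
author-paper-venue graph, the matrices, and the function names are all made up for
illustration.

```python
# Minimal PathSim sketch on a toy graph. All data and names are hypothetical.

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Return the transpose of a dense matrix."""
    return [list(row) for row in zip(*m)]

def pathsim(half_path):
    """PathSim scores for the symmetric metapath built from half_path
    followed by its reverse. half_path is a list of adjacency matrices."""
    w = half_path[0]
    for adj in half_path[1:]:
        w = matmul(w, adj)
    # Commuting matrix: m[i][j] counts metapath instances from node i to node j.
    m = matmul(w, transpose(w))
    n = len(m)
    return [[2 * m[i][j] / (m[i][i] + m[j][j]) if m[i][i] + m[j][j] else 0.0
             for j in range(n)] for i in range(n)]

# Rows: 3 authors, columns: 3 papers.
author_paper = [[1, 1, 0],
                [0, 1, 1],
                [0, 0, 1]]
# Rows: 3 papers, columns: 2 venues.
paper_venue = [[1, 0],
               [1, 0],
               [0, 1]]

# Metapath author-paper-venue-paper-author.
scores = pathsim([author_paper, paper_venue])
```

Because the score depends on which node and edge types the metapath walks through,
restructuring the same data (for example, storing venues as an attribute rather
than as nodes) changes which metapaths exist. That representation dependence is the
issue the paper targets.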

Pros:
- The problem statement appears to be quite novel.

Cons:
- To the best of my understanding, the authors have currently targeted the most
popular and very intuitive constraints (as mentioned in Section 6.1). Do they have
some intuition about datasets/scenarios where it would be hard or impossible to
deal with some of these constraints? (Basically, does this approach generalize to
all possible datasets, or are there scenarios where it will NOT work as-is?)

- In some cases, the diversity of representation for the same data might be a
deliberate design choice made by the respective database creator, with a view to
some potential end-applications. It would be great if the authors discussed this
topic a bit more: in which end-applications would this representation-agnostic
similarity search algorithm be a better fit than the existing state of the art
(PathSim etc.)?

Miscellaneous:
- Not an expert in this domain and hence not familiar with the baseline algorithms
either.
- Did NOT check any proofs.

----

Saket:

My main concern with this paper is the assumption that two databases should have
"exact same information" (due to the bijection between the nodes of the two
datasets). Databases of this kind do not exist in the real world. In the
experiments, the authors transform the datasets to show the robustness and
effectiveness of their proposed algorithm. Can the authors modify the proposed
algorithm so that it works on two relatively similar datasets? In that case, a
slight drop in the "average ranking differences" score would still be acceptable.
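
To make the suggestion concrete, here is one way such a comparison could be
measured. The paper's exact definition of "average ranking differences" is not
reproduced in this review, so the metric below (mean absolute shift in rank
position) is an assumption, and the scores, node names, and `mapping` are
hypothetical.

```python
# Hypothetical metric: mean absolute rank shift between two representations.
# This is an assumed definition, not necessarily the one used in the paper.

def ranking(scores):
    """Map each item to its rank (0 = most similar); ties broken by item name."""
    ordered = sorted(scores, key=lambda item: (-scores[item], item))
    return {item: pos for pos, item in enumerate(ordered)}

def average_ranking_difference(scores_a, scores_b, mapping):
    """Mean absolute rank shift of each item in scores_a against its
    counterpart in scores_b. mapping is assumed to cover every item in
    scores_a; items of scores_b without a counterpart are simply ignored."""
    rank_a = ranking(scores_a)
    rank_b = ranking(scores_b)
    shifts = [abs(rank_a[item] - rank_b[mapping[item]]) for item in scores_a]
    return sum(shifts) / len(shifts)

# Similarity of one query node to three candidates, under two representations.
scores_a = {"a1": 0.9, "a2": 0.5, "a3": 0.1}
scores_b = {"b1": 0.8, "b2": 0.2, "b3": 0.6}
mapping = {"a1": "b1", "a2": "b2", "a3": "b3"}

difference = average_ranking_difference(scores_a, scores_b, mapping)
```

A value of 0 means the two representations rank the candidates identically. For
"relatively similar" datasets, a small nonzero value would correspond to the
acceptable degradation described above.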

The authors could also try existing graph-embedding-based algorithms such as
DeepWalk [1] or role-based embedding algorithms [2] as baselines.

[1] Perozzi, Bryan, Rami Al-Rfou, and Steven Skiena. "Deepwalk: Online learning of
social representations." Proceedings of the 20th ACM SIGKDD international
conference on Knowledge discovery and data mining. ACM, 2014.
[2] Nesreen K. Ahmed, Ryan A. Rossi, Rong Zhou, John Boaz Lee, Xiangnan Kong,
Theodore L. Willke, and Hoda Eldardiry. "Learning Role-based Graph Embeddings."
StarAI IJCAI 2018.
----

Nikhita:

- The problem of making simple modifications to current algorithms in order to
obtain the same, or similarly accurate, data analytics results across various
databases is interesting and important.

- Their assumption of only using dataset pairs with a bijective mapping between
them seems highly restrictive. They mention that the robustness of an algorithm
can't be guaranteed in the case of non-bijective transformations. However, a small
section showing empirically what happens in a few such cases, even on a synthetic
dataset, would be useful.

- It is not clear how the authors generate and choose relationship patterns based
on the database constraints to test their method. A general rule of thumb under the
assumptions of this work would be useful.

- It might be interesting to compare the performance of their technique against
more recent, highly scalable path-based similarity search algorithms such as [3]
and [4], since efficiency is also a contribution of this work.

References
[3] Dong, Y., Chawla, N.V. and Swami, A., 2017, August. metapath2vec: Scalable
representation learning for heterogeneous networks. In Proceedings of the 23rd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining
(pp. 135-144). ACM.
[4] Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y. and Li, J., 2015, August.
Panther: Fast top-k similarity search on large networks. In Proceedings of the 21th
ACM SIGKDD international conference on knowledge discovery and data mining (pp.
1445-1454). ACM.
