CIN First Revision Comments


CIN/568197.v1 Review Report
Subject: Appropriateness of the Manuscript
The topic of this manuscript falls within the scope of Computational Intelligence and Neuroscience.
Recommendation: Consider After Minor Changes

First Reviewer Comments
1. English and organization should be improved substantially. Most of the sentences are too long and read repetitively.
Justification: In the revised manuscript, the authors have substantially improved the English grammar and organization of the paper.

In some cases, parts of the equations are in bold, which is not needed.
Justification: All equations have been corrected as per the reviewer's comment.

2. Equations should be properly placed and organized. New terms have to be declared either in an appendix or at the start of the paper.
Justification: The authors have organized all equations properly and declared all new terms before the Introduction section of the revised manuscript.

3. The main novelty of this paper seems to be the semantic similarity measure. However, the evaluation of this approach is not very convincing. I would suggest that the authors compare their method with Google Word2Vec, i.e., use Google Word2Vec in place of their own similarity approach and compare the results.
Justification:
In the proposed work, the authors use the semantic similarity notion to filter out irrelevant/noisy terms suggested by the Borda rank aggregation scheme. The proposed semantic filtering approach takes two input words, one from the terms suggested by Borda rank and the other from the query terms, and finds the semantic similarity between them. If the semantic similarity is above a fixed threshold, the Borda-rank-suggested term is used for expansion; otherwise, the term is filtered out.
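
For illustration, the following is a minimal sketch of this filtering step (the semantic_similarity function stands in for the WordNet-based measure discussed later, and the threshold value is hypothetical, not the value fixed in the paper):

```python
# Minimal sketch of the semantic filtering step described above.
# semantic_similarity() stands in for the paper's WordNet-based measure;
# THRESHOLD is a hypothetical value chosen for illustration only.

THRESHOLD = 0.5  # hypothetical cutoff; the paper fixes its own threshold

def filter_expansion_terms(borda_terms, query_terms, semantic_similarity):
    """Keep a Borda-suggested term only if it is semantically close
    to at least one of the original query terms."""
    kept = []
    for term in borda_terms:
        if any(semantic_similarity(term, q) >= THRESHOLD for q in query_terms):
            kept.append(term)
    return kept
```
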
To the authors' knowledge, Google Word2Vec takes a single input word and returns a set of words similar to the input word, together with their similarity scores. From the point of view of the proposed work, it is not clear how Word2Vec would take two words as input and return the similarity score between them, so the Google Word2Vec approach cannot be applied directly. The authors plan to explore the use of Google Word2Vec in the query expansion approach in future work.

4. What about exploring concept-level/commonsense-level query expansion? Microsoft researchers are already exploring this, and their approach seems to be very powerful. Similar approaches for similarity measures exist in the literature [2].
Justification: In this paper, the authors have tried to incorporate concept-level similarity to some extent by using the semantic similarity notion based on the WordNet ontology. In future work, the authors plan to put more focus on concept-/commonsense-level query expansion by using similarity measures existing in the literature [2].

5. I would also suggest taking special care of sentiment [4] and emotion at the time of query expansion.
Justification:
Handling sentiment and emotion at the time of query expansion is a very interesting problem. In this paper, the authors have tried to handle sentiment and emotion to some extent by using the concept presented in [4]. For this purpose, sentiment words are first selected from the user query using background knowledge (SentiWordNet), and these words are then expanded by adding other related sentiment and emotional words.
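
As a rough sketch of the selection step (assuming NLTK's SentiWordNet interface; the subjectivity cutoff of 0.5 is a hypothetical choice, not the value used in the paper):

```python
# Rough sketch: pick out sentiment-bearing query words via SentiWordNet.
# Requires: nltk.download('sentiwordnet'); nltk.download('wordnet')
from nltk.corpus import sentiwordnet as swn

def sentiment_words(query_terms):
    """Return query terms whose first SentiWordNet sense carries a
    noticeable positive or negative score (cutoff is illustrative)."""
    selected = []
    for term in query_terms:
        senses = list(swn.senti_synsets(term))
        if senses and (senses[0].pos_score() + senses[0].neg_score()) >= 0.5:
            selected.append(term)
    return selected

print(sentiment_words(["terrible", "flood", "happy", "table"]))
```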

6. The following papers are very relevant to this work and should be cited:

[1] Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. "Distributed
representations of words and phrases and their compositionality." In Advances in neural
information processing systems, pp. 3111-3119. 2013.

[2] Poria, Soujanya, Alexander Gelbukh, Erik Cambria, Amir Hussain, and Guang-Bin Huang. "EmoSenticSpace: A novel framework for affective common-sense reasoning." Knowledge-Based Systems 69 (2014): 108-123.

[3] Cambria, Erik, Jie Fu, Federica Bisio, and Soujanya Poria. "AffectiveSpace 2: Enabling
affective intuition for concept-level sentiment analysis." In Twenty-Ninth AAAI Conference on
Artificial Intelligence, pp. 508-514. 2015.

[4] Poria, Soujanya, Erik Cambria, Gregoire Winterstein, and Guang-Bin Huang. "Sentic patterns: Dependency-based rules for concept-level sentiment analysis." Knowledge-Based Systems 69 (2014): 45-63.

Justification: All four of the above-mentioned papers have been added to the revised manuscript.
CIN/568197.v1 Review Report

Subject: Appropriateness of the Manuscript
The topic of this manuscript falls within the scope of Computational Intelligence and Neuroscience.
Recommendation: Consider After Major Changes

Second Reviewer Comments
The approach proposed by the authors is reasonable. However, the following improvements are essential.
(1) It is not clear whether the method given in (5) is the best known method for pseudo feedback. Since the authors use TREC and FIRE data, if there is a track on pseudo feedback, these conferences would provide the best known results on the TREC and FIRE data sets. The authors should compare their results with the best known results. The paper in (5) was published in 2008, while the paper in (22) was published in 2012. Why do the authors not compare their method with that in (22)?

Justification: In this paper, the authors have used four different term selection/ranking methods for ranking the unique terms obtained from the PRF documents (the initially retrieved top-ranked documents). The Borda ranking scheme is then used to combine the ranks of the terms returned by all four methods.
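
For illustration, a minimal sketch of the Borda combination step under the standard Borda-count convention is given below (the four underlying ranking methods themselves are outside this sketch, and the example rankings are hypothetical):

```python
# Minimal sketch of Borda rank aggregation over several term rankings.
# Standard convention: the top term of a list of n terms earns n points,
# the next n-1, and so on; totals decide the combined order.
from collections import defaultdict

def borda_combine(rankings):
    """rankings: list of term lists, each ordered best-first.
    Returns all terms ordered by total Borda score (descending)."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, term in enumerate(ranking):
            scores[term] += n - position
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings from two of the term-selection methods
print(borda_combine([["storm", "rain", "wind"], ["rain", "storm", "flood"]]))
```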

- According to the literature, the approach of Aguera et al. (5) is the best method related to the authors' work, as it uses pseudo-relevance-feedback-based query expansion with rank combination. This is the reason for choosing method (5) for comparison.
- The authors have used the ad hoc type datasets of TREC-3 (including disks 1 and 2) and FIRE. There is no track on pseudo feedback for ad hoc type datasets, while one is available for web datasets.
- The paper (5) uses different term selection methods and combines them with a naive combination approach. This method, based on combining various term selection methods, is related to the authors' proposed work, which is why the authors have compared their work with (5). The paper (22), in contrast, incorporates proximity information into Rocchio's model and proposes a proximity-based Rocchio's model; this is not related to the authors' work, which is why the authors have not compared their proposed approach with (22).

(2) The presentation, though logically organized, is not clear in its technical details in certain situations.

(A) Equation (4) is unclear. The LHS is the KL score for a term t, while the RHS sums over all t? Give an example to illustrate it.
Justification: In the RHS of Eq. (4), the summation sign was redundant/erroneous, and there was also a problem in Eq. (5). The authors have removed these errors and made the equations clearer in the revised manuscript.
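
For illustration, the corrected per-term score follows the standard KLD form from the pseudo-relevance-feedback literature; this form is an assumption for illustration, and the paper's exact Eq. (4) may differ:

```python
# Sketch of the standard KLD-style term score used in PRF query expansion.
# Once the stray summation over t is dropped, the score is per-term.
# This exact form is an assumption for illustration, not the paper's Eq. (4).
import math

def kld_score(p_t_feedback, p_t_collection):
    """Score a term t by how much its probability in the feedback (PRF)
    documents diverges from its probability in the whole collection."""
    return p_t_feedback * math.log(p_t_feedback / p_t_collection)

print(kld_score(0.02, 0.001))  # a term far more frequent in PRF docs scores high
```
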
(B) What are the lasses in equation (13)? How are the relevant and the irrelevant documents determined? Is there a collection of training documents and queries? How are they related to the collection of documents and queries in the actual experiments?
Justification:
- "Lasses" (~) of t (~t) means that term t does not occur; in the revised manuscript, the authors have used the symbol t̄ instead of the symbol ~t.
- Relevant documents are the set of documents obtained in the initial search (called PRF documents) by using the Okapi BM25 function for a user query; they are represented by R. Irrelevant documents are the remaining documents of the corpus, i.e., those not retrieved in the initial search for the same query; they are represented by C.
- For each query there is thus a set of initially retrieved documents, called relevant documents (the PRF documents), and a set of non-retrieved documents, called the irrelevant document set. Some top-ranked PRF documents are selected for a query, the unique terms of these documents are extracted, and all these unique terms are ranked using the IG term-ranking method given in Eq. (13). The top-ranked terms by IG value are selected to expand the original user query. This process is repeated for every query separately.
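
As an illustration of this ranking step, a rough sketch of an information-gain score computed from document counts is given below; the exact smoothing and normalization of the paper's Eq. (13) may differ, so this form is an assumption:

```python
# Rough sketch of an information-gain (mutual-information) term score in
# the spirit of Eq. (13), computed from document counts over the two
# classes R (PRF documents) and C (non-retrieved documents).
import math

def information_gain(df_rel, n_rel, df_irr, n_irr):
    """df_rel: docs in R containing t; n_rel: |R|;
    df_irr: docs in C containing t; n_irr: |C|."""
    n = n_rel + n_irr
    df_total = df_rel + df_irr
    # joint cells: (t present/absent) x (class R / class C)
    cells = [
        (df_rel, n_rel, df_total),               # t occurs, class R
        (n_rel - df_rel, n_rel, n - df_total),   # t absent, class R
        (df_irr, n_irr, df_total),               # t occurs, class C
        (n_irr - df_irr, n_irr, n - df_total),   # t absent, class C
    ]
    ig = 0.0
    for joint, class_size, term_total in cells:
        if joint > 0:
            p_tc, p_t, p_c = joint / n, term_total / n, class_size / n
            ig += p_tc * math.log(p_tc / (p_t * p_c))
    return ig

# Hypothetical counts: a term in 8 of 10 PRF docs but only 100 of 990 others
print(information_gain(8, 10, 100, 990))
```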

(C) The description in Section 3.4 mentions the use of meronymy and holonymy, but the algorithm in Table 2 does not mention them.
Justification: In Section 3.4, the authors describe the basic idea of the semantic relationships that can be used from WordNet. In the proposed work, however, only the synonym and hypernym relations are used; these are explained with the help of Example 1 below, and the authors have revised Section 3.4 in the revised manuscript accordingly.

The following example shows the use of synonymy and hypernymy in the algorithm.
Example 1: Consider the two concepts/words Mirror and Magician.
The hypernymy trees of the words Mirror (w1) and Magician (w2) obtained from WordNet are given in the following figure.

Based on the proposed semantic similarity approach:
The least common subsumer (LCS) of the words w1 and w2 is Whole.
The number of edges between the two words through the LCS is 11.
Now we apply the Leacock & Chodorow (Lch) semantic similarity approach to find the similarity:

Sim_lch(w1, w2) = max[ -log( length(w1, w2) / (2D) ) ]
where D is the maximum depth of the taxonomy (i.e., 12 in the case of English WordNet-1.2) and length(w1, w2) is the shortest path between word w1 and word w2.
Here D = 12 and length(Mirror, Magician) = 11.
Applying the Lch similarity measure:

Sim_lch(Mirror, Magician) = -log(11 / 24) ≈ 0.78


The Lch similarity value ranges from 0 to 4.
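
This worked example can be cross-checked with NLTK's WordNet interface; note that NLTK ships a newer WordNet whose maximum depth D differs from the WordNet-1.2 value of 12 used above, so the score will not match 0.78 exactly:

```python
# Cross-check of the Lch measure using NLTK's WordNet interface.
# Requires: nltk.download('wordnet'). NLTK's WordNet has a different
# maximum depth D than WordNet-1.2, so the score differs from 0.78.
from nltk.corpus import wordnet as wn

mirror = wn.synset('mirror.n.01')
magician = wn.synset('magician.n.01')
print(mirror.lch_similarity(magician))  # Leacock-Chodorow similarity
```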

(D) Equation (21) involves max, which should consider a set of multiple elements.
Justification: In Eq. (21), length(c1, c2) is the number of nodes between concepts c1 and c2 in the WordNet graph. There may be more than one path between c1 and c2 in the WordNet graph, so there may be more than one value of length(c1, c2), and in that case there will be more than one similarity value. Therefore, max is used to select the maximum similarity value from this set of similarity values.
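
A tiny sketch of this use of max, reusing the Lch formula and the path length of Example 1 (the second candidate length, 15, is hypothetical):

```python
# Why max appears in Eq. (21): several paths between two concepts give
# several candidate lengths, hence several similarity values; keep the largest.
import math

def lch_sim(lengths, depth=12):
    """lengths: candidate path lengths between c1 and c2;
    depth: maximum WordNet depth D."""
    return max(-math.log(l / (2 * depth)) for l in lengths)

print(lch_sim([11, 15]))  # the shortest path (11) yields the larger value, ~0.78
```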

(E) In equation (22), should c be a single candidate term instead of a set?
Justification: Yes, c should be a single candidate term instead of a candidate term set; it was a mistake. In the revised copy, the authors have changed it accordingly.

(3) There are numerous grammatically incorrect sentences. They need to be corrected.
Justification: In the revised copy, the authors have made an effort to correct all grammatically incorrect sentences.
