Query Subtopic Mining Based on Cluster Ranking
Presenter Supervisor
Introduction & Motivation
What is Subtopic Mining?
Cont.
Cont.
All results related to car!
Cont.
Problem with Web search
Apple
Problem Statement
Objectives
Related Work
• Santos et al. [1] treated search engine suggestions as subtopic candidates to uncover
query intents.
• Kim et al. [2] mined subtopics from a set of relevant documents using simple patterns
and a hierarchical structure of subtopics.
• Ren et al. [3] mined subtopics from a heterogeneous graph and improved subtopic quality
with Wikipedia concepts: heterogeneous graph-based soft clustering over the constructed
graph yields an intent indicator for each object.
• Hu et al. [4] identified the intents of an input query by mapping the query into the
Wikipedia representation space.
• Zheng et al. [5] integrated information from both structured and unstructured data to
extract high-quality subtopics.
• Shajalal et al. [6] applied soft clustering to subtopic candidates, based on frequent
phrases, to group subtopics with similar intents.
Proposed Method
Query → Subtopic Candidates → Feature Extraction → Subtopic Clustering
Subtopic Candidate Generation
• Query: Pocono
• Sources: Bing, Google, Yahoo
• Subtopic candidates: pocono record, poconos all inclusive, poconos family resorts,
pocono mountain resorts, pocono raceway, poconos resorts, pocono medical center,
pocono mountains, poconos vacation
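The candidate generation step can be sketched as a merge of per-engine lists with duplicates removed. This is a minimal sketch, not the presenter's implementation: the function name `merge_candidates` is hypothetical, the strings are treated as search engine suggestions (an assumption), and the assignment of strings to engines in the sample input is illustrative only, since the slide layout does not make it recoverable.

```python
def merge_candidates(suggestions_by_engine):
    """Merge per-engine suggestion lists into one de-duplicated candidate list."""
    seen = set()
    merged = []
    for suggestions in suggestions_by_engine.values():
        for suggestion in suggestions:
            # Normalize case and whitespace so near-identical strings collapse.
            key = " ".join(suggestion.lower().split())
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Illustrative input: which engine returned which string is an assumption.
suggestions = {
    "bing": ["pocono record", "poconos all inclusive", "poconos family resorts"],
    "google": ["pocono record", "pocono raceway", "Poconos vacation"],
    "yahoo": ["pocono raceway", "pocono medical center", "pocono mountains"],
}
candidates = merge_candidates(suggestions)
print(candidates)
```

Keeping first-seen order means a candidate's position still reflects the earliest engine that proposed it.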
Feature Extraction
Query Dependent Features
• Average Concept Similarity
• WordNet Path Similarity
• Lexical Similarity
• Query Term Overlap
• Query Synonym Overlap
• Exact Match
• Hit Count
• Point-wise Mutual Information
Query Independent Features
• Selective POS Percentage
• Avg. Term Length
• Reciprocal Rank
• Voting
Feature Extraction
Query Dependent Features (1/2)
1. Average Concept Similarity (ACS)
The average concept similarity between query 𝑄 and subtopic 𝑆:
$$f_{ACS}(Q, S) = \frac{\sum_{t_i \in Q} \sum_{t_j \in S} ConSim(t_i, t_j)}{|Q| \cdot |S|}$$
where 𝐶𝑜𝑛𝑆𝑖𝑚(𝑡𝑖, 𝑡𝑗) denotes the similarity between two concepts 𝑡𝑖 and 𝑡𝑗 in a
large conceptual domain:
$$ConSim(t_i, t_j) = \frac{2 \cdot depth_1}{depth_2 + depth_3 + 2 \cdot depth_1}$$
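A small worked sketch of ACS follows. The slide does not define the three depths, so the code assumes the Wu-Palmer reading: depth1 is the depth of the lowest common ancestor (root counted as 1), and depth2 and depth3 are the numbers of edges from each concept up to that ancestor. The taxonomy `PARENT` is a hypothetical toy stand-in for a real lexical resource such as WordNet.

```python
# Hypothetical toy taxonomy (child -> parent); not from the source.
PARENT = {
    "entity": None,
    "vehicle": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "food": "entity",
    "apple": "food",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def con_sim(a, b):
    """2*depth1 / (depth2 + depth3 + 2*depth1), read as Wu-Palmer (assumption)."""
    if a not in PARENT or b not in PARENT:
        return 0.0  # unknown terms contribute no similarity
    path_a, path_b = path_to_root(a), path_to_root(b)
    ancestors_b = set(path_b)
    # First node on a's path that is also an ancestor of b: the deepest shared one.
    lcs = next(c for c in path_a if c in ancestors_b)
    depth1 = len(path_to_root(lcs))  # depth of the common ancestor, root = 1
    depth2 = path_a.index(lcs)       # edges from a up to the common ancestor
    depth3 = path_b.index(lcs)       # edges from b up to the common ancestor
    return 2 * depth1 / (depth2 + depth3 + 2 * depth1)


def f_acs(query_terms, subtopic_terms):
    """Average ConSim over all (query term, subtopic term) pairs."""
    if not query_terms or not subtopic_terms:
        return 0.0
    total = sum(con_sim(q, s) for q in query_terms for s in subtopic_terms)
    return total / (len(query_terms) * len(subtopic_terms))


print(f_acs(["car"], ["car", "truck"]))
```

In this toy tree, "car" and "truck" share the parent "vehicle", so their similarity is 2*2 / (1 + 1 + 2*2), while "car" and "apple" only meet at the root and score lower.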
Cont...
Query Dependent Features (2/2)
2. WordNet Path Similarity (WPS)
$$f_{WPS}(Q, S) = \frac{b_Q W b_S^T}{|b_Q| \cdot |b_S|}$$
where 𝑏𝑄 and 𝑏𝑆 are binary vectors for query 𝑄 and subtopic 𝑆:
$$b_Q = [I(t \in Q)]_{t \in V}, \qquad b_S = [I(t \in S)]_{t \in V}$$
where
𝑉 is a vector containing the terms of the query and the subtopic.
𝐼(·) returns 1 if its argument is true, 0 otherwise.
𝑊 is a symmetric matrix containing the pairwise concept similarities of the
terms in 𝑉.
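The WPS formula can be traced with a short sketch. Two things here are assumptions rather than statements from the slides: |b| is read as the Euclidean norm of the binary vector, and the similarity function `toy_sim` with its single hand-set value is a hypothetical placeholder for a WordNet-based measure.

```python
import math


def f_wps(query_terms, subtopic_terms, sim):
    """b_Q W b_S^T / (|b_Q| * |b_S|), with |.| taken as the Euclidean norm (assumption)."""
    vocab = sorted(set(query_terms) | set(subtopic_terms))  # V
    b_q = [1 if t in query_terms else 0 for t in vocab]
    b_s = [1 if t in subtopic_terms else 0 for t in vocab]
    # W[i][j] holds the pairwise similarity of vocabulary terms i and j.
    w = [[sim(ti, tj) for tj in vocab] for ti in vocab]
    n = len(vocab)
    numerator = sum(b_q[i] * w[i][j] * b_s[j] for i in range(n) for j in range(n))
    norm = math.sqrt(sum(b_q)) * math.sqrt(sum(b_s))
    return numerator / norm if norm else 0.0


# Hypothetical similarity values, symmetric by construction.
TOY_SIM = {frozenset(("pluto", "planet")): 0.5}


def toy_sim(a, b):
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


score = f_wps(["pluto"], ["pluto", "planet"], toy_sim)
print(score)
```

Unlike a plain cosine over binary vectors, the matrix 𝑊 lets non-identical but related terms contribute to the numerator, so under this reading the value is not bounded by 1.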
Cont...
Query Independent Features
3. Selective POS Percentage (SPP)
$$f_{SPP}(S) = \frac{\sum_{t \in S} I(POS(t) \in M)}{|S|}$$
where 𝑃𝑂𝑆(𝑡) returns the part of speech tag of a term 𝑡 and 𝑀 is the set of
selective POS, such as
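A minimal sketch of the SPP computation is given here. The members of 𝑀 are not listed in the source, so the tag set `SELECTED_TAGS` is a hypothetical choice, and `TOY_POS` is a hand-written lookup standing in for a real part-of-speech tagger.

```python
# Hypothetical tag set M: the source does not list its members.
SELECTED_TAGS = {"NOUN", "VERB", "ADJ"}

# Hand-written lookup standing in for a real POS tagger (assumption).
TOY_POS = {
    "pluto": "NOUN",
    "pictures": "NOUN",
    "latest": "ADJ",
    "of": "ADP",
    "the": "DET",
}


def pos(term):
    """Return the POS tag of a term, or 'X' when the toy lookup has no entry."""
    return TOY_POS.get(term.lower(), "X")


def f_spp(subtopic_terms, selected=SELECTED_TAGS):
    """Fraction of subtopic terms whose POS tag belongs to the selected set."""
    if not subtopic_terms:
        return 0.0
    hits = sum(1 for t in subtopic_terms if pos(t) in selected)
    return hits / len(subtopic_terms)


print(f_spp("latest pictures of pluto".split()))
```

For "latest pictures of pluto", three of the four terms carry a selected tag under these toy assignments, so the feature value is 0.75.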
Subtopic Clustering
Query | Subtopics | Intent
Pluto | Pluto pictures; Pluto pictures nasa; Pictures of pluto; Latest pictures of pluto | Picture of pluto
Pluto | Pluto reinstated as a planet; Pluto a planet again; Pluto planet | Pluto planet
Pork tenderloin | Pork tenderloin cooking instructions; Pork tenderloin cooking time; How to cook pork tenderloin | Pork tenderloin cooking
Pork tenderloin | Baked pork tenderloin recipe; Pork tenderloin recipes crock pot; Pork tenderloin recipes oven; Pork tenderloin recipes | Pork tenderloin recipe
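The grouping in the table can be illustrated with a simple sketch. The slides do not state which clustering algorithm is used, so this example swaps in a greedy single-link clustering over token Jaccard overlap, which is not necessarily the presenter's method. The function names and the 0.25 threshold are hypothetical.

```python
def tokens(text, query):
    """Lowercased tokens of a subtopic, with the query's own terms removed."""
    query_terms = set(query.lower().split())
    return frozenset(t for t in text.lower().split() if t not in query_terms)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(subtopics, query, threshold=0.25):
    """Greedy single-link clustering: join the first cluster with a close member."""
    clusters = []  # each cluster is a list of subtopic strings
    for s in subtopics:
        ts = tokens(s, query)
        for members in clusters:
            best = max(jaccard(ts, tokens(m, query)) for m in members)
            if best >= threshold:
                members.append(s)
                break
        else:
            clusters.append([s])  # no cluster was close enough
    return clusters


pluto_subtopics = [
    "Pluto pictures",
    "Pluto pictures nasa",
    "Pictures of pluto",
    "Latest pictures of pluto",
    "Pluto reinstated as a planet",
    "Pluto a planet again",
    "Pluto planet",
]
groups = cluster(pluto_subtopics, "Pluto")
for group in groups:
    print(group)
```

Dropping the query term before comparison matters here: every subtopic contains "pluto", so leaving it in would inflate the overlap between the picture and planet groups.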
Cont…
Query → Subtopic Candidates → Feature Extraction → Subtopic Clustering → Subtopic Ranking → Ranked subtopics
Subtopic Ranking
Dataset
Experimental Setup
Table 1: Configuration of different Runs
Experimental Result
Top 10 ranked list
Feature Importance
Random Forest:
42
Conclusion and Future Direction
• Conclusion
    Introduced a cluster-ranking-based subtopic mining method
    Proposed three features:
        Average Concept Similarity
        WordNet Path Similarity
        Selective POS Percentage
• Future Directions
    Design subtopic patterns
    Explore more features
    Search result diversification using mined subtopics
Future Plan
Ranked Clusters
Feature Extraction
References
• [1] R. L. Santos, C. Macdonald, and I. Ounis, “Exploiting query reformulations for web
search result diversification,” in Proceedings of the 19th International Conference on World
Wide Web. ACM, 2010, pp. 881–890.
• [2] S. J. Kim, J. Shin, and J. H. Lee, “Subtopic mining based on three-level hierarchical
search intentions,” in Advances in Information Retrieval. Springer, 2016, pp. 741–747.
• [3] X. Ren, Y. Wang, X. Yu, J. Yan, Z. Chen, and J. Han, “Heterogeneous graph-based intent
learning with queries, web pages and Wikipedia concepts,” in Proceedings of the 7th ACM
International Conference on Web Search and Data Mining. ACM, 2014, pp. 23–32.
• [4] J. Hu, G. Wang, F. Lochovsky, J. T. Sun, and Z. Chen, “Understanding user’s query
intent with Wikipedia,” in Proceedings of the 18th International Conference on World Wide
Web. ACM, 2009, pp. 471–480.
• [5] W. Zheng, H. Fang, C. Yao, and M. Wang, “Leveraging integrated information to extract
query subtopics for search result diversification,” Information Retrieval, vol. 17, no. 1, pp.
52–73, 2014.
• [6] M. Shajalal, M. Z. Ullah, A. N. Chy, and M. Aono, “Query subtopic diversification based
on cluster ranking and semantic features,” in 2016 International Conference on Advanced
Informatics: Concepts, Theory and Application (ICAICTA). IEEE, 2016, pp. 1–6.