This document describes four algorithms:
1. Bipartite graph construction, which creates a graph from click-through data and extracted concepts.
2. Agglomerative clustering, which clusters the bipartite graph by merging the queries and the concepts with the highest similarity scores.
3. Similarity clustering, which merges concepts and queries with higher similarity until a threshold is reached.
4. Community clustering, which further clusters the results of the previous algorithms by merging the most similar query pairs.
This section presents the algorithm implemented in each module.
Algorithm 1: Bipartite graph construction
Input: Click-through data CT, extracted concepts E.
Output: Bipartite graph G.
1: Obtain the set of unique queries Q from CT.
2: Obtain the set of unique concepts C from E.
3: Set the nodes of G to Q ∪ C.
4: If a web snippet s is obtained for a query q ∈ Q and a concept c ∈ C is extracted from s, add the edge (q, c) to G.
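The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal sketch, not the implementation used in the module. The data layout is an assumption, since the text does not specify it: click-through data is taken to be a list of (query, snippet id) pairs, and the extracted concepts a mapping from snippet id to the concepts found in that snippet. The function name build_bipartite_graph is hypothetical.

```python
def build_bipartite_graph(clickthrough, extracted_concepts):
    """Build a query-concept bipartite graph.

    clickthrough: iterable of (query, snippet_id) pairs (assumed layout).
    extracted_concepts: dict mapping snippet_id -> iterable of concepts
    (assumed layout).
    Returns (queries, concepts, edges), where edges is a set of (q, c) pairs.
    """
    clickthrough = list(clickthrough)  # allow generators; we iterate twice

    # Step 1: unique queries Q from the click-through data.
    queries = {q for q, _ in clickthrough}

    # Step 2: unique concepts C from the extracted concepts.
    concepts = {c for cs in extracted_concepts.values() for c in cs}

    # Steps 3-4: the nodes are Q and C; add an edge (q, c) whenever
    # concept c was extracted from a snippet obtained for query q.
    edges = set()
    for q, snippet_id in clickthrough:
        for c in extracted_concepts.get(snippet_id, ()):
            edges.add((q, c))

    return queries, concepts, edges


if __name__ == "__main__":
    ct = [("apple", "s1"), ("apple", "s2"), ("banana", "s3")]
    ec = {"s1": ["fruit", "iphone"], "s2": ["mac"], "s3": ["fruit"]}
    q, c, e = build_bipartite_graph(ct, ec)
    print(sorted(e))
```

Keeping the two node sets separate, rather than storing one mixed node set, makes the bipartite structure explicit: every edge has a query on one side and a concept on the other.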
Algorithm 2: Agglomerative clustering
Input: Query-concept bipartite graph G.
Output: Clustered bipartite graph.
1: Compute the similarity score of every pair of queries in G using the noise-tolerant similarity function.
2: Merge the pair of queries with the highest similarity score.
3: Compute the similarity score of every pair of concepts in G using the noise-tolerant similarity function.
4: Merge the pair of concepts with the highest similarity score.
5: Repeat steps 1 to 4 until the termination condition is reached.
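The alternating merge loop of Algorithm 2 can be sketched as below. Two things are assumptions, because the text does not define them: the noise-tolerant similarity function is replaced here by a plain Jaccard overlap of neighbour sets as a stand-in, and the termination condition is taken to be "no pair scores above a threshold". All function names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity (assumption): overlap of two neighbour sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbours(edges, side):
    """Map each node on `side` (0 = query, 1 = concept) to its neighbour set."""
    nbrs = {}
    for e in edges:
        nbrs.setdefault(e[side], set()).add(e[1 - side])
    return nbrs


def merge_best_pair(edges, side, threshold):
    """Merge the most similar pair of nodes on one side, if above threshold."""
    nbrs = neighbours(edges, side)
    best, best_score = None, threshold
    # Sort the clusters so that ties are broken deterministically.
    for a, b in combinations(sorted(nbrs, key=sorted), 2):
        score = jaccard(nbrs[a], nbrs[b])
        if score > best_score:
            best, best_score = (a, b), score
    if best is None:
        return edges, False
    a, b = best
    merged = a | b
    new_edges = set()
    for e in edges:
        node = merged if e[side] in (a, b) else e[side]
        new_edges.add((node, e[1]) if side == 0 else (e[0], node))
    return new_edges, True


def agglomerative_clustering(edges, threshold=0.5):
    """Alternate query merges and concept merges until neither side changes."""
    # Every node starts as a singleton cluster.
    edges = {(frozenset([q]), frozenset([c])) for q, c in edges}
    changed = True
    while changed:
        edges, merged_q = merge_best_pair(edges, 0, threshold)  # steps 1-2
        edges, merged_c = merge_best_pair(edges, 1, threshold)  # steps 3-4
        changed = merged_q or merged_c                          # step 5
    return edges
```

Each successful merge removes one node from the graph, so the loop always terminates. Alternating between the two sides means that a query merge can raise the similarity of concepts in the next half-step, and vice versa.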
Algorithm 3: Similarity clustering
Input: The same kind of input as used for similarity checking.
Output: Concepts and keywords that have passed the basic similarity test.
1: Initialize with the concepts whose similarity is to be checked.
2: Merge the concepts that have higher similarity.
3: Merge the queries that have higher similarity.
4: Repeat until the threshold reaches zero.
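One possible reading of Algorithm 3 is sketched below. The text does not say how the threshold evolves, so this sketch assumes a descending schedule: the threshold starts at 1.0, drops in equal steps, and the loop stops once it reaches zero. The similarity measure (Jaccard overlap of feature sets), the parameter levels, and the function names are likewise assumptions made for illustration.

```python
def jaccard(a, b):
    """Stand-in similarity (assumption): overlap of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_above(clusters, threshold):
    """Merge pairs of clusters until no pair has similarity >= threshold."""
    clusters = list(clusters)
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                names_a, feats_a = clusters[i]
                names_b, feats_b = clusters[j]
                if jaccard(feats_a, feats_b) >= threshold:
                    # The merged cluster keeps the union of both feature sets.
                    clusters[i] = (names_a | names_b, feats_a | feats_b)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break
    return clusters


def similarity_clustering(items, levels=4):
    """Cluster items (dict name -> set of features) with a falling threshold.

    With levels=4 the thresholds tried are 1.0, 0.75, 0.5 and 0.25; the
    loop ends when the next threshold would be zero.
    """
    clusters = [(frozenset([n]), frozenset(f)) for n, f in sorted(items.items())]
    for k in range(levels, 0, -1):
        clusters = merge_above(clusters, k / levels)
    return [set(names) for names, _ in clusters]
```

Under this reading, the function would be called once for the concepts (step 2) and once for the queries (step 3), each time with the features that describe the items being merged.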
Algorithm 4: Community clustering
Input: The clustered concepts and queries obtained from the previous algorithms.
Output: The final, maximally clustered output.
1: Compute the similarity score of every pair of queries in G using the given noise-tolerant similarity function.
2: Merge the most similar pair of queries (qi, qj) that contains the same queries from different users.
3: Repeat until the termination condition is reached.
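The community merging step can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: each node is taken to be a (user, query) pair with a set of associated concepts, Jaccard overlap stands in for the noise-tolerant similarity function, and termination is taken to be "no eligible pair scores above a threshold". The names community_clustering and shares_query_across_users are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity (assumption): overlap of two concept sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shares_query_across_users(members_a, members_b):
    """True if both clusters hold the same query text from different users."""
    return any(
        qa == qb and ua != ub
        for ua, qa in members_a
        for ub, qb in members_b
    )


def community_clustering(user_queries, threshold=0.3):
    """Merge clusters of (user, query) nodes across users.

    user_queries: dict mapping (user, query) -> set of concepts (assumed).
    Returns a list of clusters, each a set of (user, query) pairs.
    """
    clusters = [({key}, set(c)) for key, c in sorted(user_queries.items())]
    while True:
        # Step 1: score every eligible pair and remember the best one.
        best, best_score = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            members_a, feats_a = clusters[i]
            members_b, feats_b = clusters[j]
            if not shares_query_across_users(members_a, members_b):
                continue
            score = jaccard(feats_a, feats_b)
            if score > best_score:
                best, best_score = (i, j), score
        # Step 3: stop when no eligible pair beats the threshold.
        if best is None:
            break
        # Step 2: merge the most similar pair (i < j, so deleting j is safe).
        i, j = best
        clusters[i] = (clusters[i][0] | clusters[j][0],
                       clusters[i][1] | clusters[j][1])
        del clusters[j]
    return [members for members, _ in clusters]
```

Restricting merges to pairs that share a query text across users is what separates this step from the earlier clustering passes: two users who issue the same query but click on unrelated concepts stay in separate clusters.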