Solr Cluster Installation Tool "Anuenue"
mixi?
One of the largest social networking services in Japan.
Many services promote communication among users: blog, news, game platform, etc.
Most of the services come with search.
15M monthly active users.
Today's Topics
Anuenue
  Handy configuration of search clusters
  Commands for search clusters
Did-You-Mean facilities for Japanese queries
  Common problem in Did-You-Mean implementation
  Mining a Japanese Did-You-Mean dictionary from query log data
Role: master
Indexes the input data. Note: Anuenue provides a command to distribute the input data across the master instances (building the Solr shard indexes).
[Diagram: three master instances, Master-1, Master-2, and Master-3]
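The slides do not say how Anuenue's command splits the input data among the masters. As a rough sketch only, assuming a simple hash-based split on a document ID (the `assign_master` and `partition` names and the `id` field are hypothetical, not part of Anuenue), the idea of building one shard per master could look like this:

```python
import zlib


def assign_master(doc_id: str, num_masters: int) -> int:
    """Map a document ID to a master index with a stable hash (assumed strategy)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_masters


def partition(docs, num_masters):
    """Split documents into one list per master instance."""
    shards = [[] for _ in range(num_masters)]
    for doc in docs:
        shards[assign_master(doc["id"], num_masters)].append(doc)
    return shards
```

Each resulting list would then be indexed by exactly one master, so the same document always lands in the same shard.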
Role: slave
Has three functions:
Copy (replicate) the index from a master
Accept queries from mergers and search its own index
Return the results to the merger instance
[Diagram: Merger-1 submits queries to Slave-1 and Slave-2; the slaves replicate the index from Master-1 and Master-2, which index the input data.]
Role: merger
Forwards queries from clients to slaves. Note: clients do not need to know the slave instances (the merger adds the shards parameter listing the slave instances). Merges the results from all the slave instances and returns the merged results.
[Diagram: Client-1 and Client-2 submit queries to the merger, which forwards them to Slave-1 and Slave-2.]
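To illustrate what the merger does on the client's behalf, here is a minimal sketch of a Solr distributed query that uses the standard `shards` request parameter. The slave host names, the port 8983, and the `solr` path segment are assumptions for illustration; the slides do not show Anuenue's actual URLs.

```python
from urllib.parse import urlencode


def build_distributed_query(merger, slaves, query, core_path="solr"):
    """Build a select URL on the merger that fans out to the given slaves.

    merger: (host, port) of the merger instance
    slaves: list of (host, port) pairs for the slave instances
    """
    # Solr expects a comma-separated list of host:port/path entries.
    shards = ",".join(f"{host}:{port}/{core_path}" for host, port in slaves)
    params = urlencode({"q": query, "shards": shards})
    host, port = merger
    return f"http://{host}:{port}/{core_path}/select?{params}"
```

A client only ever needs the merger's address; the list of slaves stays inside the cluster configuration.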
Configuration example
Case: there is one merger instance on machine aa (port 7000).
<mergers>
  <merger>
    <host>aa</host>
    <port>7000</port>
  </merger>
</mergers>
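As a small illustration of how such a configuration fragment could be consumed, the following sketch reads the merger entries with Python's standard `xml.etree.ElementTree`. The `parse_mergers` function is hypothetical and is not Anuenue's own loader; only the element names come from the example above.

```python
import xml.etree.ElementTree as ET

# The merger fragment from the configuration example.
CONFIG = """
<mergers>
  <merger>
    <host>aa</host>
    <port>7000</port>
  </merger>
</mergers>
"""


def parse_mergers(xml_text):
    """Return a list of (host, port) tuples, one per <merger> element."""
    root = ET.fromstring(xml_text)
    return [
        (merger.findtext("host"), int(merger.findtext("port")))
        for merger in root.iter("merger")
    ]
```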
[Diagram: clients (Client1, Client2) and input data; a cluster of three mergers (Merger1 to Merger3), six slaves (Slave1 to Slave6), and six masters (Master1 to Master6) fed by the input data.]
Today's Topics
Anuenue
  Handy configuration of search clusters
  Commands for search clusters
Did-You-Mean facilities for Japanese queries
  Common problem in Did-You-Mean implementation
  Mining a Japanese Did-You-Mean dictionary from query log data
Common implementation
Many search engines (including Solr) apply distance measures such as edit distance [Levenshtein, 1965].
Edit distance: a measure of the distance between two sequences. Simply put, the more characters two sequences have in common, the smaller the distance.
E.g.,
like vs. likes (small distance)
like vs. foobar (large distance)
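The two examples can be checked with a short sketch of the standard dynamic-programming Levenshtein distance. This is a textbook formulation for illustration, not Solr's internal implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("like", "likes"))   # 1
print(edit_distance("like", "foobar"))  # 6
```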
This step causes a spelling mistake whose distance from the correct spelling is too large to correct.
Problem
How can we create the dictionary? We can use Oluolu, a query log mining tool.
Oluolu
Creates a spelling correction dictionary from a query log
Extracts pairs of queries (query with spelling mistakes, query with correct spelling)
Supports Japanese spelling mistakes (from version 0.2)
Runs on the Hadoop framework
Project URL: http://code.google.com/p/oluolu/
[Figure: sample query log; surviving values: 34443, Java, 438904, Python, 8975]
Queries in the same session: a set of queries submitted by the same user within a short time range. Extracted pairs are candidates for (misspelled query, correct query).
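The session idea can be sketched as follows. This is a single-machine illustration, not Oluolu's Hadoop implementation; the `(user_id, timestamp, query)` log layout, the function name, and the 60-second window are assumptions.

```python
from collections import defaultdict


def extract_session_pairs(log, max_gap_seconds=60):
    """Return (earlier_query, later_query) pairs from the same user's session.

    log: iterable of (user_id, timestamp_in_seconds, query) tuples.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, query in log:
        by_user[user_id].append((timestamp, query))

    pairs = []
    for entries in by_user.values():
        entries.sort()  # order each user's queries by time
        for (t1, q1), (t2, q2) in zip(entries, entries[1:]):
            # Consecutive, different queries issued close together are candidates.
            if t2 - t1 <= max_gap_seconds and q1 != q2:
                pairs.append((q1, q2))
    return pairs
```

The pairs are only candidates at this point; a later filtering step decides which ones are real (misspelling, correction) pairs.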
Example: step 2
Given a pair of queries: (, )
1. Convert them into readings
The readings are the same: sumitomofudousan.
3. Compute the distance between the readings
The distance is zero.
Extracted as an element of the Did-You-Mean dictionary
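A minimal sketch of this reading-based comparison follows. The two Japanese queries are illustrative stand-ins chosen to share the reading sumitomofudousan; they are not taken from the slide. The `READINGS` lookup table is a hypothetical substitute for a real reading converter, and the zero threshold is an assumption based on this one example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


# Hypothetical reading table; a real system would derive readings
# with a morphological analyzer instead of a fixed dictionary.
READINGS = {
    "住友不動産": "sumitomofudousan",
    "住友不動さん": "sumitomofudousan",
}


def reading_distance(q1: str, q2: str, readings=READINGS) -> int:
    """Edit distance between the readings of two queries."""
    return edit_distance(readings[q1], readings[q2])


def is_did_you_mean_pair(q1: str, q2: str, max_reading_distance: int = 0) -> bool:
    """Accept a pair whose surface forms differ but whose readings are close."""
    return q1 != q2 and reading_distance(q1, q2) <= max_reading_distance
```

Comparing readings rather than surface strings is what lets pairs with a large surface edit distance still be matched.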
Preliminary experiments
Experimental settings
Input data: log file from a mixi service (community search), 5 GB of data
Extracted dictionary
The number of elements is over 100,000.
Succeeded in extracting query pairs with a large edit distance.
(, ) (, )
Current status
Finished functional tests and stress tests. Now replacing an in-house search engine in a small search service with Anuenue. In the next phase, we will apply Anuenue to a search service with large data and high QPS.
Future work
Integrate SolrCloud and ZooKeeper
Support failover and rebalance the index
Kuromoji, a new OSS Japanese tokenizer
Summary
Introduced Anuenue
Described a Did-You-Mean facility for Japanese queries