
Apache Solr I can haz Search!

Barcamp 5, Chennai

Ashish Yadav (ashish_0x90)

Agenda

Overview of Apache Solr


Why Solr?

Installing Apache Solr


Getting Solr configuration right

Solr query basics and not so basic stuff

Scaling Solr

Some tips on Solr Caching

Overview

Apache Solr is a standalone full-text search server with Apache Lucene at the backend.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java.

In brief, Apache Solr exposes Lucene's Java API as REST-like APIs that can be called over HTTP from any programming language or platform.
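
As a rough sketch of what "callable over HTTP" looks like from a client, the snippet below builds a search URL with Python's standard library only. The host, port, and `/solr/select` path are assumptions (a typical local single-core setup), not something these slides specify.

```python
from urllib.parse import urlencode

# Assumed base URL for a local Solr instance; adjust to your deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(query, rows=10, base_url=SOLR_SELECT_URL):
    """Return a GET URL asking Solr for `rows` matches of `query` as JSON."""
    params = {"q": query, "rows": rows, "wt": "json"}
    # urlencode percent-escapes quotes and colons and turns spaces into '+'.
    return base_url + "?" + urlencode(params)


url = build_search_url('text:"I love android"')
print(url)
```

Any HTTP client in any language can then fetch that URL, which is the point of the slide above.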

Features

Full Text Search


Faceted navigation

More items like this (Recommendation) / Related searches

Spell Suggest/Auto-Complete

Custom document ranking/ordering

Snippet generation/highlighting

And a lot more...

So, why would I need Solr?

You want greater control over your website search.

Caching, replication, distributed search.

Really fast indexing/searching. Indexes can be merged/optimized (index compaction).

Great admin interface that can be used over HTTP.

Awesome community support too.

Support for integration with various other products like the Drupal CMS, etc.

Products using Solr

E-commerce sites, CMS, Blog sites.


Heavily used by LinkedIn, Twitter, CNET, Netflix, Digg.

Many of them contribute back, like the LinkedIn SNA (Search, Network, and Analytics) team.

Installation
Minimum Requirements.
Directory for storing index files.

Directory for storing configuration files.


Solr_Home having other dependencies

A servlet container (Tomcat, Jetty) with appropriate configuration.

Configuring Solr

schema.xml - Contains all of the details about document structure, index-time and query-time processing.

solrconfig.xml - Contains most of the parameters for configuring Solr itself.

Querying Solr: The basics

Plain text search:

q=text:"I love android"

Expanding search to more fields:

q=title:android AND type:review AND price:[* TO 500]

Add facets:

facet.field=product&facet.field=rating
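
A minimal sketch of sending the same parameter more than once, which faceting on several fields needs. The function name is hypothetical. Note that Solr only computes facets when `facet=true` is also sent, so the sketch adds it.

```python
from urllib.parse import urlencode


def basic_query_string(query, facet_fields):
    """Build a query string with one facet.field entry per requested field."""
    # A list of pairs (not a dict) lets the same key appear several times.
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


qs = basic_query_string("title:android", ["product", "rating"])
print(qs)
```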

Querying Solr: The basics

Add facets for range queries:

facet.query=price:[* TO 100]&facet.query=price:[100 TO 200]&facet.query=price:[500 TO *]

Ordering results:

sort=score desc, price asc

Limiting results:

rows=15

Paginating on results:

start=25&rows=10
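
Since `start` is a zero-based offset rather than a page number, a small helper is one way to avoid off-by-one mistakes when paginating. This is an illustrative sketch. `page_params` is a hypothetical name, not part of Solr.

```python
def page_params(page, per_page=10):
    """Translate a 1-based page number into Solr's start/rows parameters."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    # Page 1 starts at offset 0, page 2 at offset per_page, and so on.
    return {"start": (page - 1) * per_page, "rows": per_page}


print(page_params(1))
print(page_params(3, per_page=15))
```

With this convention, the slide's `start=25&rows=10` returns the 26th through 35th matches.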

Querying Solr - Not so basic stuff

Advanced query operators:

fq : Filter query. Example: fq=type:review&fq=price:[* TO 500]

fl : Restrict the fields returned with the result set. Example: fl=id,title,text
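
The sketch below combines a main query with several filter queries and a field list. Each filter goes in its own `fq` parameter. The function name and sample values are assumptions made for illustration.

```python
from urllib.parse import urlencode


def filtered_query_string(query, filters, fields):
    """Build a query string with one fq per filter and a comma-separated fl."""
    params = [("q", query)]
    params += [("fq", f) for f in filters]      # repeated fq parameters
    params.append(("fl", ",".join(fields)))     # fields to return
    return urlencode(params)


qs = filtered_query_string(
    "android",
    ["type:review", "price:[* TO 500]"],
    ["id", "title", "text"],
)
print(qs)
```

Keeping each restriction in a separate `fq` lets Solr cache and reuse the filters independently of the main query.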

Querying Solr - Not so basic stuff

hl : Highlighting matches in snippets, snippet generation, etc.

Example query: hl=true&hl.fl=title,text

Custom field boosting

Example: q=samsung+awesome&defType=dismax&qf=product^20.0+text^0.3

Debug output: debug=true
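
A small sketch of assembling the boosting and highlighting parameters programmatically. `build_qf` is a hypothetical helper. The field names and weights are the ones from the slide. In a URL, the space between `qf` entries is what appears as `+`.

```python
def build_qf(boosts):
    """Turn {'field': weight} into a DisMax qf string such as 'a^2.0 b^0.5'."""
    return " ".join(f"{field}^{weight}" for field, weight in boosts.items())


params = {
    "q": "samsung awesome",          # plain user terms; DisMax spreads them over qf
    "defType": "dismax",
    "qf": build_qf({"product": 20.0, "text": 0.3}),
    "hl": "true",                    # turn highlighting on
    "hl.fl": "title,text",           # fields to generate snippets from
}
print(params["qf"])
```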

Solr Search Custom handlers

Request Handlers: DataImportHandler, DisMaxHandler

Response Writers: json, xml, csv format writers
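
To illustrate what a client does with the JSON response writer's output, here is a sketch that reads a hand-written sample response. The documents and counts are made up. The overall layout (`response.docs`, and facet counts as a flat value/count list) follows Solr's default JSON format.

```python
import json

# Hypothetical sample of a wt=json response, trimmed to the relevant parts.
SAMPLE_RESPONSE = """
{
  "response": {"numFound": 2, "start": 0, "docs": [
    {"id": "1", "title": "Android review"},
    {"id": "2", "title": "Android phone"}
  ]},
  "facet_counts": {"facet_fields": {"rating": ["5", 12, "4", 7]}}
}
"""


def facet_counts(response, field):
    """Convert the flat [value, count, value, count, ...] list into a dict."""
    flat = response["facet_counts"]["facet_fields"][field]
    return dict(zip(flat[0::2], flat[1::2]))


data = json.loads(SAMPLE_RESPONSE)
titles = [doc["title"] for doc in data["response"]["docs"]]
print(data["response"]["numFound"], titles, facet_counts(data, "rating"))
```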

External Search Components

SpellCheckComponent: Uses Solr indexes, custom dictionaries, etc.

More Like This (term suggest, similar items, etc.)

Clustering component

TermVector Component: Returns advanced information about query terms, offsets, positions

Scaling Solr (I feel the Need for Speed >>>> )

Distributed search, a.k.a. sharding,

OR

Create separate indexes (rsync/scp).

Can run the Solr index replication daemon.

Optimization/autocommit for the indexes.
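
For distributed search, the node receiving the query is told which shards to consult through the `shards` parameter, a comma-separated list of `host:port/path` entries. A minimal sketch follows. The shard addresses are hypothetical.

```python
from urllib.parse import urlencode


def distributed_query_string(query, shard_hosts):
    """Build a query string that fans the query out across the given shards."""
    return urlencode({"q": query, "shards": ",".join(shard_hosts)})


# Hypothetical shard addresses; no scheme prefix is used in this parameter.
SHARDS = ["localhost:8983/solr", "localhost:7574/solr"]
qs = distributed_query_string("android", SHARDS)
print(qs)
```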

Solr Caching

Build your queries wisely.

External Caching : Memcached, etc.


Internal Caching

Different types of cache:

1) FilterCache: Used by filter queries (fq), sometimes for faceting too.

2) QueryResultCache: Used for results returned by generic queries.
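
As an illustration of external caching and of "building queries wisely", the sketch below keeps an in-process dictionary standing in for something like Memcached. Sorting the parameters before building the key means equivalent queries share one cache entry. The class, its method names, and `fake_fetch` are all hypothetical.

```python
from urllib.parse import urlencode


class QueryCache:
    """Tiny application-side cache in front of a search backend."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable that actually runs the search
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key_for(params):
        # Sort by parameter name so ordering differences do not split the cache.
        return urlencode(sorted(params.items()))

    def get(self, params):
        key = self.key_for(params)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._fetch(params)
        return self._store[key]


def fake_fetch(params):
    """Stand-in for a real HTTP call to Solr."""
    return {"echo": dict(params)}


cache = QueryCache(fake_fetch)
cache.get({"q": "android", "rows": 10})
cache.get({"rows": 10, "q": "android"})   # same query, different key order
print(cache.hits, cache.misses)
```

A real deployment would also need expiry so that cached results do not outlive index updates.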

Links and resources


http://wiki.apache.org/solr/

http://www.lucidimagination.com/developer/Articles

http://khaidoan.wikidot.com/solr

http://42bits.wordpress.com
