CPython Embedded in Solr - Search Solution For Python Lovers With The Speed of Native Java
- Try it out!
- https://github.com/romanchyla/montysolr
Thursday, May 26, 2011
Outline
- Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
- Wrap-up
CERN
- European Organization for Nuclear Research
- Geneva, Switzerland
- The largest laboratory for High Energy Physics
- Home to the Large Hadron Collider
- 40-50K HEP scientists worldwide
SPIRES
- Stanford Linear Accelerator Center (SLAC)
- High-Energy Physics Literature Database
- Started December 1991
- The first web server outside Europe/CERN
- The first database on the web
Invenio
- Integrated digital library software behind INSPIRE
- Used by very large institutional repositories
- http://repositories.webometrics.info/toprep_inst.asp
The Challenge
- HEP scientific community
- Searches are metadata-oriented
- However, full texts are changing the situation
- And we want to provide an even better service
  - Bigger volumes of data
  - NLP processing
  - Semantic search
The Challenge
Query: supersymmetry AND author:ellis
Invenio
fulltext:supersymmetry
3. push IDs? (e.g. for faceting)
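A minimal sketch of the combination step on this slide, assuming (as the split query suggests) that the author part is answered from metadata and the `fulltext:supersymmetry` part by a separate full-text engine, each returning a set of record IDs. `metadata_search`, `fulltext_search`, `combined_and` and the toy records are hypothetical stand-ins, not Invenio or Solr APIs.

```python
from typing import Dict, List, Set

# Hypothetical toy indexes standing in for the two engines (not real APIs).
METADATA_AUTHORS: Dict[int, List[str]] = {
    1: ["ellis"], 2: ["ellis", "smith"], 3: ["jones"], 4: ["ellis"],
}
FULLTEXT: Dict[int, str] = {
    1: "supersymmetry breaking at the tev scale",
    2: "lattice qcd results",
    3: "supersymmetry and dark matter",
    4: "minimal supersymmetry models",
}


def metadata_search(author: str) -> Set[int]:
    """Stand-in for the metadata engine: IDs of records by `author`."""
    return {rid for rid, authors in METADATA_AUTHORS.items() if author in authors}


def fulltext_search(term: str) -> Set[int]:
    """Stand-in for the full-text engine: IDs of records whose text contains `term`."""
    return {rid for rid, text in FULLTEXT.items() if term in text.split()}


def combined_and(term: str, author: str) -> List[int]:
    """Evaluate `term AND author:<author>` by intersecting the two ID sets."""
    hits = fulltext_search(term) & metadata_search(author)
    return sorted(hits)  # sorted IDs, ready to be pushed to the other side


print(combined_and("supersymmetry", "ellis"))  # [1, 4]
```

Once both sides speak in record IDs, the AND is a set intersection; the "push IDs?" question is about where that intersection (and any faceting over the result) should happen, since one side has to ship its ID list to the other.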
Jython?
- Implementation of Python in 100% Java
- Both Java and Python code
- Truly multithreaded
JCC
- Embeds a JVM in Python
- C++ code generator
- C++ object interface wraps a Java library
- C++ wrappers conform to Python's C type system
- Result: a complete Python extension module
JCC
JEPP ?
...
Invenio Solr
JCC
The devil is in the details...
- We write empty classes in Java ...
- ... and implement them in Python
JCC
MontySolr extension
- JCC has great potential, but it also adds complexity...
- So the MontySolr project was born
- Modules must be built in shared mode
- JCC dynamic library loaded and started from the main thread
- Simple mechanism: the Python bridge and the message
- Configurable handlers on the Python side
- Secured dereferencing of native objects
- Threading on the Java side
- Multiprocessing on the Python side
- Easy Ant targets (compilation) ...
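To illustrate the "configurable handlers on the Python side" bullet, here is a minimal sketch of a name-based dispatch table, assuming each incoming message names the handler it wants. `Message`, `HandlerRegistry` and the `receiver` field are hypothetical, not MontySolr's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Handler = Callable[["Message"], None]


@dataclass
class Message:
    """Hypothetical message: a receiver name, input params, and a result slot."""
    receiver: str
    params: Dict[str, Any] = field(default_factory=dict)
    results: List[int] = field(default_factory=list)


class HandlerRegistry:
    """Maps receiver names to Python callables that fill in a Message in place."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, name: str) -> Callable[[Handler], Handler]:
        def decorator(func: Handler) -> Handler:
            self._handlers[name] = func
            return func
        return decorator

    def dispatch(self, message: Message) -> Message:
        try:
            handler = self._handlers[message.receiver]
        except KeyError:
            raise LookupError(f"no handler registered for {message.receiver!r}") from None
        handler(message)
        return message


registry = HandlerRegistry()


@registry.register("perform_search")
def perform_search(message: Message) -> None:
    # Toy search: return the IDs passed in as parameters, sorted.
    message.results = sorted(message.params.get("ids", []))


msg = registry.dispatch(Message("perform_search", {"ids": [3, 1, 2]}))
print(msg.results)  # [1, 2, 3]
```

In this sketch, adding a handler is just registering a function; the calling side only needs to know the receiver name.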
- Python side
  - Python interpreter (32/64-bit)
  - 4 Python modules (jcc, solr, lucene, montysolr)
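A small pre-flight check for the Python side, using only the standard library: it reports the interpreter's pointer width and which of the four modules named above can be found, without importing them. The helper names are hypothetical. Bitness matters because JCC loads the JVM into the same process, so the JVM's native libraries must match the interpreter (a 64-bit JVM cannot be loaded into a 32-bit Python).

```python
import importlib.util
import struct
from typing import Dict, Tuple

# Module names as listed on the slide; whether they are installed depends on the build.
REQUIRED_MODULES: Tuple[str, ...] = ("jcc", "solr", "lucene", "montysolr")


def interpreter_bits() -> int:
    """Pointer size of the running interpreter in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def check_python_side(modules: Tuple[str, ...] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Return {module_name: importable?} without actually importing the modules."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print(f"Python interpreter: {interpreter_bits()}-bit")
    for name, ok in check_python_side().items():
        print(f"{name:10s} {'found' if ok else 'MISSING'}")
```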
JCC
Example
refersto:author:ellis Solr
MyCustom Handler
MyCustom Handler
Python Bridge
# search time - called from Java
def perform_search(message):
    query = message.getParam('query')
    hits = call_real_search(query)
    # cast Python list into Java array
    message.setResults(JArray_ints(hits))
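At run time `message` is a JCC-wrapped Java object, which makes the handler above awkward to unit-test without a JVM. A minimal sketch of testing the same flow in plain Python, assuming the handler only touches `getParam` and `setResults`. `FakeMessage`, the toy `call_real_search` and the `to_java_array` parameter are hypothetical test scaffolding; in the real handler the conversion is the `JArray_ints` call shown on the slide.

```python
from typing import Any, Callable, Dict, List, Optional


class FakeMessage:
    """Hypothetical stand-in for the Java message object (no JVM needed)."""

    def __init__(self, params: Dict[str, Any]) -> None:
        self._params = params
        self.results: Optional[Any] = None

    def getParam(self, name: str) -> Any:
        return self._params.get(name)

    def setResults(self, results: Any) -> None:
        self.results = results


def call_real_search(query: str) -> List[int]:
    """Toy search: record IDs whose (fake) title contains the query word."""
    titles = {10: "supersymmetry review", 11: "neutrino masses", 12: "broken supersymmetry"}
    return [rid for rid, title in titles.items() if query in title.split()]


def perform_search(message: Any, to_java_array: Callable[[List[int]], Any] = list) -> None:
    """Same flow as the slide; the list-to-array conversion is injectable for tests."""
    query = message.getParam("query")
    hits = call_real_search(query)
    message.setResults(to_java_array(hits))


msg = FakeMessage({"query": "supersymmetry"})
perform_search(msg)
print(msg.results)  # [10, 12]
```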
Example
refersto:author:ellis Solr Invenio Invenio Invenio Invenio
MyCustom Handler
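`refersto:author:ellis` is a citation query: first find the records by Ellis, then return the records that cite any of them. A minimal sketch of that two-step evaluation over an in-memory citation map, assuming the map is keyed by cited record ID. `AUTHORS`, `CITED_BY`, `author_search` and `refersto` are hypothetical names with toy data.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy data: record ID -> authors, and cited ID -> IDs of citing records.
AUTHORS: Dict[int, Set[str]] = {1: {"ellis"}, 2: {"smith"}, 3: {"ellis", "jones"}, 4: {"jones"}}
CITED_BY: Dict[int, Set[int]] = {1: {2, 4}, 2: {4}, 3: {4, 5}, 4: set()}


def author_search(name: str) -> Set[int]:
    """Inner query: records by `name` (stand-in for a real author search)."""
    return {rid for rid, names in AUTHORS.items() if name in names}


def refersto(inner_hits: Iterable[int]) -> List[int]:
    """Records citing at least one record in `inner_hits` (union over the citation map)."""
    citing: Set[int] = set()
    for rid in inner_hits:
        citing |= CITED_BY.get(rid, set())
    return sorted(citing)


print(refersto(author_search("ellis")))  # [2, 4, 5]
```

Each inner hit is looked up independently, so the union can be partitioned across several worker processes, presumably the role of the multiple Invenio instances on this slide.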
Invenio
JCC
Robust?
- Extensive load tests with siege show very good performance and stability under high load
  - 100-200 users, complex searches
  - 50 concurrent users, citation analysis
- JCC incurs only a small overhead
Easy to develop/maintain?
- Added complexity
  - Java in the toolbox
  - Need to compile C++ extensions
  - Python/OS version dependencies
Wrap-up
- Our challenge was to connect two different languages/systems
- And we wanted to get the best of both...
- So we had to plug Python into Solr
- And now our Solr knows citation analysis!
Questions?
- MontySolr
- https://github.com/romanchyla/montysolr
- Roman Chyla
Fellow, CERN Scientific Information Service
roman.chyla@cern.ch
@rchyla
https://svnweb.cern.ch/trac/rcarepo
Additional information
Links
- Invenio platform
- http://invenio-software.org/