
GOVERNMENT USERS Conference

Navigating the Human Terrain
College Park, MD, May 20-21, 2008

Building Applications with Rosette Name Indexer & Rosette Name Translator

Benson Margulies, CTO, Basis Technology

Introducing RNI and RNT


Rosette Name Indexer: Resolving Names
  Stores names 'on disk'
  Queries on data or meta-data

Rosette Name Translator: Translating Names
  Translation vs. Transliteration
    Cologne, Köln
    George Bush, Jeorge Buzh
  There isn't always only one right answer

Coding for RNI and RNT


Common concepts and data structures
RNI Application Programming
RNT Application Programming

A Note On Programming Languages


RNI and RNT are aimed at Java applications
RNT has a subset API in C++
Both have web services
This talk looks at Java

What's in a Name?
Class: com.basistech.rnm.Name

Properties listed here.

Data: the name itself
Language
Script
Entity Type
Unique ID
Entity ID
Arbitrary String
Transliterations

The minimum ...


What do you need to store, translate, or resolve names?
Data: the text of the name (e.g. Albert Schweitzer)
Language: ISO 639 code from the com.basistech.util.LanguageCode enum
Script: ISO 15924 code from the com.basistech.util.ISO15924 utility class

This is all you must have.
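
As a rough illustration of the three mandatory fields, here is a minimal sketch. The class MinimalNameSketch is a hypothetical stand-in written for this example; it is not com.basistech.rnm.Name, and plain strings stand in for the LanguageCode and ISO15924 types. The language code "de" is only an illustrative choice.

```java
import java.util.Objects;

// Hypothetical stand-in for illustration only; NOT com.basistech.rnm.Name.
final class MinimalNameSketch {
    final String data;      // the text of the name
    final String language;  // ISO 639 language code, e.g. "de"
    final String script;    // ISO 15924 script code, e.g. "Latn"

    MinimalNameSketch(String data, String language, String script) {
        // All three fields are required, so reject nulls up front.
        this.data = Objects.requireNonNull(data, "data");
        this.language = Objects.requireNonNull(language, "language");
        this.script = Objects.requireNonNull(script, "script");
    }

    @Override
    public String toString() {
        return data + " [" + language + "/" + script + "]";
    }

    public static void main(String[] args) {
        MinimalNameSketch name =
            new MinimalNameSketch("Albert Schweitzer", "de", "Latn");
        System.out.println(name); // prints: Albert Schweitzer [de/Latn]
    }
}
```

The real Name class carries the optional properties as well; consult the product documentation for its actual constructors and setters.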

The optional fields


Connections to other things
Unique ID: for your own cross-references
Entity ID: if you want to group multiple names of the same entity

Conventional translation: 'transliterations'
Entity Type (person, place, etc.)
Plus, whatever you want extra
  Consider 'serializable'

Text Domains
A Text Domain describes a name
Three fields:
  Language
  Script
  Scheme ...

Example:
  ar/Arab/Native
  ar/Latn/Folk

For translation, 'pair' specifies input domain and output domain.
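
To make the language/script/scheme notation concrete, the following sketch parses the slash-separated form shown in the example and combines two domains into a pair. TextDomainSketch and its nested Pair class are hypothetical names invented for this illustration; the Rosette API has its own types for these concepts, which may look quite different.

```java
// Hypothetical illustration only; these are NOT Rosette classes.
final class TextDomainSketch {
    final String language; // ISO 639 language code, e.g. "ar"
    final String script;   // ISO 15924 script code, e.g. "Arab"
    final String scheme;   // e.g. "Native" or "Folk"

    TextDomainSketch(String language, String script, String scheme) {
        this.language = language;
        this.script = script;
        this.scheme = scheme;
    }

    // Parses the slash-separated notation, e.g. "ar/Arab/Native".
    static TextDomainSketch parse(String text) {
        String[] parts = text.split("/");
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected language/script/scheme: " + text);
        }
        return new TextDomainSketch(parts[0], parts[1], parts[2]);
    }

    @Override
    public String toString() {
        return language + "/" + script + "/" + scheme;
    }

    // A translation 'pair': an input domain plus an output domain.
    static final class Pair {
        final TextDomainSketch input;
        final TextDomainSketch output;

        Pair(TextDomainSketch input, TextDomainSketch output) {
            this.input = input;
            this.output = output;
        }

        @Override
        public String toString() {
            return input + " -> " + output;
        }
    }

    public static void main(String[] args) {
        Pair pair = new Pair(parse("ar/Arab/Native"), parse("ar/Latn/Folk"));
        System.out.println(pair); // prints: ar/Arab/Native -> ar/Latn/Folk
    }
}
```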



What's a Scheme?
Schemes identify standards for translating or transliterating names: e.g. IC, BGN
Schemes name other representations of names:
  FOLK: an informal transliteration
  NATIVE: the original orthography

Using the Rosette Name Index


Creating a New Index
com.basistech.rnm.index.StandardNameIndex.create
Give it a pathname and options ...
It gives you back an INameIndex
Support for in-memory indices.

Opening an Existing Index


.open instead of create

Note that an index is, physically, a directory.


Storing Names
Create a Name object
Add it to the INameIndex
Batching and concurrency
  By default, additions are only seen by the 'adder'


Querying for Names


Filling up a query object
com.basistech.rnm.index.NameIndexQuery
Fields for the data and various metadata
Flags to enable the fields
Very simple query model ... if you need a SQL database, you should use one.

Retrieving results
INameIndex.lookup
Then obtain the iterator
Data versus metadata queries
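
The "fields plus flags" query model can be pictured with a small in-memory sketch. Everything here (QuerySketch, Entry, Field, lookup, sampleStore) is hypothetical and written only for this illustration; it is not NameIndexQuery or INameIndex, and it handles only exact-match metadata filtering. It makes no attempt at the name matching that a data query in RNI performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumSet;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; NOT com.basistech.rnm.index.NameIndexQuery.
final class QuerySketch {
    enum Field { LANGUAGE, ENTITY_TYPE }

    // A stored name with two metadata fields.
    static final class Entry {
        final String data, language, entityType;
        Entry(String data, String language, String entityType) {
            this.data = data;
            this.language = language;
            this.entityType = entityType;
        }
    }

    // Flags recording which query fields are enabled.
    private final EnumSet<Field> enabled = EnumSet.noneOf(Field.class);
    private String language;
    private String entityType;

    // Setting a field also raises the flag that enables it.
    QuerySketch language(String value) {
        language = value;
        enabled.add(Field.LANGUAGE);
        return this;
    }

    QuerySketch entityType(String value) {
        entityType = value;
        enabled.add(Field.ENTITY_TYPE);
        return this;
    }

    // Only enabled fields take part in matching.
    boolean matches(Entry e) {
        if (enabled.contains(Field.LANGUAGE) && !language.equals(e.language)) return false;
        if (enabled.contains(Field.ENTITY_TYPE) && !entityType.equals(e.entityType)) return false;
        return true;
    }

    // Returns an iterator over the matching entries.
    static Iterator<Entry> lookup(List<Entry> store, QuerySketch query) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : store) {
            if (query.matches(e)) hits.add(e);
        }
        return hits.iterator();
    }

    static List<Entry> sampleStore() {
        return Arrays.asList(
            new Entry("Albert Schweitzer", "de", "PERSON"),
            new Entry("Cologne", "en", "PLACE"),
            new Entry("George Bush", "en", "PERSON"));
    }

    public static void main(String[] args) {
        QuerySketch query = new QuerySketch().language("en").entityType("PERSON");
        Iterator<Entry> hits = lookup(sampleStore(), query);
        while (hits.hasNext()) {
            System.out.println(hits.next().data); // prints: George Bush
        }
    }
}
```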

Using the Rosette Name Translator


A Translator object Translates:
com.basistech.rnt.ITranslator

'Basic' Translators implement one domain pair


e.g. ar/Arab/Native -> ar/Latn/IC
com.basistech.rnt.BasicTranslator

Basic Translators come from a Factory


com.basistech.rnt.BasicTranslatorFactory


RNT Rule-based Translator


com.basistech.rnt.RuleSetTranslator
Chooses a translator based on input domain and the entity type.
Example: People with IC, places with BGN.
Spring is convenient for configuration.
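
The selection idea (input domain plus entity type picks the translator) can be sketched as a lookup table. RuleSetSketch below is a hypothetical illustration, not RuleSetTranslator, and the two "translators" are stubs that merely tag their input with a scheme label; they do not implement the IC or BGN standards.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; NOT com.basistech.rnt.RuleSetTranslator.
final class RuleSetSketch {
    // Maps "inputDomain|entityType" to the translator chosen for it.
    private final Map<String, UnaryOperator<String>> rules = new HashMap<>();

    private static String key(String inputDomain, String entityType) {
        return inputDomain + "|" + entityType;
    }

    void addRule(String inputDomain, String entityType,
                 UnaryOperator<String> translator) {
        rules.put(key(inputDomain, entityType), translator);
    }

    // Dispatches to the translator registered for this domain and type.
    String translate(String inputDomain, String entityType, String name) {
        UnaryOperator<String> translator = rules.get(key(inputDomain, entityType));
        if (translator == null) {
            throw new IllegalArgumentException(
                "no rule for " + inputDomain + " / " + entityType);
        }
        return translator.apply(name);
    }

    // People go through the "IC" stub, places through the "BGN" stub.
    static RuleSetSketch example() {
        RuleSetSketch ruleSet = new RuleSetSketch();
        ruleSet.addRule("ar/Arab/Native", "PERSON", s -> "IC:" + s);
        ruleSet.addRule("ar/Arab/Native", "PLACE", s -> "BGN:" + s);
        return ruleSet;
    }

    public static void main(String[] args) {
        RuleSetSketch ruleSet = example();
        // prints: IC:name
        System.out.println(ruleSet.translate("ar/Arab/Native", "PERSON", "name"));
        // prints: BGN:name
        System.out.println(ruleSet.translate("ar/Arab/Native", "PLACE", "name"));
    }
}
```

In a real application the rule table would be built from configuration (the slide suggests Spring) rather than hard-coded.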


Translators take Options


Different translators accept different options
Options are objects from:
com.basistech.rnt.options

Example: option controls whether to deliver enhanced version of original input, e.g. adding Harakat to Arabic.


Translation Process
'translate' takes an ITranslatable.
com.basistech.rnm.Name implements it.

'translate' returns a List<com.basistech.rnt.TranslationResult>
Each result has string, confidence, and additional information
  e.g. improved spelling of input
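
Because 'translate' returns a list rather than a single answer, callers typically need to choose among candidates. The sketch below picks the highest-confidence one. ResultSketch is a hypothetical stand-in for TranslationResult; the field names, the assumption that confidence is a number where higher is better, and the sample values are all invented for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in; NOT com.basistech.rnt.TranslationResult.
final class ResultSketch {
    final String text;
    final double confidence; // assumed: higher means more confident

    ResultSketch(String text, double confidence) {
        this.text = text;
        this.confidence = confidence;
    }

    // Returns the candidate with the highest confidence, if any.
    static Optional<ResultSketch> best(List<ResultSketch> results) {
        return results.stream()
            .max(Comparator.comparingDouble((ResultSketch r) -> r.confidence));
    }

    public static void main(String[] args) {
        // Sample candidates with made-up confidence values.
        List<ResultSketch> results = Arrays.asList(
            new ResultSketch("Jeorge Buzh", 0.40),
            new ResultSketch("George Bush", 0.90));
        best(results).ifPresent(r -> System.out.println(r.text)); // prints: George Bush
    }
}
```

Returning an Optional keeps the empty-list case explicit instead of returning null.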


Conclusion
Bad news: you will still have to read the documentation and look at the examples.
Good news: you should have an overall picture of the main classes and interfaces that you will use to integrate RNI and RNT into applications.
And don't forget:
productsupport@basistech.com
