
Arabic Intelligent Retrieval System for Legal Database (AIRS-LD)

El Achkar Mona Moacja@ul.edu.lb

NABHAN Philipe pnabhan@ul.edu.lb

Legal Informatics Center, Lebanese University, Tabara Bldg., Sami Solh St., P.O.B. ""#$%&'#, Beirut, Lebanon
Abstract
Although high-precision information retrieval is an extremely hard problem, we believe we have achieved a breakthrough. Two elements helped most: on the one hand, we fully take into account the particularities of legal language; on the other hand, we rely on human analysis of each document, based on predefined linguistic tools and language patterns. As a result, we normalized and homogenized the vocabulary, made the degree of analysis more precise, and enhanced human performance. The object of this paper is AIRS-LD (Arabic information retrieval system for legal jurisprudential database), which we are developing at the Legal Informatics Center of the Lebanese University. The system is intended to meet the qualitative requirements of legal information retrieval applications: it should be very user-friendly, reliable, and sufficiently intelligent to meet the aims of its users. Our database gathers laws, decrees, doctrine and jurisprudential documents. The system therefore has to deal with the main problem of legal text: like all other texts written in natural language, it is not designed to be used by computers. Our work relies heavily on ontology, defined as an "explicit specification of a conceptualization: the objects, concepts, and other entities that are assumed to exist in some area of interest and the relationships that hold among them".

1 - Introduction
Information retrieval (IR) is difficult because it must overcome the ambiguities inherent in language. Indexing by specific keywords, as we used to do and still do on some databases, has proved to produce poor results because individual terms do not always line up with concepts. Keyword searching suffers from two main problems: synonymy and homonymy on the one hand, and the poor representation of concepts and discourse on the other. We therefore had to build tools that help filter out extraneous information and return only relevant results. Such a level of precision is very hard to reach unless the system can interpret the meanings of the words used in queries, and unless it allows the user to combine many words to specify the information sought.

Patterns are a net of relations among concepts. They allow in-depth text analysis during indexing, as well as sophisticated query enhancement and result preparation, to supply high-quality information retrieval.
- A lexicon of terms.
- A lexicon of pre-defined concepts, extracted from the main concepts present in the original text (which is kept in the database as an image linked to the abstract) and represented in the abstract.
- A thesaurus of concepts.
- An information extraction system.
- An information retrieval system.

4 - Processing legal language


We are dealing with a special language: the legal one. According to J. Wroblewski, legal language comes from natural language and adds to it specialized words and specific meanings corresponding to the legal nature of the discourse. The difference between natural language and legal language is semantic, not syntactic: it depends on the words as well as on their specific meanings. The system should therefore take into consideration not only the textual aspects (morphology, syntax) but also the specific discursivity of the material and the expertise of the users, be they judges, lawyers, legislators, administrators or any other searchers. The symbolic meaning of words and the complexity of the language due to its style should also be taken into consideration. Making computers understand the natural language in which legal documents and queries are written requires very in-depth linguistic processing that takes this reality into account. We therefore developed many linguistic tools to help represent the meanings of words as well as of concepts, so that we can achieve word sense disambiguation and expansion, extract implicit information and describe text discourse. Most of the natural language processing steps were performed within the framework of the PALL system. PALL allowed us to disambiguate words based on their relation to other words and on their semantic relations. It also helps in building the special lexicon and thesaurus:
1- To disambiguate meanings, so that a word like (BIAA) = (environment) is distinguished in its meanings in (BIAA MOUNASIBA & BIAA KANOUNIA) and in (TALAOUTH AL BIAA).
2- To achieve morphological analysis, so that the roots of words, rather than the words themselves, are presented as tokens. For example, (NAKEL & NAKILOUN) = (carrier, carriers) are presented as the token (NAKEL) = (carrier).
3- To use thesaurus expansion, so that words like (OUSSOUL MOUHAKAMAT) = (civil procedure) expand to include (TOUROK MOURAJAAT) = (means of appeal).
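The morphological and thesaurus steps above can be illustrated with a minimal Python sketch. It is not the PALL implementation: the lookup tables `ROOT_TOKENS` and `THESAURUS` and the functions `normalize` and `expand` are hypothetical names, and their entries simply reuse the transliterated examples from the list.

```python
from typing import Dict, List

# Hypothetical lookup tables; the entries reuse the transliterated examples
# given in the text and are not taken from the actual lexicon.
ROOT_TOKENS: Dict[str, str] = {
    "NAKEL": "NAKEL",
    "NAKILOUN": "NAKEL",  # plural form mapped to the same token
}

THESAURUS: Dict[str, List[str]] = {
    "OUSSOUL MOUHAKAMAT": ["TOUROK MOURAJAAT"],
}


def normalize(word: str) -> str:
    """Return the token standing for a word; unknown words are kept unchanged."""
    return ROOT_TOKENS.get(word, word)


def expand(term: str) -> List[str]:
    """Return the term followed by its thesaurus expansions, if any."""
    return [term] + THESAURUS.get(term, [])
```

With these tables, both the singular and the plural form are indexed under one token, and a query on the first legal term also matches documents indexed under the second. A real system would derive tokens by morphological analysis rather than by table lookup.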

2 - Aims of the system


Considering the following facts:
- The reluctance of users to use codified language.
- The limitation of Boolean search in inverted files with distance operators, which deals with the syntactic representation of terms without regard to their semantic meaning.
- The ambiguities generated by searching on syntactic representation.
- The imprecision of search results due to single-word indexing.
- The high degree of structure of legal text.
we decided to create an integrated legal information system. This term stands for a system that can make available to those working in the field of law, whether legislators or law professionals, not only the legislation but also the other sources of law they require for carrying on their work: case law and, in special cases, legal authorities as well. In this context the system's aim is twofold: first, to build efficient communication oriented towards jurists, and second, to help enhance information retrieval by boosting both recall and precision at the same time. The goal is thus to improve both the interaction with our databases and the relevance of the retrieved documents, through a familiar user interface. The following are the characteristic retrieval functions of the proposed system:
1- An Arabic user-friendly interface.
2- Language-oriented indexing and query processing techniques.
3- Enhanced phrase processing based on linguistic processing.
4- Acceptance of semi-natural language sentences as queries.

3 - Structure of the system


The basis of the system is a set of synonyms, which contains all the terms that can represent a concept, together with an analysis of the position of words in relation to other textual elements. The system is composed of several elements:
- A corpus of analysis patterns built upon concepts and the relations among them. These concepts are built, at the first level, through semantic relations between terms.

5 - Structuring the input document


The precision of the information retrieval system is achieved through the architecture and the structure of the input document we build. Our system focuses on the structure of the input document, which is built upon the analyzed document and represents the search levels: the phrase, the paragraph and the whole abstract.

The architecture is defined through the representation of the solutions given to problems and cases. The structure is built in the light of the word disambiguation principles, the adequate environment of the sentences that represent the discourse, and their semantic relations and role in the document. The architecture helps create nets out of legal and factual concepts by linking them in a way that indicates how and why the case was resolved in a particular sense, which covers the aspect of legal reasoning. The architecture focuses on the line of reasoning of which the document constitutes the conclusion. The meaning of concepts is derived from their rank and their closeness to each other. By taking into consideration the structure of the original legal text, the characteristics of legal language and the legal reasoning, the structure of the input document conveys the content of the original one fully and efficiently. At the first level, the material is analyzed and classified into concepts or themes for ease of retrieval. Themes and concepts are represented by "phrases". Phrases are the basic construction units. They describe the explicit and implicit meaning of the original document. Their relations and order in the abstract describe not only the argumentative reasoning of the judge but also the implicit meaning that creates the logical relation between the phrases themselves and the environment of the reasoning, which makes up the particularity of each concept, discourse and document.
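The three search levels of the input document (phrase, paragraph, abstract) can be pictured with a small Python sketch. This is only an illustration under our own assumptions: the class names `Phrase`, `Paragraph` and `Abstract`, their fields, and the method `search_units` are hypothetical and do not describe the actual storage format of the system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Phrase:
    concept: str  # concept or theme the phrase stands for (assumed field)
    text: str     # wording of the phrase in the abstract (assumed field)


@dataclass
class Paragraph:
    phrases: List[Phrase] = field(default_factory=list)


@dataclass
class Abstract:
    paragraphs: List[Paragraph] = field(default_factory=list)

    def search_units(self, level: str) -> List[List[Phrase]]:
        """Group the phrases by search level: 'phrase', 'paragraph' or 'abstract'."""
        if level == "phrase":
            return [[p] for para in self.paragraphs for p in para.phrases]
        if level == "paragraph":
            return [list(para.phrases) for para in self.paragraphs]
        if level == "abstract":
            return [[p for para in self.paragraphs for p in para.phrases]]
        raise ValueError(f"unknown search level: {level}")
```

Keeping the phrases in order inside each paragraph preserves the line of reasoning, so the same structure can serve a narrow search (one phrase) or a wide one (the whole abstract).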

To implement these assumptions, we defined the sentence as a search level, for more effective matching between query terms and words in the documents. We also used the thesaurus to expand query terms to their hyponyms and synonyms as well as to concepts.

6.2. Searching at the paragraph level


We designed the location model as an alternative to the semantic net of relations, and based it on the following assumptions:
a. The presence of any two concepts from the query (represented by two different phrases) in the same paragraph enhances the relevance of the document.
b. Discourse information is very helpful in determining the relevance of a document.
c. The more concepts are located in the same paragraph, the more relevant the document is.
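Assumptions (a) and (c) of the location model can be sketched as a simple scoring function. The sketch below is our own illustration, not the ranking formula of the system: the function name `paragraph_score` and the choice of "largest number of query concepts sharing one paragraph" as the score are assumptions, and assumption (b) on discourse information is not modeled.

```python
from typing import Iterable, List, Set


def paragraph_score(paragraphs: List[Set[str]], query_concepts: Iterable[str]) -> int:
    """Score a document by co-location of query concepts.

    Each paragraph is given as the set of concepts it contains. The score is
    the largest number of distinct query concepts found together in a single
    paragraph; fewer than two concepts together counts as no co-location.
    """
    wanted = set(query_concepts)
    best = max((len(wanted & para) for para in paragraphs), default=0)
    return best if best >= 2 else 0
```

Under this sketch, a document whose paragraph holds three query concepts outranks one where only two concepts share a paragraph, and a document whose query concepts are scattered over different paragraphs gets no credit at this level.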

6.3. Searching at the abstract level


Although the abstract (the structured input document) is well structured and well defined, being composed of paragraphs that are in turn composed of concepts, searching by single words at this level is far from accurate or effective.

7. Displaying the results


The documents selected from the ranked list of results can be displayed in various units: the hit passage, the paragraph (including the hit passage) or the abstract (including the hit passage). The salient sentences or words are highlighted so that the user can skim the text quickly to confirm the context around the salient part. In this way, the user can verify the context of the hit passages within the reported abstract in order to examine the validity and reliability of the study.
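Highlighting of salient words can be sketched in a few lines of Python. The function name `highlight` and the `**` marker are assumptions made for the illustration; the actual interface may mark hits differently (for example with color).

```python
import re
from typing import Iterable


def highlight(text: str, terms: Iterable[str], mark: str = "**") -> str:
    """Wrap every occurrence of a query term in the text with a marker."""
    # Longest terms first, so a multi-word term wins over one of its parts.
    ordered = sorted({t for t in terms if t}, key=lambda t: (-len(t), t))
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return pattern.sub(lambda m: f"{mark}{m.group(0)}{mark}", text)
```

The same function can be applied to a hit passage, a paragraph or a whole abstract, which matches the three display units described above.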

6 - Search features


AIRS-LD uses full-text search techniques. It offers a number of advanced features that eliminate some of the syntactic problems of Arabic words. It incorporates several text analysis functions in addition to the ability to index and search using Arabic legal language, to use a supporting thesaurus, and to process queries that combine natural language and Boolean parts, as well as phonetic searches. The system uses the text structure, which makes it more effective than systems that do not differentiate the role or function each concept plays in the text. The units of the text structure are sentences, paragraphs and the abstract; they are the search units that can be selected from a list box.

8. Conclusion
AIRS-LD meets our need for a solution that improves access to pertinent information and provides high-precision information retrieval. This could not have been achieved without processing natural legal language as we did. The next step, we hope, will be a machine learning program that uses the concept patterns in the manually indexed documents to teach the software which attributes make up the desired content.

6.1. Searching at the sentence level


The representation of legal knowledge shows that each document contains many concepts, and that each concept is represented by a cluster of terms. It also shows that some words are constituent parts of larger terms or may have a role relationship with each other. Single words are therefore a source of ambiguity, which is why we designed the sentence as an environment for the word, on the following assumptions:
a. The phrase is the most adequate frame to disambiguate a word and to give it the meaning intended in the search.
b. Word pairs that appear in the same sentence in the same order as in the query (forward pairs) are more likely to provide high precision in retrieving information than those that appear in the inverse order (backward pairs).
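Assumption (b) can be made concrete with a short Python sketch that counts forward and backward pairs in a sentence. The function names `count_pairs` and `sentence_score`, the use of the first occurrence of each word, and the weights 2.0 and 1.0 are all assumptions chosen for the illustration; the text only states that forward pairs should count for more than backward pairs.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def count_pairs(query: List[str], sentence: List[str]) -> Tuple[int, int]:
    """Return (forward, backward) counts of query word pairs found in a sentence."""
    first_pos: Dict[str, int] = {}
    for i, word in enumerate(sentence):
        first_pos.setdefault(word, i)  # simplification: first occurrence only
    forward = backward = 0
    for a, b in combinations(query, 2):  # a precedes b in the query
        if a != b and a in first_pos and b in first_pos:
            if first_pos[a] < first_pos[b]:
                forward += 1
            else:
                backward += 1
    return forward, backward


def sentence_score(query: List[str], sentence: List[str],
                   w_forward: float = 2.0, w_backward: float = 1.0) -> float:
    """Weight forward pairs above backward pairs (weights are hypothetical)."""
    forward, backward = count_pairs(query, sentence)
    return w_forward * forward + w_backward * backward
```

A sentence that keeps the query order therefore scores higher than one that contains the same two words in reverse order, and a sentence containing only one of the words scores zero.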

9. References
Aikenhead: A Discourse on Law & Artificial Intelligence, JILT 5,1.
Belair Claude. (2001). Vers l'interactivité dans la recherche documentaire juridique informatisée, Colloque Le défi du traitement de l'information juridique au 21ème siècle, Beirut.
Bories Serge. (2001). L'ordinateur, le juge et le jurisconsulte, Colloque Le défi du traitement de l'information juridique au 21ème siècle, Beirut.
El Bustani Emad. (1994). Arabic Information Systems: Problems & Solutions, Proceedings of the First Al-Shaam International Conference on Information Technology, Damascus.
El Achkar M., Rammal M. (2001). Patterning Arabic Legal Language: Analysis' structures, Workshop on "Software for the Arabic Language", Lebanese American University, Beirut.
El Achkar M., Rammal M., Nabhan P. (2001). Access to Legal Information, Colloque Le défi du traitement de l'information juridique au 21ème siècle, Beirut.
Gruber, Tom. (1999). "What is an ontology". In Knowledge on the web seminar, Stanford University.
Gold D., Susskind R. Expert systems in law: A Jurisprudential Approach to Artificial Intelligence and Legal Reasoning, The Modern Law Review, vol. 49.
J.P. Dick. (1991). Representation of Legal Text for Conceptual Retrieval, Proceedings of the Third International Conference on Artificial Intelligence and Law, ACM Press.
Kando N. (1997). "Text level structure of research articles and its implication for text based information processing systems". In Proceedings of the 19th Annual BCS-IRSG colloquium on IR research, Aberdeen, Scotland, UK. p. 68-81.
Kevin Curran, Lee Higgins. (2000). A Legal Retrieval Information System, The Journal of Information, Law and Technology.
Matthijssen. (1995). An Intelligent Interface for Legal Databases, Proceedings of 5th International Conference on Artificial Intelligence and Law, Kluwer.
P. Wahlgren. (1992). A General Theory of Artificial Intelligence and Law, Proceedings of JURIX '94, Koninklijke Vermande, Lelystad, NL.
Rammal M., El Achkar M. (2001). Computer research assisted Lebanese Document, Workshop on "Software for the Arabic Language", Lebanese American University, Beirut.
Schweighofer E. (1999). The Revolution in Legal Information Retrieval or: The Empire Strikes Back. In The Journal of Information, Law and Technology (JILT).
T.J.M. Bench-Capon. (1995). Argument in Artificial Intelligence and Law, Proceedings of JURIX '95, Koninklijke Vermande, Lelystad, NL.
Wroblewski, Jerzy. (1988). "Les langages juridiques: une typologie." Droit et Société. No: 8.
