AI System to Assist Legal Processes Using Natural Language Processing


AI SYSTEM TO ASSIST LEGAL PROCESSES

USING NATURAL LANGUAGE PROCESSING


CHAPTER 1
INTRODUCTION

1.1 ARTIFICIAL INTELLIGENCE

Artificial intelligence is the simulation of human intelligence by machines.


Computers can be made to think like humans and imitate their actions. In Artificial
Intelligence, the machine exhibits the characteristics of humans such as learning,
understanding and problem solving. Artificial intelligence breaks down human
intelligence to a form that can be understood by machines. This is then used to make
the machines perform tasks. These tasks may be simple or very complex tasks that
require hours of work. AI has countless applications from self-driving cars to
performing complex surgeries.
Natural language processing (NLP) is a subfield of computer science,
information engineering, and artificial intelligence concerned with the interactions
between computers and human (natural) languages. Computers are made to process
huge volumes of natural language data and perform tasks and derive conclusions
from it.

1.2 SYSTEM OVERVIEW

The objective is to create a legal assistance system that helps anyone who wishes to
obtain quick legal information, from a lawyer to a common person. Right now, lawyers
have to go through all the laws in our constitution to find the ones that are relevant
to the case. This process can be made much faster by picking out the laws that seem
relevant to the topic at hand and displaying them to the user, which helps them come
to a decision much quicker. This sort of system can also help people who are not
well versed in law. They can enter their problems and check whether their case is worth
pursuing before deciding to hire a legal consultant.
In addition to this, the system can also help laymen to find who to approach in case
they require legal assistance. In case some of the legal documents (such as contracts)
are not clear, a summarization tool can be provided so that the key points of the given
contract are summarized and presented to the user.

1.3 SCOPE OF THE PROJECT


For this system, we have taken into account some of the key problems that both
human experts and ordinary people face when it comes to finding solutions and
seeking redressal in the legal system. These problems include the effort taken
by lawyers to find the relevant laws, understanding the true nature of legal contracts,
and not knowing how to approach the courts for redressal.
The goal of the system is to make the legal system more understandable and
approachable. It also helps the human experts by speeding up the various processes
involved and connecting the right lawyers with the right clients in order to ensure that
people are able to voice out their troubles and seek justice.

1.3.1 Finding relevant laws


Currently lawyers listen to the problems of their clients and then look
through large volumes of legal documents in order to find a solution to the client’s
problem. They have to look through the documents until they can find laws relevant
to the case and then decide which laws to use. This process is tiring and consumes a
lot of time. There is also a chance that some laws that are not regularly used can be
skipped in the process of looking for relevant data. Lawyers also have to find cases
which are similar to the current case at hand and then verify what laws were used to
argue the case.
This process of looking through the laws wastes time that the lawyer can use
preparing their case. The time of the human experts can be directed to much more
useful tasks than manually looking through large volumes of text. Thus, this AI
system comes into play.
The proposed system will take a problem statement from the user. After
analysing the problem, it will search through the constitution for all the laws that
can be applied to it and display them to the user. From this list, the lawyers can
pick out the laws that they think will be most useful. The final decision is still
left in the hands of the human experts, who can frame their case however they want.
The goal of the system is to reduce the effort and the time taken to frame the case.
The input to the system will be in the form of natural language (i.e., the issue
as stated by the lawyer’s client). It does not need to be polished or have legal
terms added to it before being fed into the system.

1.3.2 Help a person decide whether to pursue a case or not


A layman can also use the system to decide whether to pursue a case or not.
Since the input to the system is in natural language and does not need to be
edited or structured in any way, it can easily be used by ordinary people in
addition to the experts (lawyers).
Sometimes a person will wish to know whether pursuing a case will have any
value for them before attempting to seek justice from the courts. To determine
this accurately, it is usually necessary to have some legal knowledge or an
understanding of the laws present in the constitution.
Now, since anyone is able to use this system, they can get all relevant laws
just by stating their case to the system. Since the system pulls up all the relevant
laws, the user can read them and get an idea whether the judgement will be in his
favour or not. After that he can opt to approach the human experts or drop the case
if he is in the wrong.

1.3.3 Recommend lawyers and legal consultants


Choosing the right lawyers and legal consultants for a case is just as
important as the facts of the case itself. Many people do not know who to approach
if they want to file a case and do not know any lawyers. Also, they might pick a
lawyer who is actually not well versed in cases which are similar to the one they are
going to file.
In the case of laymen, the system can also recommend lawyers based on the
nature of their case. For example, if the case looks to be a land dispute the system
can recommend lawyers that are well-versed in civil cases especially land disputes.
Additionally, filters can be added based on the factors such as location, price-range,
etc. This will help the people approach the right lawyer for the right case.
The aim of this is to make the legal system much more approachable for even
an ordinary person. If people are given clear guidelines on legal practices, a lot more
people will be willing to come forward with their issues and seek justice for the right
matters.
In addition to recommending legal consultants, it is also advised to include
guidelines on other issues like how a case is filed and other procedures that are done
by someone who is seeking justice. This can be expanded to include guidelines on
the procedures to apply for government documents, get information from the right
authority and other similar issues.

1.3.4 Summarization of legal documents and contracts


A common problem faced is that legal terms and contracts are not easily
understood by the common man. This is due to the presence of a large number of
statements that can’t be easily read at a stretch and the presence of several hard
terms that cannot be understood right away.
This sort of problem can be solved by having a summarization feature. The
summarization feature creates a summary of the legal document in question. This
can be anything ranging from a contract to a document containing terms and
conditions.
This can also help the legal experts by allowing them to get an idea of the
content of large volumes of documents in a short time.
Note that the aim of the feature is to provide a gist of the document. It does
not provide a detailed explanation of the document as a whole. The loopholes and
other slight problems in the document have to be understood by the user or a legal
consultant. The feature will only sum up the document in a few words so that the
user has an overview of what he is going to agree to.
This will lead to people being more aware of what they are signing and
allow them to make smarter decisions.

CHAPTER 2
LITERATURE SURVEY

[1] Lawyer’s Intellectual Tool for Analysis of Legal Documents in Russian


Authors: A. Khasianov, I. Alimova, A. Marchenko, G. Nurhambetova, E. Tutubalina, D. Zuev
2018 International Conference on Artificial Intelligence Applications and Innovations
(IC-AIAI)
This work proposes a “Robot Lawyer” implemented using expert systems and
artificial neural networks. The authors suggest that the system receives constraints
about a case involving intellectual property law. The system analyses these and tells the
user whether the patent is accepted or rejected.
There are three major steps to this process. The first step is splitting the statements
into words. The authors use the ‘bag of words’ model to do this. The second step involves
named entity identification, for which a Recurrent Neural Network (RNN) and a
Conditional Random Field (CRF) are applied. The system also reports statistics such as the
number of distinct words and the most frequently used words. It aims to provide lexical
tools to help the participants of the legal process.
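
The word-splitting step and the word statistics described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the cited authors' implementation: the tokenizer regex, the function name bag_of_words and the sample statement are all made up for this example.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, split it into word tokens, and count each token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample statement, used only for illustration.
statement = "The patent covers the device and the method."
bow = bag_of_words(statement)

distinct_words = len(bow)            # number of distinct words in the statement
most_common = bow.most_common(1)[0]  # most frequently used word and its count
```

A bag of words discards word order, which is why a separate named entity step is still needed afterwards.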

[2] Multiple Data Document Summarization


Authors: V V Krishna Kishore and Pramod Kumar Singh
2017 Conference on Information and Communication Technology (CICT'17)
This work describes the various techniques used in summarization and details how
the entire process is done. The authors identify three main approaches to
summarizing documents: extraction, abstraction and the aided approach. They study
scoring methods such as Jaccard similarity, cosine similarity and TF-IDF similarity. The steps
used are sentence tokenization, part-of-speech tagging, stemming, finding the
appropriate word sense, and measuring similarities among the sentences. Sentence importance
measures are also assigned to highlight which sentences are essential for the summary.
The full pipeline starts with pre-processing the documents and the query. After that,
a sentence splitter is used and importance is assigned to the sentences. These are passed
through a stack decoder algorithm along with the constraints, and a summary is generated.
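
Two of the scoring methods named above, Jaccard similarity and cosine similarity, can be illustrated on a pair of sentences. The sketch below uses raw word counts and a simple regex tokenizer, both of which are assumptions made for brevity; the cited work may tokenize and weight terms differently.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    """Split a sentence into lower-case word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def jaccard(a, b):
    """Shared distinct words divided by all distinct words."""
    sa, sb = set(tokens(a)), set(tokens(b))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine of the angle between the two word-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical sentences that differ in a single word.
s1 = "the tenant pays rent"
s2 = "the tenant owes rent"
```

For these two sentences, three of the five distinct words are shared, so the Jaccard score is 0.6, while the cosine score over the count vectors is 0.75.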

[3] Artificial Intelligence for Automatic Text Summarization


Authors: Min-Yuh Day, Chao-Yu Chen
2018 IEEE International Conference on Information Reuse and Integration for Data Science
This work also describes a methodology that uses AI and neural networks to
summarize a passage. The raw data is collected and sent for pre-processing, and
then for model creation and evaluation. 50,387 essays from between 1970 and
2017 are used as the raw dataset. The title of the data is mined using sentiment analysis and
opinion mining. Essay titles and abstracts were extracted, filtered of special
characters, re-encoded and converted into “title-abstract” pairs. Finally,
the candidate title with the highest score is selected.
LibSVM is used to predict whether a token is part of the candidate title or not.
Parameters such as dropout, loss function and optimizer are used. The loss function gives the
relevancy between the chosen candidate title and the content being evaluated. The optimizer
uses different algorithms to raise accuracy.
As per the system architecture, the pre-processed data is put into three models and the
ROUGE evaluation method is used. 80% of the data is used for training and 20% for
testing, and the accuracy is found to be 82.47%.
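
ROUGE compares a generated summary or title against a reference by counting overlapping units. The survey does not say which ROUGE variant the cited work uses, so the sketch below assumes ROUGE-1 (unigram overlap) purely as an illustration; the function name and the two sample titles are hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Return (precision, recall, f1) over clipped unigram overlap."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-word minimum of the two counts
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1


# Hypothetical generated title and reference title.
p, r, f = rouge_1(
    "neural text summarization",
    "automatic text summarization with neural networks",
)
```

Here every candidate word appears in the reference, giving a precision of 1.0, but only half of the reference words are covered, giving a recall of 0.5.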

[4] A Generic Platform to Automate Legal Knowledge Work Process using Machine
Learning
Authors: Annervaz K M, Jovin George, Shubhashis Sengupta
2015 IEEE 14th International Conference on Machine Learning and Applications

Contract Management (CM) is a broad area in Business Process Outsourcing (BPO) which
deals with management of various aspects of legal contracts made for different kinds of
deals. There are three major parts in managing the real estate contracts. They are:
Document Management, Lease Abstraction and Ongoing Maintenance. Lease Abstraction is
where most of the manual work is done. The contents of the lease are evaluated and this
takes a lot of time and effort. The first two problems are addressed in this work.
The problems are closely related to IE (Information Extraction). The three main
concerns of IE are Named Entity Recognition, data from multiple documents and that the
data extracted should be definable. This work outlines the technique to extract data from a
large volume of text and fill it in a pre-defined structure.
The first step is information setup and creating the training data set. This takes
advantage of the fact that most legal documents have sections and sub-sections. A web
interface is provided to the SME (subject matter experts) and they are trained to format this
data in a hierarchical manner. The next step is annotation, where various training samples
about the snippets in the client setup are collected. Previously processed data can be
collected here.
After this, the machine learning models are trained to understand this data. Semi-
supervised learning may take place. Some of the models that may be used here include
Support Vector machines, Naïve Bayes, etc. For various levels of granularity, both models
may be used to provide the prediction. Next a data profile and a rule inducer are used.
When all these steps have been performed, lease abstraction can be performed. A
feedback mechanism is also created so that the system learns from mistakes. The final
decision is taken by the user and he can evaluate if the result obtained is right or wrong. The
recorded data is used for further training cycles.

[5] Maintainable process model driven online legal expert systems


Authors: Johannes Dimyadi, Sam Bookman, David Harvey, Robert Amor.
Published 5th October, 2018. Springer.
This work outlines the various difficulties involved in creating an expert system for
the legal domain. The steps involved are analysis and advice, intake and
assessment, intelligence workflow and document automation. Generally, facts to populate
the data are taken from databases, websites or the human experts. The expert system tries to
emulate the human experts in making decisions for the legal processes.
The nature of the legal system itself poses a challenge to the system. The legal
system is very organic, i.e., the judgement may vary based on the nature of the case, and
strict ‘if-then’ rules alone can’t be used to create it. The complexity of the legal issues makes
it necessary for the system to have a huge set of data before it can make a decision. New
cases and judgements arise every day and these need to be added.
Like any other system, the legal expert system may also have an error margin. However, in
this case errors can have dire consequences for the lives of the users. Therefore, a highly
reliable system must be created. For this type of system, acquiring the expert’s knowledge
and updating it is crucial. The laws may be modified and new acts may be implemented, so the
system must not run on outdated information.
The Hague Navigation Tool is used as a case study in this work. The tool is used for
cases regarding family law. The advantages and disadvantages of this tool are explored.

[6] Fuzzy Bag-of-Words Model for Document Representation


Authors: Rui Zhao and Kezhi Mao
IEEE Transactions on Fuzzy Systems (Volume: 26, Issue: 2, April 2018)
One of the most popular models for document representation is the bag of words
model. This assigns a vector to the document and notes the normalized occurrences of the
basis terms and also the number of such terms. It should be noted that the basis terms are
the high frequency words in a corpus, and the number of basis terms or the dimensionality
of BoW vectors is less than the size of vocabulary. BoW maps the document into a fixed
length vector. It is simple, but effective.
Fuzzy BoW models are proposed to learn more dense and robust document
representations encoding more semantics. The hard mapping in the previous method is
replaced by a fuzzy mapping. Fuzzy BoW introduces vagueness in the mapping between the
words and the basis terms.
This model works based on word embeddings. The core idea behind word
embeddings is to assign such a dense and low-dimensional vector representation to each
word that semantically similar words are close to each other in the vector space. The merit
of word embeddings is that the semantic similarity between two words can be conveniently
evaluated based on the cosine similarity measure between corresponding vector
representations of the two words.
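
The difference between the hard mapping of BoW and the fuzzy mapping can be seen on a toy example. The sketch below assumes that a word's membership in a basis term is the cosine similarity of their embeddings, clipped at zero; the two-dimensional embedding values and the names used are invented for illustration and are not taken from the cited paper.

```python
import math

# Toy 2-dimensional word embeddings; the values are made up for illustration.
EMBEDDINGS = {
    "lawyer": (1.0, 0.0),
    "attorney": (0.8, 0.6),
    "contract": (0.0, 1.0),
}
BASIS_TERMS = ["lawyer", "contract"]


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def hard_bow(words):
    """Classic BoW: a word counts only towards the identical basis term."""
    return [sum(1 for w in words if w == t) for t in BASIS_TERMS]


def fuzzy_bow(words):
    """Fuzzy BoW: each word adds its clipped similarity to every basis term."""
    vec = [0.0] * len(BASIS_TERMS)
    for w in words:
        for j, t in enumerate(BASIS_TERMS):
            vec[j] += max(0.0, cosine(EMBEDDINGS[w], EMBEDDINGS[t]))
    return vec


doc = ["attorney", "contract"]
hard = hard_bow(doc)
fuzzy = fuzzy_bow(doc)
```

With the hard mapping, "attorney" contributes nothing because it is not a basis term. With the fuzzy mapping, it contributes to "lawyer" in proportion to their similarity, so the resulting vector is denser.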

[7] Information Extraction: Evaluating Named Entity Recognition from Classical Malay
Documents
Authors: Siti Syakirah Sazali, Nurazzah Abdul Rahman, Zainab Abu Bakar
2016 Third International Conference on Information Retrieval and Knowledge
Management
Named Entity Recognition (NER) is one of the most important aspects of IE. NER finds
the parts of the text that correspond to proper names and then classifies them into
the appropriate categories. IE techniques include extracting proper nouns, commonly known as
NER, relation detection and classification, temporal and event processing, and template
filling. There are two approaches to this: the rule-based approach and the statistical
approach.
This work uses lookup lists and leverages the structure of the language in order
to classify the nouns. Four main approaches are dealt with here: Noun Extraction
using Lookup List, Noun Extraction using Morphological Rules (Noun Affixes), Noun
Extraction using Morphological Rules (Verb, Adjective and Noun Affixes), and Noun Extraction
using Morphological Rules (Rayner’s Rule).
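
The first two approaches, a lookup list and noun affix rules, can be combined in a short rule-based sketch. The lookup entries and the simplified Malay-style affix patterns below are illustrative placeholders and are not the lists or rules from the cited study; Rayner's Rule is not covered.

```python
# Illustrative placeholders: a tiny lookup list and simplified Malay-style
# noun affix patterns, not the rules from the cited study.
LOOKUP_LIST = {"melaka", "tuah"}
NOUN_AFFIXES = [("ke", "an"), ("pe", "an"), ("", "an")]  # (prefix, suffix)


def is_noun_by_lookup(word):
    """A word is a noun if it appears in the lookup list."""
    return word.lower() in LOOKUP_LIST


def is_noun_by_affix(word, min_stem=3):
    """A word is a noun if it matches an affix pattern around a long enough stem."""
    w = word.lower()
    for prefix, suffix in NOUN_AFFIXES:
        stem_len = len(w) - len(prefix) - len(suffix)
        if w.startswith(prefix) and w.endswith(suffix) and stem_len >= min_stem:
            return True
    return False


def extract_nouns(words):
    """Keep the words accepted by either the lookup list or the affix rules."""
    return [w for w in words if is_noun_by_lookup(w) or is_noun_by_affix(w)]


sample = ["Melaka", "kerajaan", "makanan", "pergi"]
nouns = extract_nouns(sample)
```

The lookup list catches known proper names, while the affix rules generalize to words the list has never seen, at the cost of false positives on words that merely look like affixed nouns.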

[8] Python for Data Analytics, Scientific and Technical Applications


Authors: Abhinav Nagpal, Goldie Gabrani
2019 Amity International Conference on Artificial Intelligence (AICAI)
This paper focuses on how Python is overtaking R, MATLAB and other environments
when it comes to machine learning. Python has become popular due to its simple syntax,
object-oriented design, portability, testing and self-documentation capabilities, and the
presence of a numeric library allowing the effective storage and handling of enormous
amounts of numerical information.
Python programs need fewer lines of code and are highly readable. Python lets
developers choose whether to follow an object-oriented approach or use scripting. It can be
used to link different data structures (DS) and can be used as a backend language. Most of
its code is checked in the IDE. Python gives developers the flexibility to provide an API from
the current programming language. Python has the ability to balance high-level programming
with low-level programming.
Python lets developers use the correct data structure for the correct program.
NumPy, SciPy and pandas are all very useful. These open source libraries of Python cover
almost all our needs while building an AI project.

[9] Named Entity Recognition from Unstructured Handwritten Document Images


Authors: Chandranath Adak, Bidyut B. Chaudhuri, Michael Blumenstein
2016 12th IAPR Workshop on Document Analysis Systems
This work discusses NER in bodies of text that are unstructured. Without character/word
recognition, NE detection from a document image is very difficult because NLP-based
knowledge can hardly be used in such a situation. However, such detection is essential
where linguistic knowledge cannot be used due to the poor performance of handwritten
text recognition engines.
Techniques such as Binarization, Word segmentation, Slant or skew or baseline
correction and characteristics of named entities may be leveraged to find solutions.

[10] Automatic Text Summarization of News Articles


Authors: Prakhar Sethi, Sameer Sonawane, Saumitra Khanwalker, R. B. Keskar
2017 International Conference on Big Data, IoT and Data Science (BID)

The earlier approaches in text summarization focused on deriving text from lexical
chains generated during the topic progression of the article. These approaches were
preferred since they did not require full semantic interpretation of the article. Words of the
same type are connected using semantic relationships such as synonymy. The existing
methods such as lexical chains, the Barzilay and Elhadad approach and the Silber and McCoy
approach are discussed.
For creating the summary, first we tokenize the text and tag it with the part of
speech. After that pronoun resolution occurs. Then the lexical chains are formed and the
sentences are scored.
Some of the methods suggested by this work are: extraction based on our Article
Category, using Sentence Scoring, using strong Lexical Chains, using Proper Noun Scoring.
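
Two of the suggested methods, sentence scoring and proper noun scoring, can be combined into a small extractive summarizer. This is a simplified sketch under stated assumptions: the stopword list, the scoring weights and the capitalization test for proper nouns are choices made here for illustration, and pronoun resolution and lexical chains are left out.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "on", "for", "was", "it"}


def split_sentences(text):
    """Split text into sentences at ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence, freq):
    """Sum the document frequency of each content word, plus a proper-noun bonus."""
    words = re.findall(r"[A-Za-z]+", sentence)
    score = sum(freq[w.lower()] for w in words if w.lower() not in STOPWORDS)
    # Proper-noun bonus: capitalised words that do not start the sentence.
    score += sum(1 for w in words[1:] if w[0].isupper())
    return score


def summarize(text, n=1):
    """Return the n highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)
    ranked = sorted(sentences, key=lambda s: score_sentence(s, freq), reverse=True)
    chosen = set(ranked[:n])
    return [s for s in sentences if s in chosen]


# Hypothetical three-sentence article.
TEXT = (
    "The court heard the land dispute in Chennai. "
    "The weather was mild. "
    "The land dispute was settled by the court."
)
summary = summarize(TEXT, n=1)
```

The sentence that mentions the frequent content words and a place name scores highest, while the off-topic sentence about the weather scores lowest.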

CHAPTER 3
SYSTEM ANALYSIS

3.1 EXISTING SYSTEM


Some applications use NLP to process the statements from the users and present a
judgement or a decision based on it. These applications will work for some field or section of
the legal system (such as family law, intellectual property, etc). Summarization programs are
present for news articles and even multi data document summarization is available. It is
possible to obtain a law by mentioning the section or referencing its content. Some expert
systems use the questionnaire format to obtain the data from the users. This data is then
processed and the results are given.

3.1.1 Disadvantages of the existing system

• Only a small percentage of such applications are tailored towards the Indian legal
system. The laws are different in each country and the system should be able to
accommodate the changes.
• Lack of a generic application that can be easily extended and implemented for
different sections and different types of laws instead of focusing on one specific
section.
• The application should be easy to use for both the legal experts and the common
man.
• Lack of guidance on how to approach the legal system in order to seek justice. The
advice provided by the system should be able to solve a wider range of problems.
• Connecting the right legal experts with the right clients becomes an issue.
• The solutions and guidance provided by the system should be tailor-made towards
the client.

3.2 PROPOSED SYSTEM


The system provides a generic solution using which the relevant laws are suggested
to the legal experts. The legal experts can then take the final decision on how they are going
to present their case based on the laws suggested. The layman should also be able to use
this system in order to understand his problem better and he should be able to make a
decision whether to pursue his case or not. The system should also be capable of suggesting
the legal experts who the layman can approach in order to file his case. It should be able to
filter out the right experts based on the case. Summarization features are also provided to
better understand contracts and legal documents.

3.2.1 Advantages of proposed system

• Can provide a solution that can be easily extended to different sections.
• The system will be easy to use so even a layman can use it.
• The input will be in Natural language format. There is no need to structure it or use
any official terms.
• Provides a connection between the legal experts and the clients.
• Connects the right client with the right legal expert.
• Summarization will make even complex legal documents much easier to understand.
There is no need to go through large volumes of text before making a quick decision.
• This will make the legal system seem much more approachable and create
awareness about the rights of each person.

3.3 REQUIREMENTS SPECIFICATION

3.3.1 Hardware specification

• RAM: 8 GB
• Hard disk: 10 GB
• Processor: Intel i3 and above

3.3.2 Software specification

• Python
• Windows Operating system (7 and above)

3.4 LANGUAGE SPECIFICATION


3.4.1 Python
Python is widely used in machine learning applications. It is now used even
more than R and MATLAB. This may be attributed to a variety of features. Python is far less
complicated than many other languages, and it can interface with legacy software written
in C, C++ and other languages. Due to its gentle learning curve and flexibility, it has become
one of the fastest growing languages. Python’s ever-evolving libraries make it a good choice
for data analytics.
Python is platform independent. Python balances high-level and low-level
programming. Python has sets, lists, dictionaries, tuples, thread-safe queues, strings, etc.
The open source libraries of Python cover the needs of almost any AI project. By using the
libraries available, the developer does not have to write code from scratch. Some of the most
popular Python libraries are NumPy, pandas, NLTK, TensorFlow, scikit-learn and Matplotlib.
NumPy stands for Numerical Python. It performs high-performance numerical and scientific
calculations on multidimensional data and provides a high-level abstraction
for numerical calculations. NumPy also contains sub-packages for various
mathematical tasks such as linear algebra, FFTs, random number generation and polynomial
manipulation. SciPy, another Python library, is built on NumPy.
The pandas library is used to perform real-world data analysis. Developers use it to load,
manipulate and model data. Boolean indexing, checking for NaNs in a dataset, and selecting and
dropping columns from a dataset are some of the operations that can be performed using
this library. Its vectorized operations reduce the need for explicit loops. TensorFlow can
train and run deep neural networks.
NLTK (Natural Language Toolkit) is a collection of libraries developed for natural
language processing.

CHAPTER 4
SYSTEM DESIGN

4.1 System Architecture

Fig 4.1.1 System Architecture Diagram

The above diagram depicts the flow of the system to be created. Here, three major
paths are represented. The initial step is collecting the problem statement from the user,
who may be the client or the lawyer. The input is in the form of a document containing
natural language. The text is split into words and the most essential keywords are extracted
from the details provided. Once the required keywords have been separated out, mining is
performed on the database so that the related laws and articles can be fetched. This process
is much faster than performing the task manually. Past cases handled and
stored in the knowledge base are also mined based on those keywords and the result needed.
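
The keyword extraction and database mining steps can be sketched with a stopword filter and an inverted index. This is a minimal illustration, not the report's implementation: the stopword list, the law identifiers and the law texts are hypothetical placeholders, and no stemming is applied.

```python
import re
from collections import defaultdict

# Small illustrative stopword list.
STOPWORDS = {"my", "the", "a", "an", "is", "to", "and", "of", "has", "not", "for", "in", "me"}

# Hypothetical law entries; identifiers and wording are placeholders.
LAWS = {
    "LAW-1": "A tenant shall pay rent to the landlord as agreed.",
    "LAW-2": "A landlord shall return the deposit when the lease ends.",
    "LAW-3": "An employer shall pay wages on time.",
}


def keywords(text):
    """Split the text into words and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def build_index(laws):
    """Map each keyword to the set of law ids that contain it."""
    index = defaultdict(set)
    for law_id, text in laws.items():
        for word in keywords(text):
            index[word].add(law_id)
    return index


def find_laws(problem, index):
    """Rank laws by how many of the problem's keywords they share."""
    hits = defaultdict(int)
    for word in keywords(problem):
        for law_id in index.get(word, ()):
            hits[law_id] += 1
    return sorted(hits, key=lambda law_id: (-hits[law_id], law_id))


INDEX = build_index(LAWS)
result = find_laws("My landlord has not returned the deposit", INDEX)
```

Because the index is built once, each query only touches the laws that share at least one keyword with the problem statement. Note that "returned" does not match "return" here, which is the kind of gap that stemming would close.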
The associated laws and these cases provide greater insight into the requirements of
the user and the goal to be achieved in the end. Based on this, it is possible to recommend
suitable consultants for the laymen. The consultants are chosen from the field that the
problem statement is about. These results can be narrowed further by applying filters
based on factors such as price, location, etc.
This system also features an additional module, which helps to summarize, analyse
and identify the core of any legal document such as an agreement or a contract. The document
is provided as an input to the system, which analyses it and restates it in simple terms.

4.2 Usecase diagram

Fig 4.2.1 System Usecase diagram


There are three major actors in this scenario: the client (layman), the lawyer and
the AI system. The lawyers and the clients can access the first module to find the relevant
laws that they can use. An additional feature by which related cases can be viewed is also
provided. The laymen can access the second module to get recommendations for legal
consultants. These suggestions can also be filtered to provide more appropriate ones. The
document analyser module gets a document as an input and provides the key points of the
document as an output.

4.3 Sequence diagram


Fig 4.3.1 Sequence diagram
The sequence diagram depicts the messages passing between four main entities, namely the
Lawyer, the Client, the System and the Database. The lawyer sends the details gathered
from the client to the system. The system fetches the relevant laws from the database and
displays them to the lawyer. The client can also enter his problem into the system and read the
relevant laws. If he decides that his case is worth pursuing, he sends a request to the system
to show him the legal consultants that specialize in the field related to his problem. The
system fetches this information from the database. The search results can be fine-tuned with
additional details provided by the client. If a layman wishes to know the meaning of the
contents of a legal document or contract, he sends it to the system. The system summarizes
the document and displays the summary to the client.

CHAPTER 5
MODULE DESCRIPTION

5.1 Modules
The modules that are present in the system are:

• List relevant laws
• Provide legal consultants
• Summarize documents

5.1.1 List relevant laws


This module takes a file containing the client’s problem, expressed in natural
language, as the input. It produces a list of laws which are relevant to the problem in
question.
This module can be used by both the legal consultants and the common man. These
laws can be used to prepare a defence for the case. The system may also provide relevant
articles so that the legal experts are able to strengthen their case. This reduces the time it
takes to look through the constitution in order to find the right laws. Obscure
laws which are not often used will also be suggested if they are applicable. This helps
ensure that the expert has all the necessary information to present the case.
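
One way to make sure rarely used laws still surface is to weight terms by how rare they are across the collection, as TF-IDF does, and rank laws by cosine similarity to the problem statement. The report does not specify a ranking method, so this is an assumed technique shown as a sketch; the law identifiers and texts are hypothetical, and the smoothed IDF formula is one common choice among several.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def rank_laws(problem, laws):
    """Return law ids ordered by TF-IDF cosine similarity to the problem."""
    docs = {law_id: Counter(tokenize(text)) for law_id, text in laws.items()}
    n = len(docs)
    df = Counter(word for counts in docs.values() for word in counts)
    idf = {word: math.log(n / df[word]) + 1.0 for word in df}

    def vector(counts):
        # Words never seen in the law collection are ignored.
        return {w: c * idf[w] for w, c in counts.items() if w in idf}

    def cosine(u, v):
        dot = sum(u[w] * v.get(w, 0.0) for w in u)
        nu = math.sqrt(sum(x * x for x in u.values()))
        nv = math.sqrt(sum(x * x for x in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    query = vector(Counter(tokenize(problem)))
    scores = {law_id: cosine(query, vector(counts)) for law_id, counts in docs.items()}
    return sorted(scores, key=lambda law_id: (-scores[law_id], law_id))


# Hypothetical law entries; identifiers and wording are placeholders.
SAMPLE_LAWS = {
    "A": "theft of movable property",
    "B": "sale of immovable property",
    "C": "payment of wages",
}
ranking = rank_laws("my movable property was stolen theft", SAMPLE_LAWS)
```

Words that occur in only one law, such as "theft", carry more weight than words shared by several laws, such as "property", so a law matched on a distinctive term ranks above one matched on a common term.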

5.1.2 Provide legal consultants


The system should be able to connect the clients with the right legal advisor. The
system takes in the client’s problem statement as an input and provides a list of consultants
that will be suitable for them as an output.
The client may not know how to contact a lawyer or approach the system for justice.
In such cases, the system will guide them. After analysing the problem, if the client wishes to
file a case, the system can recommend lawyers who excel in the field that the case involves.
This can be deduced by the system based on the laws suggested to the client. The system
can also change its recommendations based on filters such as location, price range, etc., to
suit the needs of the client.
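
The recommendation step with optional filters can be sketched as a simple query over a directory of lawyers. The Lawyer record, its fields, the directory entries and the sort-by-fee ordering are all assumptions made for this illustration; the report does not define a data model for consultants.

```python
from dataclasses import dataclass


@dataclass
class Lawyer:
    name: str
    specialty: str  # area of practice, e.g. "civil"
    location: str
    fee: int        # consultation fee in an arbitrary currency unit


# Hypothetical directory entries.
DIRECTORY = [
    Lawyer("Lawyer A", "civil", "Chennai", 2000),
    Lawyer("Lawyer B", "civil", "Mumbai", 1500),
    Lawyer("Lawyer C", "criminal", "Chennai", 1000),
    Lawyer("Lawyer D", "civil", "Chennai", 5000),
]


def recommend(specialty, location=None, max_fee=None, directory=DIRECTORY):
    """Match on specialty, apply the optional filters, and sort by fee."""
    matches = [
        lawyer for lawyer in directory
        if lawyer.specialty == specialty
        and (location is None or lawyer.location == location)
        and (max_fee is None or lawyer.fee <= max_fee)
    ]
    return sorted(matches, key=lambda lawyer: lawyer.fee)


# A land dispute maps to the "civil" specialty in this sketch.
names = [lawyer.name for lawyer in recommend("civil", location="Chennai", max_fee=3000)]
```

Leaving a filter as None disables it, so the same function serves both the unfiltered recommendation and the refined search.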

5.1.3 Summarize documents


This module takes in a document or a legal contract as an input and provides the
summary of the document as the output.
Lengthy documents are tedious to read through in full. In such cases the
summarization tool helps by providing a short summary to
the user. This makes people more aware of the content of the contracts they sign. Note
that the intent of the module is to provide a general idea of the contents of the
document. It may not be able to explain all the loopholes in the document clearly.
