
CHAPTER 1

INTRODUCTION

1.1 NATURAL LANGUAGE PROCESSING

Artificial intelligence is the simulation of human intelligence by machines. Computers can be made to think like humans and imitate their actions. In artificial intelligence, the machine exhibits human characteristics such as learning, understanding and problem solving. Artificial intelligence breaks human intelligence down into a form that machines can understand, which is then used to make machines perform tasks. These tasks may be simple, or highly complex ones that require hours of work. AI has countless applications, from self-driving cars to performing complex surgeries.

Natural language processing (NLP) is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages. Computers are made to process huge volumes of natural language data, perform tasks on it, and derive conclusions from it. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a manner that is valuable. Most NLP techniques rely on machine learning to derive meaning from human languages. NLP can be used to interpret free text and make it analysable, allowing analysts to sift through massive troves of free text to find relevant information.

1.2 SYSTEM OVERVIEW

The objective is to create a legal assistance system that can help anyone who wishes to obtain quick legal information, from a lawyer to a common person. Right now, lawyers have to go through all the laws in our constitution to find the ones that are relevant for a case. This process can be made much faster by picking out the laws that seem relevant to the topic at hand and displaying them to the user, helping them come to a decision much quicker. This sort of system can also help people who are not well versed in law. Users can put in their problems and check whether their case is worth pursuing before deciding to hire a legal consultant.

In addition to this, the system can help laymen find whom to approach when they require legal assistance. If some legal documents (such as contracts) are not clear, a summarization tool can be provided so that the key points of the given contract are summarized and presented to the user.

1.3 SCOPE OF THE PROJECT

For this system, some of the key problems that both the human experts and
the ordinary person face when it comes to finding solutions and seeking
redressal in the legal system are taken into account. Some of these problems
include the effort taken by lawyers to find the relevant laws, finding the true
nature of legal contracts, and not knowing how to approach the courts for
redressal.

The goal of the system is to make the legal system more understandable and
approachable. It also helps the human experts by speeding up the various
processes involved and connecting the right lawyers with the right clients in
order to ensure that people are able to voice out their troubles and seek justice.

1.3.1 Finding relevant laws

Currently lawyers listen to the problems of their clients and then look through large volumes of legal documents in order to find a solution to the client's problem. The lawyers have to look through the documents until they can find laws relevant to the case and then decide which laws to use. This process is tiring and consumes a lot of time. There is also a chance that some laws that are not regularly used can be skipped in the process of looking for relevant data. Lawyers also have to find cases which are similar to the current case at hand and then verify what laws were used to argue those cases.

This process of looking through the laws wastes time that the lawyer could spend preparing the case. The time of the human experts can be directed to much more useful tasks than manually looking through large volumes of text. This is where the AI system comes into play.

The proposed system will take a problem statement from the user. After analysing the problem, the system filters out the relevant laws in the constitution and displays them to the user. From these, the lawyers can pick out the laws that they think will be most useful. In effect, the system searches the constitution for all the laws that can be applied to the problem, and the lawyer finally decides which of the laws will be most suitable to use. The final decision is still left in the hands of the human experts and they can frame their case however they want. The goal of the system is to reduce the effort and the time taken to frame the case.

The input to the system will be in the form of natural language (i.e., the issue as stated by the lawyer's client). It need not be polished or have legal terms added to it in order to be fed into the system.

1.3.2 Help a person decide whether to pursue a case or not

It is possible for a layman to use the system in order to decide whether to pursue a case or not. Since the input to the system will be in the form of natural language and doesn't need to be edited or structured in any way, it can be easily used by ordinary people in addition to the experts (lawyers).

Sometimes a person will wish to know whether pursuing a case will have any value for them before attempting to seek justice from the courts. In order to accurately determine this, it is usually necessary to have some sort of legal knowledge or an understanding of the laws present in the constitution.

Now, since anyone is able to use this system, they can get all relevant laws
just by stating their case to the system. Since the system pulls up all the relevant
laws, the user can read them and get an idea whether the judgement will be in
his favour or not. After that he can opt to approach the human experts or drop
the case if he is in the wrong.

1.3.3 Recommend lawyers and legal consultants

Choosing the right lawyers and legal consultants for a case is just as important as the facts of the case itself. Many people do not know whom to approach if they want to file a case and do not know any lawyers. People might also pick a lawyer who is not well versed in cases similar to the one they are going to file.

In the case of laymen, the system can also recommend lawyers based on the nature of their case. For example, if the case looks to be a land dispute, the system can recommend lawyers that are well versed in civil cases, especially land disputes. Additionally, filters can be applied based on factors such as location, price range, etc. This will help people approach the right lawyer for the right case.

The aim of this is to make the legal system much more approachable for even an ordinary person. If people are given clear guidelines on legal practices, a lot more people will be willing to come forward with their issues and seek justice for the right matters.

In addition to recommending legal consultants, it is also advisable to include guidelines on other issues, such as how a case is filed and other procedures followed by someone seeking justice. This can be expanded to include guidelines on the procedures to apply for government documents, get information from the right authority, and other similar issues.

1.3.4 Summarization of legal documents and contracts

A common problem is that legal terms and contracts are not easily understood by the common man. This is due to the presence of a large number of statements that can't be easily read at a stretch and of several difficult terms that cannot be understood right away.

This sort of problem can be solved by having a summarization feature. The summarization feature creates a summary of the legal document in question. This can be anything ranging from a contract to a document containing terms and conditions.

This can also help the legal experts by allowing them to get an idea of the
content of large volumes of documents in a short time.

Note that the aim of the feature is to provide a gist of the document. It does
not provide a detailed explanation of the document as a whole. The loopholes
and other slight problems in the document have to be understood by the user or
a legal consultant. The feature will only sum up the document in a few words so
that the user has an overview of what he is going to agree to.

This will lead to the people being more aware of what they are signing and
allow them to make smarter decisions.

CHAPTER 2

LITERATURE SURVEY

[1] Lawyer’s Intellectual Tool for Analysis of Legal Documents in Russian

Authors: A. Khasianov, I. Alimova, A. Marchenko, G. Nurhambetova, E. Tutubalina, D. Zuev

This system proposes creating a "Robot Lawyer", implemented using expert systems and artificial neural networks. "Robot Lawyer" is an information system that allows participants in the legal process to properly prepare for a court case and plan judicial activities. The main focus of the system is dispute resolution in the entrepreneurship field. The system implements the following tasks: dispute type identification, legal provision retrieval, provision of required documents, and searching for competent courts. The author suggests that the system receives constraints about a case concerning intellectual property laws. The system analyses these and tells the user whether the patent is accepted or rejected. There are three major steps to this process. The first step is splitting the statements into words; the author uses the 'bag of words' model to do this. The second step involves named entity identification, for which a Recurrent Neural Network (RNN) and a Conditional Random Field (CRF) are applied. Recurrent Neural Networks are widely used for natural language processing tasks. The main advantage of RNNs is that they are designed to process sequences: the network processes each element separately, but uses some information remaining from the processing of the previous element of the sequence. It also gives conclusions about the number of distinct words, frequently used words, etc. The system applies advanced technologies in the field of artificial intelligence to process and search information needed by users. It aims to provide lexical tools to help the participants of the legal process.

[2] Multiple Data Document Summarization

Authors: V V Krishna Kishore and Pramod Kumar Singh

This work describes the various techniques used in summarization and details how the entire process is done. The author says that there are three main approaches to summarizing documents: extraction, abstraction and the aided approach. The author studies scoring methods such as Jaccard similarity, cosine similarity and TF-IDF similarity. The steps used are sentence tokenization, part-of-speech tagging, word stemming, and finding the appropriate sense and the similarities among the sentences. Sentence importance measures are also assigned to highlight which sentences are essential for the summary. The full steps required to summarize the document start with using the documents and query for document pre-processing. After that a sentence splitter is used and importance is assigned to the sentences. This is passed through a stack decoder algorithm along with the constraints, and a summary is generated.

[3] Artificial Intelligence for Automatic Text Summarization

Authors: Min-Yuh Day, Chao-Yu Chen

This work also describes a methodology that uses AI and neural networks to summarize a passage. A summary is defined as a text of one or more words that represents the important information in the raw text while obviously being shorter than the raw text. In this work, the raw data is collected and sent for pre-processing. After that, it is sent for model creation and evaluation. 50,387 essays between 1970 and 2017 are used as the raw dataset. The title of the data is mined using sentiment analysis and opinion mining. Essay titles and essay abstracts were extracted, special characters were filtered out, and the text was re-encoded and converted into the format of "title-abstract" pairs. Finally, the candidate title with the highest score is selected. LibSVM is used to predict whether a token is part of the candidate title or not. Parameters like Dropout, Loss function and Optimizer are used. LibSVM is a popular open-source machine learning library written in C++ with a C API. The loss function gives the relevancy between the chosen candidate title and the content being evaluated. The optimizer uses different algorithms to raise accuracy. As per the system architecture, the ROUGE evaluation method is used to evaluate the pre-processed data fed into the three models. 80% of the data is used for training and 20% for testing, and the accuracy is found to be 82.47%. The contribution of this paper is applying deep learning to generate short summaries, comparing different methods, and training and testing the automated generation of English essay titles and abstracts from 1970 to 2017.

[4] A Generic Platform to Automate Legal Knowledge Work Process using Machine Learning

Authors: Annervaz K M, Jovin George, Shubhashis Sengupta

Contract Management (CM) is a broad area in Business Process Outsourcing (BPO) which deals with the management of various aspects of legal contracts made for different kinds of deals. There are three major parts in managing real estate contracts: Document Management, Lease Abstraction and Ongoing Maintenance. Lease Abstraction is where most of the manual work is done; the contents of the lease are evaluated, and this takes a lot of time and effort. The first two problems are addressed in this work. The problems are closely related to IE (Information Extraction). The first step is information setup and creating the training data set. This takes advantage of the fact that most legal documents have sections and sub-sections. A web interface is provided to the SMEs (subject matter experts), who are trained to format this data in a hierarchical manner. The next step is annotation, where various training samples about the snippets in the client setup are collected. Previously processed data can be collected here. After this, the machine learning models are trained to understand this data. Semi-supervised learning may take place. Some of the models that may be used here include Support Vector Machines, Naïve Bayes, etc. For various levels of granularity, both models may be used to provide the prediction. Next, a data profile and a rule inducer are used. When all these steps have been performed, lease abstraction can be carried out. A feedback mechanism is also created so that the system learns from mistakes. The final decision is taken by the user, who can evaluate whether the result obtained is right or wrong. The recorded data is used for further training cycles.

[5] Maintainable process model driven online legal expert systems

Authors: Johannes Dimyadi, Sam Bookman, David Harvey, Robert Amor

This work outlines the various difficulties involved in creating an expert system for the legal domain. The steps involved are analysis and advice, intake and assessment, intelligence workflow and document automation. Generally, facts to populate the data are taken from databases, websites or the human experts. The legal expert system tries to emulate the human experts in making decisions for legal processes. The nature of the legal system itself poses a challenge: outcomes are tough to predict and are also influenced by several social and political factors. The legal system is very organic, i.e., the judgement may vary based on the nature of the case, and strict 'if-then' rules alone cannot be used to model it. The complexity of legal issues makes it necessary for the system to have a huge set of data before it can make a decision. New cases and judgements arise every day and these need to be incorporated. The Hague Navigation Tool is used as a case study in this work. The tool is used for cases regarding family law. The advantages and disadvantages of these tools are explored.

[6] Fuzzy Bag-of-Words Model for Document Representation

Authors: Rui Zhao and Kezhi Mao

One of the most popular models for document representation is the bag of
words model. This assigns a vector to the document and notes the normalized
occurrences of the basis terms and also the number of such terms. It should be
noted that the basis terms are the high frequency words in a corpus, and the
number of basis terms or the dimensionality of BoW vectors is less than the size
of vocabulary. BoW maps the document into a fixed length vector. It is simple,
but effective. The bag-of-words model is simple to understand and implement
and has seen great success in problems such as language modeling and
document classification. Fuzzy BoW models are proposed to learn more dense
and robust document representations encoding more semantics. The hard
mapping in the previous method is replaced by a fuzzy mapping. Fuzzy BoW
introduces vagueness in the mapping between the words and the basis terms.
This model works based on word embeddings. The core idea behind word
embeddings is to assign such a dense and low-dimensional vector representation
to each word that semantically similar words are close to each other in the
vector space. The merit of word embeddings is that the semantic similarity
between two words can be conveniently evaluated based on the cosine
similarity measure between corresponding vector representations of the two
words.
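A minimal sketch of the fuzzy mapping idea described above, assuming word vectors are available as a plain dictionary (the vectors, words and basis terms below are illustrative, not taken from the paper):

import numpy as np

# Illustrative word vectors; in practice these come from pre-trained embeddings.
vectors = {
    "court": np.array([0.9, 0.1, 0.0]),
    "judge": np.array([0.8, 0.2, 0.1]),
    "lease": np.array([0.1, 0.9, 0.2]),
}
basis_terms = ["court", "lease"]   # illustrative high-frequency basis terms

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def fuzzy_bow(words):
    # Each word contributes its similarity to every basis term, instead of
    # incrementing a single exact-match count as in the hard (classical) mapping.
    vec = np.zeros(len(basis_terms))
    for w in words:
        if w in vectors:
            for j, term in enumerate(basis_terms):
                vec[j] += max(0.0, cosine(vectors[w], vectors[term]))
    return vec

# "judge" adds weight to the "court" dimension despite not matching it exactly.
print(fuzzy_bow(["judge", "court"]))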

[7] Information Extraction: Evaluating Named Entity Recognition from
Classical Malay Documents

Authors: Siti Syakirah Sazali, Nurazzah Abdul Rahman, Zainab Abu Bakar

Named Entity Recognition (NER) is one of the most important aspects of IE. NER finds the parts of the text that correspond to proper names and then classifies them into their appropriate categories. IE techniques include extracting proper nouns (commonly known as NER), relation detection and classification, temporal and event processing, and template filling. There are two approaches to this: the rule-based approach and the statistical approach. The rule-based approach uses lookup lists and leverages the structure of the language in order to classify the nouns. Four main approaches are dealt with here: Noun Extraction using a Lookup List, Noun Extraction using Morphological Rules (Noun Affixes), Noun Extraction using Morphological Rules (Verb, Adjective and Noun Affixes), and Noun Extraction using Morphological Rules (Rayner's Rule).

[8] Python for Data Analytics, Scientific and Technical Applications

Authors: Abhinav Nagpal, Goldie Gabrani

This paper focuses on how Python is overtaking R, MATLAB and other environments when it comes to machine learning. Python has become popular due to its simple syntax, object-oriented design, portability, testing, and self-documentation capabilities, and the presence of a Numeric library allowing the effective storage and handling of enormous amounts of numerical information. Python requires fewer lines of code and provides high readability. Python lets developers choose whether to follow an object-oriented approach or use scripting. It can be used to link different data structures and can be used as a backend language. The majority of its code can be checked in the IDE. Python gives developers the flexibility to provide an API from the current programming language. Python has the ability to balance high-level programming with low-level. Python lets developers use the correct data structure for the correct program. NumPy, SciPy and pandas are all very useful; these open source libraries of Python cover almost all the needs of an AI project.

[9] Named Entity Recognition from Unstructured Handwritten Document Images

Authors: Chandranath Adak, Bidyut B. Chaudhuri, Michael Blumenstein

This work discusses NER in unstructured bodies of text. Without character or word recognition, NE detection from a document image is very difficult, because NLP-based knowledge can hardly be used in such a situation. In this paper the method for Named Entity (NE) recognition is given in detail. The authors begin with an unstructured document (image I), and the goal is to identify the named entities from this document. Such detection is essential where linguistic knowledge cannot be used due to the poor performance of handwritten text recognition engines. Techniques such as binarization, word segmentation, slant/skew/baseline correction and the characteristics of named entities may be leveraged to find solutions.

[10] Automatic Text Summarization of News Articles

Authors: Prakhar Sethi, Sameer Sonawane, Saumitra Khanwalker, R. B. Keskar

Text summarization has always been an area of active interest in academia. In recent times, even though several techniques have been developed for automatic text summarization, efficiency is still a concern. The most important task in extractive text summarization is choosing the important sentences that would appear in the summary, and identifying such sentences is truly challenging. The earlier approaches in text summarization focused on deriving text from lexical chains generated during the topic progression of the article. These approaches were preferred since they did not require full semantic interpretation of the article. Words of the same type are connected using semantic relationships such as synonyms. Existing methods such as lexical chains, the Barzilay and Elhadad approach and the Silber and McCoy approach are discussed. For creating the summary, first the text is tokenized and tagged with parts of speech. After that, pronoun resolution occurs. Then the lexical chains are formed and the sentences are scored. The lexical chains are formed using all the nouns in the article except for proper nouns; the problem with proper nouns is that they generally don't carry meaning on their own, so they cannot be added to any lexical chain. Some of the methods suggested by this work are: extraction based on the article category, sentence scoring, strong lexical chains, and proper noun scoring. Given the increase in the size and number of documents available online, an efficient automatic news summarizer is extremely essential for current fast-moving online-based operations.

CHAPTER 3

SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

AI technology is being used to develop expert systems that solve complex problems in the legal area. Most of these systems employ rules to describe the strategies and procedures used by litigators to analyze legal issues. The tasks performed by these systems include interpreting the law, anticipating the legal consequences of proposed actions, predicting the effects of changes in legislation, as well as analyzing and managing cases. A legal expert system is a domain-specific expert system that uses artificial intelligence to emulate the decision-making abilities of a human expert in the field of law.

Legal expert systems employ a rule base or knowledge base and an inference engine to accumulate, reference and produce expert knowledge on specific subjects within the legal domain. Some applications use NLP to process statements from the users and present a judgement or a decision based on them. These applications work for some field or section of the legal system (such as family law, intellectual property, etc.). In some cases where there is only a small set of possible outcomes (such as yes, no and maybe), these systems can even guess the probable judgement with some amount of accuracy, as in the case of the system which determines the outcome of cases regarding intellectual property. In other instances, a system may take a decision-tree-like approach, getting users to answer a simple questionnaire and giving broad suggestions based on their responses.

Summarization programs exist for news articles, and even multiple data document summarization is available. Nowadays, summarization is primarily used on news articles, where the content is shortened so that the user can read through it quickly, saving a lot of time. In some applications, it is possible to obtain a law by mentioning its section or referencing its content. This acts as a useful tool for quick reference.

3.1.1 Disadvantages of the existing system

1. Only a small percentage of such applications are tailored towards the Indian legal system. The laws are different in each country and a system should be able to accommodate these differences.
2. Lack of a generic application that can be easily extended and implemented for different sections and different types of laws instead of focusing on one specific section.
3. Existing applications are not easy to use for both the legal experts and the common man.
4. Lack of guidance on how to approach the legal system in order to seek justice. The advice provided by such systems should be able to solve a wider range of problems.
5. The client may not be suggested a legal expert according to his needs.
6. The solutions and guidance provided are not tailor-made for the client.

3.2 PROPOSED SYSTEM

The system provides a generic solution using which the relevant laws are
suggested to the legal experts. When a case is presented in the form of natural
language input, the system then analyses it and pulls up all the relevant laws.
The legal experts can then take the final decision on how they are going to
present their case based on the laws suggested. The layman should also be able
to use this system in order to understand his problem better and he should be
able to make a decision whether to pursue his case or not. Since the input can be provided in natural language, it is easy to use even for ordinary people.
decision can be taken by providing the relevant laws so that the person
understands his situation better. The system should also be capable of
suggesting the legal experts who the layman can approach in order to file his
case. While suggesting the experts the preference of the clients should be taken
into account. It should be able to filter out the right experts based on the case.
The experts can be suggested based on factors like location, price, etc.
Summarization features are also provided to better understand contracts and
legal documents. This feature highlights the gist of the documents, allowing the legal experts to go through large volumes of case-related documents more quickly. The ultimate goal of the system is to
make the legal system more approachable and to make the process of providing
justice simpler and quicker.

3.2.1 Advantages of proposed system

1. The system can provide a solution that can be easily extended to different
sections.
2. The system is simple enough that even a layman can use it.
3. The input will be in Natural language format. There is no need to
structure it or use any official terms.
4. The system provides a connection between the legal experts and the
clients.
5. The system connects the right client with the right legal expert.
6. Summarization will make even complex legal documents much easier to
understand. There is no need to go through large volumes of text before making
a quick decision.
7. This will make the legal system seem much more approachable and
create awareness about the rights of each person.
3.3 REQUIREMENTS SPECIFICATION

3.3.1 Hardware specification

• RAM: 8 GB
• Hard disk: 10 GB
• Processor: Intel i3 and above

3.3.2 Software specification

• Windows operating system (7 and above)
• Server side
  o Python
• Database
  o MongoDB (MongoDB Atlas)
• Client side
  o React js (JavaScript library)

3.4 LANGUAGE SPECIFICATION

3.4.1 Python

Python is being widely used in machine learning applications. It is now used even more than R and MATLAB. This may be attributed to a variety of features. Python is far less complicated than other languages, and it interfaces easily with legacy software written in C, C++ and other languages. Due to its low learning curve and flexibility, Python has become one of the fastest growing languages. Python's ever-evolving libraries make it a good choice for data analytics.

Python has become popular for various reasons, including its simple, pseudocode-like syntax; its modularity; its object-oriented design; its profiling, portability, testing, and self-documentation capabilities; and the presence of a Numeric library allowing the effective storage and handling of enormous amounts of numerical information.

Python is platform independent and balances high-level and low-level programming. Python has sets, lists, dictionaries, tuples, thread-safe queues, strings, etc. The open source libraries of Python cover the needs of almost any AI project. By using the libraries available, the developer does not have to write code from scratch and can simply use the libraries instead. Some of the most popular Python libraries are NumPy, pandas, NLTK, TensorFlow, Scikit-Learn and Matplotlib.

The system built here relies heavily on the NLTK libraries available in Python.

3.4.1.1 Python frameworks/libraries used

Flask

Flask is a Python web-application framework. It is often called a microframework because it does not require any specific tools or libraries to function. The intention of Flask is to keep the core simple but extensible. Flask depends on the Jinja template engine and the Werkzeug WSGI toolkit. Flask does not impose any strict decisions on the user, and some of the guidelines it does impose can be changed according to the developer's wishes.

In order to use Flask, first the Flask class is imported, then an instance of the class is created. app.route() is used to indicate to Flask which URL should trigger a particular function. We must remember to add CORS support to the functions; otherwise, we will get a CORS error. This is done by setting the cross-origin supports_credentials option to True.
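As a rough sketch of this setup (the route name and return value here are illustrative, not part of the actual system):

from flask import Flask, jsonify
from flask_cors import CORS

app = Flask(__name__)                   # create an instance of the Flask class
CORS(app, supports_credentials=True)    # enable CORS so a browser client can call the API

@app.route('/ping')                     # app.route() maps a URL to a function
def ping():
    return jsonify({"status": "ok"})

if __name__ == '__main__':
    app.run()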

NLTK

NLTK is a collection of libraries developed for natural language processing. NLTK stands for Natural Language Toolkit. It was developed for research and teaching purposes in NLP and fields like cognitive science, information retrieval, AI, ML and empirical linguistics. Implemented as a collection of interdependent modules, it consists of a set of core modules that define the basic data types. Even though Python is slower at runtime and has some design restrictions compared to compiled languages like C or C++, it is preferred by scientists and developers in the field of data analytics, numerical computations and almost all technical domains like machine learning, AI, deep learning, etc. The main NLTK imports used in the project are listed below, followed by a short usage sketch:

1) Stopwords: Stopwords are imported from nltk.corpus. A stopword is a commonly used word that a search engine has been trained to ignore. Some examples of stopwords are "a", "and" and "the". Often, these words need to be removed from bodies of text before they are processed.
2) Word Tokenize: word_tokenize is imported from nltk.tokenize. It is used to split a stream of text into word tokens.
3) Cosine distance: This is imported from nltk.cluster.util. It returns 1 minus the cosine of the angle between two vectors.
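A short sketch of these three imports in use (the sample sentence is made up; the required NLTK data is downloaded first):

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.cluster.util import cosine_distance

nltk.download('stopwords')   # word lists used below
nltk.download('punkt')       # tokenizer models required by word_tokenize

words = word_tokenize("The tenant broke the lease agreement")
filtered = [w for w in words if w.lower() not in stopwords.words('english')]
print(filtered)   # ['tenant', 'broke', 'lease', 'agreement']

# cosine_distance returns 1 minus the cosine of the angle between two vectors
print(cosine_distance([1, 0, 1], [1, 1, 0]))   # 0.5 for these two vectors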

Numpy

NumPy stands for Numerical Python. High-performance numerical and scientific calculations can be performed on multidimensional data. It provides a high-level abstraction for numerical calculations. NumPy also contains many smaller sub-packages for various mathematical tasks like linear algebra, FFTs, random number generation, and polynomial manipulation. SciPy, another Python library, is built on NumPy.

Networkx

Networkx is a Python package that is useful in studying the functions, structures and dynamics of complex networks. Networkx is used by people from varying fields such as mathematicians, physicists, computer scientists and biologists. In harmony with the growing number of packages and ecosystems that Python provides, networkx may be used to perform tasks such as numerical linear algebra and graph drawing.
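In this project, networkx is mainly used to rank sentences; a minimal sketch of that usage, with an illustrative similarity matrix, might look like this:

import numpy as np
import networkx as nx

# Illustrative similarity matrix for three sentences.
sim = np.array([[0.0, 0.5, 0.1],
                [0.5, 0.0, 0.3],
                [0.1, 0.3, 0.0]])

graph = nx.from_numpy_array(sim)   # sentences become nodes, similarities become edge weights
scores = nx.pagerank(graph)        # a higher score marks a more central sentence
print(scores)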

cv2

cv2 is the Python interface to OpenCV, an open source computer vision library built towards solving computer vision problems. In this project it is mainly used for reading images. cv2 has several helpful methods like 'imread', which takes parameters like the path, flags and other specifications regarding the image and returns the image; if the image cannot be read, it returns an empty result (None) instead of raising an error.
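A small sketch of reading an image with cv2 (the file name is a placeholder):

import cv2

# "contract_page.png" is a placeholder file name.
img = cv2.imread("contract_page.png", cv2.IMREAD_GRAYSCALE)
if img is None:                        # imread returns None when the file cannot be read
    print("Image could not be read")
else:
    print("Image shape:", img.shape)   # (height, width) for a grayscale image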

PIL

The Python Imaging Library (PIL), an open source python library, adds
image processing capabilities to your Python interpreter. This library supports
many file formats, and provides powerful image processing and graphics
capabilities. This library also helps in manipulating the image files.

PyMongo

PyMongo is used to connect the system to a MongoDB database. In PyMongo, the 'MongoClient' class is used: the connection string is passed to MongoClient in order to establish a connection. After that, we can run queries on the various collections present in our MongoDB database.

Spacy

Spacy is used for advanced NLP operations. Some of the features of Spacy include non-destructive tokenization, named entity recognition, word vectors, statistical models, deep learning aids, visualizers, etc. Spacy excels at large-scale information extraction tasks. It is written from the ground up in carefully memory-managed Cython. From Spacy we are using the 'en_core_web_sm' model.
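A short sketch of named entity recognition with this model (the sentence is made up; the model is assumed to be installed via python -m spacy download en_core_web_sm):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The agreement was signed by Ravi Kumar in Chennai on 5 March 2019.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. "Ravi Kumar PERSON", "Chennai GPE", "5 March 2019 DATE"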

Requests

This is used to easily send and deal with HTTP requests. If this library is used, there is no need to add query strings to the URLs manually or to form-encode PUT and POST data; the json method can be used instead. Some of the features of requests include the ability to handle sessions with cookie persistence, automatic decompression of content, multi-part file uploads, and chunked HTTP requests.

Beautiful soup

Beautiful Soup is a library used for acquiring data from HTML and XML files. It is able to navigate, search and modify the parse tree. First a Beautiful Soup object is created for the required page. Then we can navigate through the tree thus obtained or query it according to our requirements. Methods like 'find_all' and 'get_text' are commonly used along with Beautiful Soup. Beautiful Soup supports the HTML parser included in Python's standard library, but it also supports a number of third-party Python parsers.

When creating a Beautiful Soup object, first the document is converted to Unicode, and HTML entities are converted to Unicode characters. Beautiful Soup then parses the document using the best available parser. The default parser for Beautiful Soup is an HTML parser, but we can also specifically tell it to use an XML parser.
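A small sketch of these basics (the HTML snippet is illustrative):

from bs4 import BeautifulSoup

html = '<div class="result"><a href="/case/1">Case one</a></div>'
soup = BeautifulSoup(html, "html.parser")   # default HTML parser

for link in soup.find_all("a"):
    print(link.get("href"), link.get_text())   # /case/1 Case one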

3.4.2 React js

React js is an open source JavaScript library mainly used for building user interfaces. React is maintained by Facebook along with some independent developers and companies. React is used as the foundation for the development of single-page mobile and web applications. React makes the creation of interactive UIs very simple, and it efficiently updates and renders only the specific components whose state changes. In React, encapsulated components which manage their own state are built; these can then be combined to form the full UI. The render method in a React component returns the content to display. Since each of the components maintains an internal state, whenever the state changes, the render method is invoked and shows the changes.

3.4.3 MongoDB

MongoDB is a NoSQL database. It uses JSON-like documents with optional schemas. It is a cross-platform, document-oriented database program. Some of the features of MongoDB include ad-hoc queries, indexing, replication, load balancing, file storage and aggregation.

For this system, MongoDB Atlas is used. It is a fully managed cloud database developed by the creators of MongoDB. Atlas handles the complications of managing a database, such as deployment, management and healing.

CHAPTER 4

SYSTEM DESIGN

4.1 SYSTEM ARCHITECTURE

Fig 4.1.1 ARCHITECTURE OF PROPOSED SYSTEM

System architecture diagrams are used to understand, clarify, and communicate ideas about the system structure and the user requirements that the system must support. They provide a basic framework that can be used at the system planning phase to help stakeholders understand the architecture, discuss changes, and communicate intentions clearly.

The above diagram depicts the flow of the system to be created. Here, three major paths are represented. The initial step is collecting the problem statement from the user, who may be the client or the lawyer. The input is in the form of a document containing natural language. Mining is done by splitting the input into words and taking out the most essential keywords from the details provided. After the segregation of the required keywords, the database is mined so that the related laws and articles can be fetched. This process is found to be much faster than performing the task manually. Past cases handled and stored in the knowledge base are also mined based on those keywords and the required result.

The associated laws and these cases provide greater insight into the requirements of the user and the goal to be achieved in the end. Based on this, it is possible to recommend suitable consultants to the laymen. The consultants are chosen in the field that the problem statement is about. These results can be narrowed down further using filters decided by several factors like price, location, etc.

This system also features an additional module, which helps to summarize, analyse and identify the core of any legal document like an agreement or a contract. The document is provided as an input to the system, which then analyses it and restates it in simple terms.

These are the three main routes taken by the system while catering to the needs of the users.

4.2 USE CASE DIAGRAM

Fig 4.2.1 USE CASE DIAGRAM

A use case diagram is a dynamic or behaviour diagram in UML. Use case diagrams model the functionality of a system using actors and use cases. Use cases are a set of actions, services, and functions that the system needs to perform. A use case diagram at its core is a representation of a user's interaction with the system that shows the relationship between the user and the different use cases in which the user is involved. A use case diagram can identify the different types of users of a system and the different use cases, and will often be followed by other types of diagrams as well. The use cases are depicted by either circles or ellipses. While a use case itself might drill into a lot of detail about every possibility, a use case diagram can help provide a higher-level view of the system.

Actors:

1) Lawyer: Has professional knowledge. Works in the legal field and handles cases for the clients.
2) Layman: Does not have any background in law. The client is the person that has an issue that needs to be addressed by the system.
3) System: This is the proposed system that provides assistance for legal issues.
4) Database: This contains the domain knowledge and other information that is required for the system to function.

There are three major actors in this scenario: the client (layman), the lawyer and the AI system. The lawyers and the clients can access the first module to find the relevant laws that they can use. An additional feature by which related cases can be viewed is also included. The laymen can access the second module to get recommendations for legal consultants; these suggestions can also be filtered to provide more appropriate ones. For the first two modules, the system refers to the database which contains the domain knowledge in order to answer the queries of the users. The document analyser module gets a document as an input and provides the key points of the document as an output.

4.3 SEQUENCE DIAGRAM

Fig 4.3.1 SEQUENCE DIAGRAM

A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use case realizations in the Logical View of the system under development. Sequence diagrams are sometimes called event diagrams or event scenarios. A sequence diagram simply depicts the interaction between objects in a sequential order, i.e., the order in which these interactions take place. Sequence diagrams describe how and in what order the objects in a system function. These diagrams are widely used by business people and software developers to document and understand requirements for new and existing systems.

The sequence diagram depicts the messages passing between four main
entities namely the Lawyer, the Client, the System and the Database. The
lawyer sends the details gathered from the client to the system. The system
fetches the relevant laws from the database and displays it to the lawyer.
Alternatively, the client can also enter his problem into the system and read the
relevant laws. If he decides that his case is worth pursuing, he sends a request to
the system to show him the legal consultants that specialize in the field related
to his problem. The system fetches this information from the database. The
search results can be fine-tuned by additional details provided by the client. If a
layman wishes to know the meaning of the contents of a legal document or
contract, he sends it to the system. The system summarizes the document and
displays it back to the client.

A sequence diagram illustrates use-case realizations, i.e., it shows how objects interact to perform the behaviour of all or part of a use case. One or more sequence diagrams may illustrate the object interactions which enact a use case. Thus, we see the manner in which these entities interact and pass messages to each other. The message passing for the three modules (use cases) outlined in the use case diagram is described in the sequence diagram.

CHAPTER 5

MODULE DESCRIPTION

5.1 MODULES

The modules that are present in the system are:

• List relevant laws
• Provide legal consultants
• Summarize documents
• Similar cases

5.1.1 List relevant laws

Fig 5.1.1 RELEVANT LAWS FLOW DIAGRAM

This module can take a file containing the problem of the client expressed in
natural language as the input. It produces a list of laws which are relevant to the
problem in question.

This module can be used by both the legal consultants and the common
man. These laws can be used to prepare a defence for the case. The system may
also provide relevant articles so that the legal experts are able to strengthen their
case. This reduces the time it takes to look through the constitution in order to
find the right laws. Also, some obscure laws which are not often used will also
be suggested if they are applicable. This provides the guarantee that the expert
will have all the necessary information to present the case.

This action is performed by processing the various documents containing the acts and laws and converting them into yml files. The keywords and the laws associated with them are kept as key-value pairs. When the user enters his problem statement, the statement is processed and the keywords are taken from it. For this, nltk libraries are used. First, the stopwords (i.e., words like 'and', 'the', etc., that are used often but are not really relevant for our purpose) and the punctuation are removed, and only the important words are returned.

After obtaining the keywords, we check whether each word is present as a key in the yml file. If it is there, the corresponding law is added to the list. Finally, the list of laws is returned to the user.
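An illustrative sketch of this lookup, assuming the PyYAML package is used to load the files (the keywords and law names below are made up, not taken from the real data):

import yaml

# Illustrative yml content; the real files are generated from the acts and laws.
document = """
theft: "IPC Section 378 - Theft"
trespass: "IPC Section 441 - Criminal trespass"
"""
keyword_to_law = yaml.safe_load(document)   # dict mapping keyword -> law

for keyword in ["trespass", "holiday"]:
    print(keyword, "->", keyword_to_law.get(keyword))   # None when no law matches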

If we depend only on whether a matching keyword is present or not, we might not give an accurate picture of the case, since a law is returned even if only one or two keywords match. So, we try to establish an order based on how relevant a law is to the particular problem. In order to do this, we compare the keywords found in the problem statement to the keywords found in the law and return a "relevance percentage". The relevance percentage is obtained by dividing the number of matching keywords by the total number of keywords in the law. The higher the relevance percentage, the more relevant the law is to the given problem.
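A minimal sketch of this scoring, assuming illustrative keyword sets per law and that the NLTK data used earlier is available (the laws, keywords and helper names here are made up):

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Illustrative keyword sets per law; in the real system these come from the yml files.
law_keywords = {
    "IPC Section 378 - Theft": {"steal", "stolen", "theft", "property", "dishonest"},
    "IPC Section 441 - Criminal trespass": {"enter", "land", "property", "unlawful"},
}
stop_words = set(stopwords.words('english'))

def relevant_laws(problem_statement):
    words = {w.lower() for w in word_tokenize(problem_statement)
             if w.isalpha() and w.lower() not in stop_words}
    results = []
    for law, keys in law_keywords.items():
        matched = words & keys
        if matched:
            # relevance percentage = matched keywords / total keywords in the law
            results.append((law, 100.0 * len(matched) / len(keys)))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

print(relevant_laws("My neighbour entered my land and stole my property"))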

5.1.2 Provide legal consultants

The system should be able to connect the clients with the right legal advisor. The system takes in the client's problem statement as an input and provides a list of consultants that will be suitable for them as an output.

The client may not know how to contact a lawyer or approach the system
for justice. In such cases, the system will guide them. After analysing the
problem, if the client wishes to file a case, the system can recommend lawyers
that excel in the field that involves that case. This can be deduced by the system
based on the laws suggested to the client. The system can also change its
recommendations based on filters such as location, price range, etc. to suit the
needs of the client.

5.1.3 Summarize documents

Fig 5.1.2 DOCUMENT PROCESSING FLOW DIAGRAM

This module takes in a document or a legal contract as an input and provides
the summary of the document as the output.

In the case of lengthy documents, it is tedious to go through them to find out what they contain. In such cases the summarization tool helps by providing a short summary to the user. This makes people more aware of the content of the contracts they sign. Note that the intent of the module is to provide a general idea about the contents of the document; it may not be able to clearly explain all the loopholes in the document.

The module makes use of extractive summarization techniques, where the given text to be shortened is analysed and the words and sentences that stand out the most are taken from it in order to present the summary. The technique used to find the similarities between the sentences is cosine similarity. Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of the angle between them. The article is split into sentences and stop words are removed. Then a similarity matrix is generated and the sentences are ranked. Based on that, the top sentences are picked for the summary.
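As a small worked example of the measure: for the count vectors A = (1, 1, 0) and B = (1, 0, 1), the dot product A·B = 1 and |A| = |B| = √2, so the cosine similarity is 1/2 = 0.5, and the cosine distance used by nltk is 1 − 0.5 = 0.5.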

5.1.4 Similar cases

Fig 5.1.3 SIMILAR CASES FLOW DIAGRAM


This module is used to provide users with links to cases like their own. The
user enters a string containing the details of his/her case. This string is then
concatenated with the url of any website that has legitimate details of cases and
the search page is obtained in the backend. The links of the urls scraped from
the site are displayed in the form of a list to the user. Using this, they can view
the similar cases.

The main details needed to build this module are the general format of the url of the site to be scraped, and the div in which the links and contents we desire to use are present. From this page, the links to the similar cases are filtered out for the user.

This method is generic, as any site can be chosen to perform the web scraping on. The only major changes needed in order to use a different site are changing the url and the name of the div from which we extract the url contents. Sites which primarily focus on legal content are given preference.
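A hedged sketch of this flow, assuming a site whose search URL takes the query after a ?q= parameter and lists results inside div elements with class "result" (the URL format and div name are placeholders):

import requests
from bs4 import BeautifulSoup

def similar_case_links(case_string):
    query = "+".join(case_string.split())                       # build the search query
    url = "https://example-legal-site.com/search?q=" + query    # placeholder URL format
    page = requests.get(url)
    soup = BeautifulSoup(page.text, "html.parser")
    links = []
    for div in soup.find_all("div", class_="result"):           # placeholder div name
        for anchor in div.find_all("a", href=True):
            links.append(anchor["href"])
    return links

print(similar_case_links("land boundary dispute with neighbour"))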

This will give the people a more thorough understanding of situations similar to theirs and will give a broader perspective of the problems they are facing. It will also help the lawyers, because they can quote these cases while arguing.

CHAPTER 6

CONCLUSION AND FUTURE ENHANCEMENT

6.1 CONCLUSION

The system aims at reducing the time taken to manually look up the laws related to a case. The system takes in input from the user and splits it into keywords. After that, all the relevant laws are fetched based on the keywords obtained from the statements. Based on the laws fetched or the input of the user, we find the right lawyer for the case. Since there is no rigid format for entering the input, the system can be used even by laymen. This makes the application very user friendly.

In the case of summarization, input can be given in the form of a text file. The system then summarizes the paragraph to the required length. Extractive summarization techniques are used so that the most relevant terms and sentences are combined to form a summary. Cosine similarity is used to generate the ranking matrix. The named entity recognition present alongside the summarization module helps to pick out important words like names and places within the document, so that users can confirm whether the document is relevant to the case that they are dealing with.

The document processing is done by figuring out the words in the picture
provided by the user and either summarizing it or picking out the named entities
as per the requirement. This will give a good idea about what the document is
about and whether it will be useful to the user.

The similar cases module uses web scraping to get links to cases that are similar to the user's from reputable sites. This will give the user a broader understanding of the scenario.

6.2 FUTURE ENHANCEMENTS

Further developments may be done in the UI to make the application even more user friendly. As of now, priority is given to whether the keywords in the passage match the keywords in the laws or not: if the words are present, then the law is displayed. The laws that are displayed can be ordered so that the most relevant ones are displayed first for the convenience of the user. The laws displayed can also be colour coded to represent whether the laws are for or against the case of the current user.

Right now, extractive summarization is used to produce the summary of the documents. Abstractive summarization techniques can also be used in these scenarios; abstractive summarization is a more complex concept and is being explored right now.

APPENDIX 1

SAMPLE CODING:

Python server:

from flask import Flask, request, make_response, jsonify
from flask_cors import CORS, cross_origin

import webscrapingprog
import relevantlaws
import documentreader
import docsummarizer
import recognizename
import mongohelper

app = Flask(__name__)
CORS(app, supports_credentials=True)   # allow the React client to call the API

@app.route('/relevant1')
def helloworld():
    # simple health-check route
    return "Hello world"

@app.route('/similar', methods=["GET"])
@cross_origin(supports_credentials=True)
def returnsSimilarCases():
    print("Reached method")
    caseString = request.args.get("cstring")
    strlist = caseString.split(" ")
    caseString = "+".join(strlist)   # turn the case description into a search query
    print(caseString)
    resultString = webscrapingprog.webscrapingprog(caseString)
    res = make_response(jsonify(resultString), 200)
    return res

@app.route('/relevant', methods=["GET"])
@cross_origin(supports_credentials=True)
def relevantLawList():
    problemString = request.args.get("pstring")
    print(problemString)
    resultString = relevantlaws.relevantlaws(problemString)
    print("Acquired result:")
    print(resultString)
    res = make_response(jsonify(resultString), 200)
    return res

@app.route('/documentsummary', methods=["POST"])
@cross_origin(supports_credentials=True)
def documentAnalysis():
    # fallback response in case reading or summarizing the image fails
    res = make_response(jsonify({"error": "could not process document"}), 500)
    try:
        imagefile = request.files.get('imagefile', '')
        pic_content = documentreader.documentreader(imagefile)
        pic_summary = docsummarizer.docsummarizer(pic_content)
        print(pic_summary)
        res = make_response(jsonify(pic_summary), 200)
    except Exception as err:
        print(err)
    return res

@app.route('/documentner', methods=["POST"])
@cross_origin(supports_credentials=True)
def documentEntities():
    res = make_response(jsonify({"error": "could not process document"}), 500)
    try:
        imagefile = request.files.get('imagefile', '')
        pic_content = documentreader.documentreader(imagefile)
        name_list = recognizename.recognizename(pic_content)
        res = make_response(jsonify(name_list), 200)
    except Exception as err:
        print(err)
    return res

@app.route('/dbgetall', methods=["GET"])
@cross_origin(supports_credentials=True)
def getAllFromDb():
    res = make_response(jsonify({"error": "database query failed"}), 500)
    try:
        results = mongohelper.getAll()
        res = make_response(jsonify(results), 200)
    except Exception as err:
        print(err)
    return res

@app.route('/dbparticular', methods=["GET"])
@cross_origin(supports_credentials=True)
def getParticular():
    field = request.args.get("field")
    print("Argument is " + field)
    fee = request.args.get("fee")
    print("Argument is " + fee)
    res = make_response(jsonify({"error": "database query failed"}), 500)
    try:
        results = mongohelper.getParticular(field, fee)
        res = make_response(jsonify(results), 200)
    except Exception as err:
        print(err)
    return res

if __name__ == '__main__':
    app.run()

Summarization:

from nltk.cluster.util import cosine_distance
from nltk.corpus import stopwords
import nltk
import numpy as np
import networkx as nx

nltk.download('stopwords')

def read_article(file_name):
    # "file_name" actually carries the text passage itself
    filedata = file_name
    print("FileData is:")
    print(filedata)
    article = filedata.split(". ")
    sentences = []
    for sentence in article:
        sentences.append(sentence.replace("[^a-zA-Z]", " ").split(" "))
    sentences.pop()   # drop the trailing fragment left over after the split
    print("Sentences are:")
    print(sentences)
    return sentences

def sentence_similarity(sent1, sent2, stopwords=None):
    if stopwords is None:
        stopwords = []
    sent1 = [w.lower() for w in sent1]
    sent2 = [w.lower() for w in sent2]
    all_words = list(set(sent1 + sent2))
    vector1 = [0] * len(all_words)
    vector2 = [0] * len(all_words)
    # build the count vector for the first sentence
    for w in sent1:
        if w in stopwords:
            continue
        vector1[all_words.index(w)] += 1
    # build the count vector for the second sentence
    for w in sent2:
        if w in stopwords:
            continue
        vector2[all_words.index(w)] += 1
    return 1 - cosine_distance(vector1, vector2)

def build_similarity_matrix(sentences, stop_words):
    similarity_matrix = np.zeros((len(sentences), len(sentences)))
    for idx1 in range(len(sentences)):
        for idx2 in range(len(sentences)):
            if idx1 == idx2:   # skip comparing a sentence with itself
                continue
            similarity_matrix[idx1][idx2] = sentence_similarity(
                sentences[idx1], sentences[idx2], stop_words)
    return similarity_matrix

def generate_summary(file_name, top_n=1):
    stop_words = stopwords.words('english')
    summarize_text = []
    sentences = read_article(file_name)
    sentence_similarity_matrix = build_similarity_matrix(sentences, stop_words)
    sentence_similarity_graph = nx.from_numpy_array(sentence_similarity_matrix)
    scores = nx.pagerank(sentence_similarity_graph)   # rank sentences by centrality
    ranked_sentence = sorted(((scores[i], s) for i, s in enumerate(sentences)), reverse=True)
    print("Indexes of top ranked_sentence order are ", ranked_sentence)
    if top_n > len(ranked_sentence):
        top_n = 1
    for i in range(top_n):
        summarize_text.append(" ".join(ranked_sentence[i][1]))
    result = ". ".join(summarize_text)
    print("Summarize Text: \n", result)
    return result

def docsummarizer(textpassage):
    result = generate_summary(textpassage, 2)
    print("Finished summary")
    return {"response": result}

React frontpage: Relevant Laws:

import React from 'react';
import 'antd/dist/antd.css';
import { Input, Button, Table } from 'antd';

const { Column } = Table;
const { TextArea } = Input;

class RelevantLaw extends React.Component {
  constructor(props) {
    super(props);
    this.state = {
      problemString: '',
      relevantLaws: [],
      relevantLawyers: []   // backs the recommended-lawyers table below
    };
    this.handleInputChange = this.handleInputChange.bind(this);
    this.onClickHandler = this.onClickHandler.bind(this);
  }

  handleInputChange(event) {
    event.preventDefault();
    this.setState({
      ...this.state,
      problemString: event.target.value
    });
  }

  async onClickHandler() {
    // ask the Flask backend for the laws relevant to the entered problem
    await fetch("http://127.0.0.1:5000/relevant?pstring=" + this.state.problemString, {
      method: "GET"
    }).then((response) => {
      return response.json();
    }).then((parsedJson) => {
      this.setState({
        ...this.state,
        relevantLaws: parsedJson
      });
    });
  }

  render() {
    return (
      <div>
        <div>
          <b>What is this about?</b>
          <p>Enter your case or scenario in your own words. No need to worry about
          using technical terms. This module will return all the laws that are
          found to be connected with your given case. Hint: Be as descriptive as
          possible if you need more laws or suggestions.</p>
        </div>
        <TextArea onChange={(event) => this.handleInputChange(event)} />
        <Button type="primary" onClick={this.onClickHandler}>Find relevant laws</Button>
        <p>Note: The higher the relevance percentage for a law, the more likely
        it is to be useful for the case.</p>
        <div>
          <Table dataSource={this.state.relevantLaws}>
            <Column title="Relevant Law" dataIndex="law" key="law" />
            <Column title="Relevance Percentage" dataIndex="rel" key="rel" />
          </Table>
        </div>
        <p>Recommended lawyers</p>
        <div>
          <Table dataSource={this.state.relevantLawyers}>
            <Column title="Name" dataIndex="name" key="name" />
            <Column title="Phone Number" dataIndex="phonenumber" key="phonenumber" />
            <Column title="Field" dataIndex="field" key="field" />
            <Column title="Average Fee" dataIndex="avgfee" key="avgfee" />
          </Table>
        </div>
      </div>
    );
  }
}

export default RelevantLaw;

APPENDIX 2

SCREENSHOTS

Homepage

Description

The starting page of the website. It lists all the features that are present so that the user has more details and knows what to go for.

Relevant Laws

Description:

This page is used to pick out the laws that are relevant to the user. The user enters their problem statement in the text box seen below. After processing, the relevant laws are returned in the table below. The relevance percentage indicates how well the law matches the scenario given by the user.

Input:

A small passage that describes the situation at hand.

Output:

A list of relevant laws along with the relevance percentage.

Document Processing

Description:

This can mainly be used to process images of documents, terms and conditions, etc. The user uploads the image of the document and then clicks one of the buttons depending upon which action is to be performed. They can click either the summary button or the named entity button.

Input:

Image file of the document.

Output:

If the summarize button is clicked, it outputs the summary of the document.

If the named entities button is clicked, it outputs the list of named entities.

Picture of summarizing:

Picture of named entity:

Find Lawyers

Description:

Helps the client connect to the right lawyer. Provides a list of lawyers and
this can be filtered according to the needs of the client.

REFERENCES

[1] Lawyer's Intellectual Tool for Analysis of Legal Documents in Russian, 2018 International Conference on Artificial Intelligence Applications and Innovations (IC-AIAI).
Authors: A. Khasianov, I. Alimova, A. Marchenko, G. Nurhambetova, E. Tutubalina, D. Zuev

[2] Multiple Data Document Summarization, 2017 Conference on Information and Communication Technology (CICT'17).
Authors: V V Krishna Kishore and Pramod Kumar Singh

[3] Artificial Intelligence for Automatic Text Summarization, 2018 IEEE International Conference on Information Reuse and Integration for Data Science.
Authors: Min-Yuh Day, Chao-Yu Chen

[4] A Generic Platform to Automate Legal Knowledge Work Process using Machine Learning, 2015 IEEE 14th International Conference on Machine Learning and Applications.
Authors: Annervaz K M, Jovin George, Shubhashis Sengupta

[5] Maintainable process model driven online legal expert systems, published 5 October 2018, Springer.
Authors: Johannes Dimyadi, Sam Bookman, David Harvey, Robert Amor

[6] Fuzzy Bag-of-Words Model for Document Representation, IEEE Transactions on Fuzzy Systems (Volume 26, Issue 2, April 2018).
Authors: Rui Zhao and Kezhi Mao

[7] Information Extraction: Evaluating Named Entity Recognition from Classical Malay Documents, 2016 Third International Conference on Information Retrieval and Knowledge Management.
Authors: Siti Syakirah Sazali, Nurazzah Abdul Rahman, Zainab Abu Bakar

[8] Python for Data Analytics, Scientific and Technical Applications, 2019 Amity International Conference on Artificial Intelligence (AICAI).
Authors: Abhinav Nagpal, Goldie Gabrani

[9] Named Entity Recognition from Unstructured Handwritten Document Images, 2016 12th IAPR Workshop on Document Analysis Systems.
Authors: Chandranath Adak, Bidyut B. Chaudhuri, Michael Blumenstein

[10] Automatic Text Summarization of News Articles, 2017 International Conference on Big Data, IoT and Data Science (BID).
Authors: Prakhar Sethi, Sameer Sonawane, Saumitra Khanwalker, R. B. Keskar

[11] Summarization of Scientific Paper through Reinforcement Ranking on Semantic Link Network, 2017.
Authors: Xiaoping Sun and Hai Zhuge

[12] Design of a Meta Search System for Legal Domain, 2017 International Conference on Advanced Computing and Communication Systems.
Authors: Ambedkar Kanapala, Sukomal Pal, Rajendra Pamula

