
“Question & Answering System Using Natural Language Processing”

A Dissertation Report Submitted to

Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal (M.P.)


Towards Partial Fulfillment for the Award of

Master of Technology
(Computer Science & Engineering)

Submitted By:
Nilam Chourasiya
(Enrollment No. – 0837CS20MT03)

Under the Supervision of:

Dipshikha Sharma
Supervisor SIMS, Indore

Department of Computer Science & Engineering


School of Engineering
Sanghvi Institute of Management & Science, Indore (M.P.)
Session: 2022-23
Sanghvi Institute of Management & Science, Indore (M.P.)

Department of Computer Science and Engineering

RECOMMENDATION

This dissertation work entitled “Question & Answering System Using
Natural Language Processing”, being submitted by “Nilam
Chourasiya” (Enrollment No.: 0837CS20MT03) for partial fulfillment
of the requirement for the award of “Master of Technology” with
specialization in “Computer Science & Engineering” at Sanghvi
Institute of Management & Science, Indore during the year 2022-2023,
is a satisfactory account of her project work under my supervision and is
recommended for award of the degree.

APPROVED & SUPERVISED BY:

Dipshikha Sharma
Supervisor
SIMS, Indore

FORWARDED BY:

Dipshikha Sharma                    Dr. Suresh Batni
Supervisor                          Principal
SIMS, Indore                        SIMS, Indore

Sanghvi Institute of Management & Science, Indore (M.P.)
Department of Computer Science and Engineering

CERTIFICATE

This is to certify that the work embodied in this dissertation entitled
“Question & Answering System Using Natural Language Processing”,
being submitted by “Nilam Chourasiya” (Enrollment No.:
0837CS20MT03) for partial fulfillment of the requirement for the award
of “Master of Technology” in “Computer Science & Engineering” to
Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal (M.P.), during the
academic year 2022-23, is a record of a bona fide piece of work carried out in
the “Department of Computer Science & Engineering”, SIMS, Indore
(M.P.).

Internal Examiner External Examiner

Date: Date:

Sanghvi Institute of Management & Science, Indore (M.P.)
Department of Computer Science and Engineering

DISSERTATION APPROVAL SHEET

This dissertation work entitled “Question & Answering System Using
Natural Language Processing”, being submitted by “Nilam Chourasiya”
(Enrollment No.: 0837CS20MT03) for partial fulfillment of the
requirement for the award of “Master of Technology” with specialization
in “Computer Science & Engineering” by Rajiv Gandhi
Proudyogiki Vishwavidyalaya, Bhopal (M.P.), is hereby approved.

APPROVED & SUPERVISED BY:

Dipshikha Sharma
Supervisor
SIMS, Indore

Sanghvi Institute of Management & Science, Indore
Department of Computer Science and Engineering

DECLARATION BY THE CANDIDATE


I declare that the work entitled “Question & Answering System Using Natural
Language Processing” is my own work conducted under the supervision of Dr. Suresh
Batni, Principal, SIMS, Indore (M.P.).

I further declare that to the best of my knowledge the present work does not contain
any part of the work which has been submitted for the award of any degree either in
this University or in other University/ Deemed University without proper citation.

Nilam Chourasiya

(0837CS20MT03)

UNDERTAKING
The work entitled “Question & Answering System Using Natural Language
Processing” is our own work. A check for plagiarism has been carried out on the
thesis/project/dissertation and it is found to be within the acceptable limit.

Nilam Chourasiya          Dipshikha Sharma          Dr. Suresh Batni
(0837CS20MT03)            Supervisor                Principal
                          SIMS, Indore              SIMS, Indore

ACKNOWLEDGEMENT

I am grateful to Dipshikha Sharma, Supervisor/Dissertation Coordinator, and the
Department of Computer Science & Engineering for their valuable guidance,
intelligent suggestions, fruitful discussion, and generous encouragement.
Without their help it would have been difficult to overcome the conceptual and
practical problems.

I am thankful to Dr. Suresh Batni, Principal, SOE, SIMS, Indore for providing
all the facilities and the academic environment for my work.

I am thankful to Dr. Suresh Batni, Head of Department of Computer Science &
Engineering, and all the faculty of Sanghvi Institute of Management & Science,
Indore for their institutional support.

Finally, I am very much thankful to my parents for their personal attention and
care.

Nilam Chourasiya

(0837CS20MT03)

Abstract

Question answering is one of the major research areas in Natural Language
Processing and Information Retrieval. The main challenge for a question answering system
is to give the exact answer to the question posed by the user. Question answering systems can
be classified into three categories: open domain, closed domain and restricted domain.
Using advanced Natural Language Processing tools, we develop a framework for a
question answering system. In this work we focus on a restricted-domain question
answering system. The proposed system works on keyword and question matching and returns
a precise answer to the question.

Keywords: Natural Language Processing, information retrieval, semantic similarity, restricted
domain, answer extraction, answer ranking.

TABLE OF CONTENTS
Page No.

Certificate of Recommendation i

Certificate of Approval ii

Declaration of Candidate iii

Acknowledgement iv

Abstract v
Table of Contents vi
List of Figures viii
List of Tables ix

CHAPTER 1: INTRODUCTION 1-6

1.1 Background 1

1.2 Motivation 2

1.3 Introduction of Question Answering System 2

1.4 Architecture of Question Answering System 3

1.5 Thesis Objective 6

1.6 Thesis Outline 6

1.7 Summary 7

CHAPTER 2: LITERATURE REVIEW 8-12

2.1 Literature Review 8


2.2 Major Contribution 10
2.3 Comparative Study of Prominent Work 11
2.4 Summary 12

CHAPTER 3: PROBLEM IDENTIFICATION 13-20

3.1 Problem Domain 13

3.2 Problem Identification 15

3.3 Existing System 19

3.4 Summary 20

CHAPTER 4: PROPOSED WORK 24-32

4.1 Proposed Work 24

4.2 Recommendation Algorithm 26

4.3 Implementation Details 27

4.4 Tools Used 29

4.5 Implementation & Reference Classes 30

4.6 GUI Design 31

4.7 Summary 32

CHAPTER 5: RESULT ANALYSIS 33-40

5.1 Result Analysis 33

5.2 Result Evaluation 35

5.3 Summary 40

CHAPTER 6: CONCLUSION & FUTURE WORK 41-41

6.1 Conclusion 41

6.2 Future Work 41

References 42
List of Publication 44
Plagiarism Report 48

LIST OF FIGURES

Figure No. Figure Name Page No.

1.1 Block Diagram of Question Answering System 1

1.2 Basic Process of Extraction Answer 3

1.3 Architecture of Question Answer System 5

3.1 Block Diagram of Existing Question Answering System 14

3.2 Work Flow Diagram of Existing Question Answering System 20

3.3 Existing Work Flow Diagram 21

4.1 Proposed System Architecture 25

4.2 Flow Chart of Proposed System 26

LIST OF TABLES

Table No. Table Name Page No.

2.1 Comparative Study 11

4.1 Identification System 24

4.2 Reference Classes 30


Question & Answering System Using Natural Language Processing

CHAPTER 1
INTRODUCTION
1.1 Background
In computer science, a Question Answering (QA) system helps users find answers to their
questions more easily. The information age is new for humanity. With the help of search engines,
any information is readily available; we are only a click away from reaching a page in a remote
corner of the world. We have always wanted computers to act intelligently [1], and the field of
Artificial Intelligence emerged to accomplish this task. One of the key obstacles in making
computers intelligent is the understanding of natural language. Natural Language Processing
(NLP), which deals with the understanding of languages, is a subdivision of Artificial
Intelligence. The block diagram of a Question Answering System is shown in Figure 1.1 below.
Question Answering is a classic NLP application. The task of a question answering system is,
given a question and a collection of documents, to find the exact answer to the question. It
has two complementary goals: first, to understand the various issues in natural language
understanding and representation, and second, to design natural language interfaces to
computers [2].

Figure 1.1 Block Diagram of Question Answering System

Department of Computer Science Page 1



1.2 Motivation
We are always in quest of information. However, there is a distinction between information and
knowledge. Information retrieval, or web search, is mature, and we can get relevant information
readily. Question Answering is a specialized form of information access which seeks knowledge.
We are interested not only in getting the relevant pages but also in finding specific answers to
questions.

Question Answering is in itself an intersection of Natural Language Processing, Information
Retrieval, Machine Learning, Knowledge Representation, Reasoning and Inference, and Semantic
Search. It gives a good platform to delve into "nearly" all of AI. If the statement is made that
"Question Answering is the ultimate AI", it would be almost unanimously accepted. A question
answering system is an art in its being; at the same time it has science in its essence.
Question Answering systems are needed everywhere, be it medical science, learning systems for
students, or personal assistants. They are needed in every aspect where we require some help
from computers. It goes without saying that it is worth exploring the exciting field of question
answering [3].

1.3 Introduction of Question Answering System


A typical Question Answering system is designed to answer simple questions like “who”,
“what”, “when”, “where”, etc. However, recent QA research focuses on extending the
system to answer complex questions, overview questions, judgment questions and so on. The
paper proposes a Question Answering system that answers simple factoid wh-questions by using
a technique called Semantic Role Labeling.

A standard Question Answering system can be divided into 3 modules, namely:

• Question Processing module
• Document Processing or Information Retrieval module
• Answer Processing module

Each module contains several sub-modules, and these modules use several Natural Language
Processing techniques in order to extract the proper answer [5].

Although the set of documents retrieved by a search engine contains a lot of information about
the search topic, it may or may not contain exactly the information which the user is looking
for [6]. The basic idea behind a question answering system is that the user just has to enter
the question, and the system will retrieve the most appropriate and precise answer and return it
to the user. Hence, in cases where the user is looking for a short and precise answer, a
question answering system is more useful than a search engine, which usually provides a large
set of links to web pages that might contain the answer. Figure 1.2 below shows the basic
process of answer extraction.

Figure 1.2 Basic process of extraction answer

1.4 Architecture of Question Answering System


In this section we describe the architecture of our system. The overall architecture of the system
can be subdivided into three main modules:

• Pre-processing
• Question template matching
• Answering


Each module is described in detail in the following subsections.
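
The three modules listed above can be sketched as a small pipeline. This is only a minimal illustration under stated assumptions, not the implementation described in this thesis: the stop-word list, the question templates and the answer table (`STOP_WORDS`, `TEMPLATES`, `ANSWERS`) are hypothetical placeholders chosen for the example.

```python
import re

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"is", "the", "of", "a", "an", "in", "me", "tell"}

# Hypothetical question templates: (regex over the cleaned question, template name).
TEMPLATES = [
    (re.compile(r"^what capital (?P<place>\w+)$"), "capital_of"),
    (re.compile(r"^who president (?P<place>\w+)$"), "president_of"),
]

# Hypothetical answer store keyed by (template name, slot value).
ANSWERS = {("capital_of", "india"): "New Delhi"}


def preprocess(question):
    """Pre-processing: lower-case, strip punctuation, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return " ".join(t for t in tokens if t not in STOP_WORDS)


def match_template(cleaned):
    """Template matching: return (template name, slot value) or None."""
    for pattern, name in TEMPLATES:
        m = pattern.match(cleaned)
        if m:
            return name, m.group("place")
    return None


def answer(question):
    """Answering: look up the matched template; None if nothing matches."""
    key = match_template(preprocess(question))
    return ANSWERS.get(key) if key else None
```

For example, `answer("What is the capital of India?")` is cleaned to `what capital india`, matches the first template, and returns the stored answer; a question that matches no template returns `None`.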

Question Answering Systems can be classified on the basis of the domains over which they have
been constructed:

• Open Domain Question Answering
• Closed Domain Question Answering
• Restricted Domain Question Answering

Open domain question answering systems are domain independent. They rely on general ontology
and world knowledge. Usually these systems have a large collection of data from which the
required answer is to be found. Since the information content in open domain question answering
does not belong to a particular domain, such a system can answer questions from various fields;
however, deep reasoning is not possible here [3].


Figure 1.3 Architecture of Question Answer System

Closed domain question answering systems deal with questions in a specific domain [3]. LUNAR
and BASEBALL are examples of closed domain QA systems. In this case the data set contains a
very limited amount of focused and structured information. Hence deep reasoning is possible in
closed domain question answering systems, but the problem with these systems is that, owing to
the very small size of the data set, they are no more than “toy systems” [4].


Research in restricted-domain question answering (RDQA) addresses problems related to the
incorporation of domain-specific information into current state-of-the-art QA technology, with the
hope of achieving deep reasoning capabilities and reliable accuracy in real-world
applications. In fact, as a not too-long-term vision,

1.5 Thesis Objectives

• To implement a Question Answering System.
• To enhance the accuracy of answers.
• To provide an integrated algorithm for better performance.
• To provide multiple answers for descriptive and definition type questions.
• To find the question template for a given question.

1.6 Thesis Outline

The thesis is organized in the following way:

Chapter-1 Introduction: This chapter deals with all the introductory requirements for
understanding the domain area. It gives the details which are necessary to understand the work
and measure its outcomes. It provides the motivation, background, problem understanding and
a view of the proposed solution. This is the first and an essential part of the report, which
contains brief details about the Question Answering System.

Chapter-2 Literature Review: It presents a survey of the technologies available in the domain.
A wide variety of existing mechanisms, algorithms and architectures is studied to identify
which issues in the Question Answering area have been resolved and which remain.

Chapter-3 Problem Identification: In this chapter we identify the problems in the existing
system. It also gives a brief categorization of the various approaches suggested over the
last few years for Question Answering Systems using data mining approaches.

Chapter-4 Proposed Work: After studying the different existing mechanisms, this chapter
identifies the system preliminaries. It gives a clear understanding of the algorithm and its
steps, which helps the solution resolve the current problems. This chapter also gives the
implementation plan and testing strategy for the above problems by suggesting an
architectural solution. The implementation of our proposed system is described in this chapter,
including the platform on which the implementation works and the theme and approach
followed.

Chapter-5 Result Analysis: Developing a solution demonstrates an approach, but proving its
results is a complicated task, because each and every step of the solution must be measured and
compared with the existing mechanisms. Whether or not the proposed system which we have
implemented works properly is discussed in this chapter. The results are verified on the basis
of the analysis.

Chapter-6 Conclusion and Future Work: This chapter gives concluding remarks on the
dissertation, with a final analysis and comparisons along with some future directions of the
work. The future scope and a short summary are discussed. It gives an idea of how the work
performed in this report can be extended in future.

1.7 Summary
This chapter deals with all the introductory requirements for understanding the domain
area. It gives the details which are necessary to understand the work and measure its outcomes.
It provides the motivation, background, problem understanding and a view of the proposed
solution.


CHAPTER 2
LITERATURE SURVEY
2.1 Literature Review
1. Jinzhong Xu et al. [1] worked on “Research of Automatic Question Answering System in
Network Teaching”. The system lets students and teachers exchange and answer questions
in natural language.

The paper puts forward a model of an automatic question answering system based on
Natural Language Processing, introduces the theory of semantic representation and
ontology, and studies the key technologies of question answering based on Natural Language
Processing. It does not give accurate answers for complex questions.

2. Shouning Qu et al. [2] worked on “Research and Design of Intelligent Question Answering System”.
The paper proposes a model which supports natural language and finds answers from the intelligent
question answering system.
The system proposes an improved text classification algorithm to classify the questions in the
database accurately. The classification algorithm with the improved TF-IDF method gives more
accurate answers than the traditional method. After the user inputs text, the system analyses it using
natural language processing, locates the relevant category, matches the answer and returns it
efficiently.

3. Wael Salloum et al. [3] worked on “A Question Answering System based on Conceptual Graph
Formalism”. The paper proposes a new text-based question answering system which converts the
knowledge in documents, and the question, into Conceptual Graph (CG) formalism. For every question
type there is a different conceptual graph formalism, thus for each question many CGs are generated.
A projection operator is used to compare a question's CG to a sentence's CG, and the exact
answer is then extracted from it. More research can be done on extracting answers from simple
sentences and combining sentences of similar meaning.
4. Tilani Gunawardena et al. [4] worked on “An Automated Answering System with Template
Matching for Natural Language Questions”. The system uses closed domain question answering
to find the answers; the answers are therefore stored in a database by domain experts.
The system is able to answer questions asked in SMS language or in English. A template matching
technique is applied for the answer extraction process. The paper does not guarantee that the system
gives an accurate answer, since the system has to deal with i) lack of understanding of the problem
domain, ii) handling SMS abbreviations, and iii) handling spelling mistakes.
5. Erfan Najmi et al. [5] worked on “Intelligent Semantic Question Answering System”. The paper
introduces an approach to question answering using semantic technologies.
It converts the query into Resource Description Framework (RDF) triples and then
searches for the answer in the RDF files. The advantage of this system is its low
computation time. Its disadvantage is that it does not provide answers to
descriptive or long-answer type questions.
6. Payal Biswas et al. [6] worked on “A Framework for Restricted Domain Question Answering
System”. The paper proposes a framework for a restricted domain question answering system.
The model makes use of information extraction rather than the information retrieval process used
by search engines. The framework can be used to develop a system which provides exact and precise
answers, and the model provides a proper flow of data for answer extraction. The major issue
with the proposed model is that its performance depends on the search engine and the NLP tools used.
7. Varsha Bhoir et al. [7] proposed “Question Answering System: A Heuristic Approach”. The
proposed model works for the specific domain of tourism, so it is a restricted domain model.
The main aim of this restricted domain question answering system is to improve the accuracy of the
extracted answers. The system returns precise answers related to the tourism domain. It
uses an integrated answer retrieval technique which combines a web crawler and a keyword-oriented
procedure.
8. Sangdo Han et al. [8] worked on “Keyword Question Answering System with Report Generation
for Linked Data”. The paper introduces a question answering system that extracts answers from
Linked Data and generates a report in natural language.
The system uses entity disambiguation and distributed word similarity to match each keyword to a
property in Linked Data. To extract keyword-related information, the model uses SPARQL queries.
The system returned the correct answer for 95% of the questions.
9. Sreelakshmi V et al. [9] worked on “Open Domain Question Answering System Using Semantic
Role Labeling”. In this paper the system finds an answer using online search and Semantic Role
Labeling.
The goal of Semantic Role Labeling is to identify all the constituents that fill a
semantic role, i.e. to determine roles such as Agent, Patient, Location, etc. in a sentence. The
results of this system were compared with those of a system using pattern matching to find the
answers. The system can be extended in future to handle complex questions.

10. Unmesh Sasikumar et al. [10] presented “A Survey of Natural Language Question Answering System”.
This survey paper describes the different methods of natural language question answering.
The types of question answering system described in the paper are: i) web based
question answering system, ii) information extraction based question answering system, iii) restricted
domain question answering system, iv) rule based question answering system, v) classification of
questioner's level, and vi) question answering system based on information retrieval.
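
Several of the surveyed systems match a user question against stored questions or categories using term weights such as TF-IDF (item 2). The sketch below shows the standard TF-IDF weighting with cosine similarity in plain Python. It is an illustration only: it is not the improved TF-IDF variant of [2], and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `best_match`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Split on whitespace, strip simple punctuation, lower-case."""
    return [w.strip("?.,!").lower() for w in text.split() if w.strip("?.,!")]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for doc in tokenized for term in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_match(query, stored_questions):
    """Index of the stored question most similar to the query."""
    vectors = tfidf_vectors(stored_questions + [query])
    q = vectors[-1]
    scores = [cosine(q, v) for v in vectors[:-1]]
    return max(range(len(scores)), key=scores.__getitem__)
```

Terms that occur in every document receive weight zero, so very common words contribute nothing to the match, which is the property these systems rely on.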

2.2 Major Contributions

Although the set of documents retrieved by a search engine contains a lot of information about
the search topic, it may or may not contain exactly the information which the user is looking
for. The basic idea behind a question answering system is that the user just has to enter the
question, and the system will retrieve the most appropriate and precise answer and return it to
the user. Hence, in cases where the user is looking for a short and precise answer, a question
answering system is more useful than a search engine, which usually provides a large set of
links to web pages that might contain the answer.


2.3 Comparative Study of Prominent Works

Table 2.1 Comparative Study

S.No. | Author Name | Title | Strength | Weakness
1. | Jinzhong Xu et al. | Research of Automatic Question Answering System in Network Teaching | | New Hybrid Model may overcome encryption and decryption time.
2. | Viney Pal Bansal et al. | A Hybrid Data Encryption Technique using RSA and Blowfish for Cloud Computing on FPGAs | Blowfish is unpatented, so this cryptosystem is also cost efficient | For future work, key size should be increased to make the algorithm more secure
3. | Gurjeevan Singh et al. | A Study of New Trends in Blowfish Algorithm | This paper briefly describes a new method to enhance the security of Blowfish algorithm; this can be possible by replacing the pre-defined XOR operation by new operation ‘#’ | In the new proposed model of Blowfish by further increasing the key length, Blowfish will provide the better results.
4. | B. Thimma Reddy et al. | Cloud Security using Blowfish and Key Management Encryption Algorithm | In this paper, we have now proposed answers for three most trendy safety threats in cloud storage. | Inside the Encryption/Decryption provider approach there is not any stored consumer data,
5. | Parminder Singh et al. | A New Advance Efficient RBAC to Enhance the Security in Cloud Computing | Role Based Access Control is an architecture which provides the authority to restrict the user if he is not allowed to go on with the content | Future Scope or Role Based Access Control: Although our proposed work proposed a healthy mechanism for the security of data but still there is point of number of transaction by one id of one specific role which could be the loop hole of this Architecture.

2.4 Summary
The analysis and the experimental results of the surveyed systems show that they produce more
accurate data than the classical process, and therefore give better results than the older
methods. In this chapter a wide variety of existing mechanisms, algorithms and architectures
was studied to identify which issues in the Question Answering area have been resolved and
which remain. The chapter also gave a brief categorization of the various approaches to
Question Answering suggested over the last few years.


CHAPTER 3
PROBLEM IDENTIFICATION
3.1 Problem Domain
In the last two decades dozens of question answering systems have been developed using
new concepts and techniques. Hyo-Jung et al. [16] presented Chinese question classification
based on mining association rules, which extracts words and bi-grams from questions as
features.

Kepei Zhang and Jieyu presented a Chinese question-answering system with question
classification, which uses words, named entities, part of speech (POS) and semantics as
features to classify the question.

Santosh Kumar Ray discussed some of the existing approaches to question classification and
proposed a new method based on the use of WordNet. Svetlana Stoyanchev [6] presented a
document retrieval experiment on a question answering system and evaluated the use of named
entities and of noun, verb, and prepositional phrases as exact-match phrases in a document
retrieval query, while Kangavari, Samira Ghandchi, and Manak presented the simplest approach to
improving the accuracy of a question answering system. Chiyoung Seo, Sang-Won Lee, and
Hyoung-Joo Kim [20] showed in their performance study that an RDBMS implementation using the
inverted index technique almost always outperforms the IR implementations.

Paloma Moreda and Hector Llorens offered two proposals for using semantic information in QA
systems, specifically in the answer extraction step. Their aim is to determine the improvement in
performance of current QA systems, especially when dealing with common noun questions.

Liang Yunjuan and Ma Lijuan discussed the design of a dynamic knowledge-based full-text retrieval
system and inverted index technology research and analysis, and gave some of the indexing code, in order to
improve the retrieval accuracy and to achieve a reasonable.

Many architectures have also been proposed for developing Question Answering Systems.


Kolomiyets and Moens have proposed a model based on translating the question statement and
the document into a computer-readable format, which is somewhat sophisticated and expensive.

Mohammad Reza Kangavari et al. have also proposed an architecture using dynamic patterns and
semantic relations among words, verbs and keywords. Both these architectures may perform well,
but they are very complex, containing a large number of modules, which makes them difficult to
implement.

The usual Question Answering system is designed to answer simple wh-questions like “who”,
“what”, “when”, “where”, etc. Nevertheless, recent QA research aims at extending the
system to answer complex questions, synopsis questions, view questions, etc. The paper proposes a
Question Answering system that answers simple factoid questions by using a technique called
Semantic Role Labeling.

Figure 3.1. Block Diagram of Existing Question Answering System


3.2 Problem Identification


As early as 2002 a group of researchers wrote a roadmap of research in the field of question
answering. They also identified the issues related to question answering. The following
discussion is based on the issues they identified.

1. Question classes

2. Question processing

3. Context and QA

4. Data sources for QA

5. Answer extraction

6. Answer formulation

7. Real time question answering

8. Multilingual (or cross-lingual) question answering

9. Interactive QA

10. Advanced reasoning for QA

11. Information clustering for QA

12. User profiling for QA

3.2.1 Question Classes

A question may belong to different types, and depending on its category we require different
strategies to answer it. We might have a line of attack for the category of factoid
questions which, on the other hand, will not work for questions that need a much deeper
understanding of facts. We need a thorough understanding of which category a question belongs to.

Example 1.3: Recall questions or factoid questions seek a fact:

Question: Who is also known as “chacha”, and was born on 14th November?

However questions like:

Question: Why is sky blue in color?

require not only an understanding of facts but also knowledge of the scattering of light.

As we will discuss later, questions can also be classified by their form and structure. We
discuss such categories in the chapter dedicated to Bloom’s taxonomy.
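
A very coarse question classifier can be sketched by looking at the question word. This is a minimal illustration, not the classification scheme used in this thesis: the class labels and the `QUESTION_CLASSES` mapping are hypothetical, and a real classifier would need more than the question word (for instance, "how many" asks for a fact, not for reasoning).

```python
# Hypothetical mapping from a question word to a coarse question class.
QUESTION_CLASSES = {
    "who": "factoid:person",
    "when": "factoid:time",
    "where": "factoid:place",
    "what": "factoid:entity",
    "why": "reasoning",
    "how": "reasoning",
}


def classify_question(question):
    """Return the class of the first recognised question word, else 'unknown'."""
    for word in question.lower().split():
        word = word.strip("?,.\"'")
        if word in QUESTION_CLASSES:
            return QUESTION_CLASSES[word]
    return "unknown"
```

With this mapping, the factoid example above is routed to a fact lookup strategy, while "Why is sky blue in color?" is flagged as needing reasoning.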

3.2.2 Question Processing

The same question may be asked in different forms. We may ask it in an interrogative way or in an
assertive way. We need to understand the semantics of the question. We need to recognize what
the question is asking for before proceeding to answer the question itself. The process of
understanding the question is termed Question Processing.

Example 1.4: We are seeking same information using different forms of the question:

Interrogative:

Question: What is the capital city of India?

Assertive:

Question: Tell me the name of city which is capital of India?
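
One simple way to see that the two forms above seek the same information is to compare their content keywords. The sketch below uses Jaccard overlap of keyword sets. It is an illustration under stated assumptions: the stop-word list `STOP_WORDS` is a hypothetical hand-picked set, and keyword overlap is only a rough proxy for semantic equivalence.

```python
import re

# Hypothetical stop-word list covering the form words of the two examples.
STOP_WORDS = {"what", "which", "is", "the", "of", "a", "tell", "me"}


def keywords(question):
    """Set of lower-cased alphabetic tokens that are not stop words."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return {t for t in tokens if t not in STOP_WORDS}


def jaccard(q1, q2):
    """Jaccard overlap |A & B| / |A | B| of the two keyword sets."""
    a, b = keywords(q1), keywords(q2)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

For the interrogative and assertive forms above, the keyword sets are {capital, city, india} and {name, city, capital, india}, giving an overlap of 0.75, whereas an unrelated question shares no keywords with either.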

3.2.3 Context and QA

Questions are always asked in a context; they are rarely asked in a universal context. We
need knowledge of the context before proceeding to answer a question.

Example 1.5: Question: Where is Taj?

For a person in America: He is interested in finding the location of the Taj Mahal in Agra.

For a person who just reached Mumbai: He is interested in finding the Taj Hotel.
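
The example above can be sketched as a context-dependent lookup. This is a toy illustration: the `SENSES` table and the idea of using the user's location as the only context signal are assumptions made for the example, not part of the system described here.

```python
# Hypothetical sense inventory: ambiguous term -> {user location: intended entity}.
SENSES = {
    "taj": {"mumbai": "Taj Hotel, Mumbai", "default": "Taj Mahal, Agra"},
}


def resolve(term, user_location=None):
    """Pick the sense of an ambiguous term from the user's location."""
    senses = SENSES.get(term.lower())
    if senses is None:
        return term  # not ambiguous in this toy table
    return senses.get((user_location or "").lower(), senses["default"])
```

A user located in Mumbai gets the hotel, and any other user gets the default sense.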

3.2.4 Data Sources for QA

Before we can answer questions we need sources which are relevant and exhaustive. We
require a data source which will serve as the base for all the information required for
answering the questions. It may be a collection of documents. It can be the whole web, which we
can search. It can be a database from which we can get the answers to structured queries.

3.2.5 Answer Extraction

Depending on the question we may want to extract a specific type of information from the data
sources. Depending on the complexity of the question we would like to know the expectation of
the user.

Example 1.6: Extraction of a name:

Question: Who is the eldest brother among the Pandavas?

Extraction of a time or date:

Question: When was the First Battle of Panipat fought?

Extraction of a place:

Question: Where was Gandhi born?

3.2.6 Answer Formulation

Simple extraction may be enough for certain questions. For others, we may want partial answers
to be extracted from various sources and combined. At the same time, we want the results of the
QA system to be as natural as possible. Generating such answers is termed answer formulation.

3.2.7 Real time question answering

We must answer questions, even complex ones, within a few seconds. People would not like to
wait for hours in front of a computer to get answers to their questions. Watson, for instance,
was able to reply in an average of 3 seconds when it played Jeopardy. We need to develop the
architecture such that the end product is a real-time system.

3.2.8 Cross lingual or Multilingual QA

Cross-lingual or multilingual QA is concerned with seeking answers in sources written in a
language other than the one in which the question was posed. There are many information
resources for an English Question Answering system to search, but other languages such as
Hindi lack such resources. Therefore we translate a given query into a resource-rich language,
obtain the answer, and translate it back into the original language.

3.2.9 Interactive QA

We do not want a tedious question answering system that merely answers the questions we pose
to it. The QA system should be interactive and clarify doubts when it finds a question ambiguous.

Example 1.8:

Person: Please tell me some place where I can eat?

Computer: Veg or non-veg?

Person: Veg will be fine.

3.2.10 Advanced reasoning For QA

We want the QA system to do more than reproduce what is in the text collection. We want it to
learn facts and apply reasoning to generate new facts that help in answering the posed
question.

Example 1.9: These are statements in the corpus:

Statement 1: Ravi is Mohan’s brother.

Statement 2: Brother of father is uncle.

Statement 3: Ravi is father of Saini.

Question: Who is the uncle of Saini?

This question can be answered by reasoning: Ravi is Saini's father and Mohan is Ravi's brother, so Mohan is Saini's uncle.
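The reasoning step in Example 1.9 can be illustrated with a minimal Java sketch. The class and method names below (UncleReasoner, addBrothers, addFather, unclesOf) are hypothetical and are not part of the implemented system; the sketch also assumes that the "brother" relation is symmetric, which the example does not state explicitly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a tiny fact base with one hand-written inference rule.
class UncleReasoner {
    // Brother facts, stored in both directions (assumption: "brother" is symmetric).
    private final Map<String, List<String>> brothers = new HashMap<>();
    // Maps a child to his or her father.
    private final Map<String, String> fatherOf = new HashMap<>();

    void addBrothers(String a, String b) {
        brothers.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        brothers.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    void addFather(String father, String child) {
        fatherOf.put(child, father);
    }

    // Rule from Statement 2: a brother of the father is an uncle.
    List<String> unclesOf(String person) {
        List<String> result = new ArrayList<>();
        String father = fatherOf.get(person);
        if (father != null && brothers.containsKey(father)) {
            result.addAll(brothers.get(father));
        }
        return result;
    }

    public static void main(String[] args) {
        UncleReasoner kb = new UncleReasoner();
        kb.addBrothers("Ravi", "Mohan");   // Statement 1
        kb.addFather("Ravi", "Saini");     // Statement 3
        System.out.println(kb.unclesOf("Saini")); // prints [Mohan]
    }
}
```

The answer "Mohan" never appears next to "uncle" in the corpus; it is derived by combining two stored facts with one rule, which is the kind of new fact the section refers to.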

3.2.11 Information Clustering for QA


We need to organize data by category so that the search for an appropriate answer becomes
efficient. Once we know the category of the question, we search only the document
categories that are relevant to it.

3.2.12 User Profiling for QA

We would like to know the intention of the user by analyzing his or her previous queries. To
accomplish this, we need to build a profile of the user, so that the answers returned can be
based on the user's tastes.

Example 1.10:

Question: What are the things to do in Spain?

For a football fanatic:

Answer: “Real Madrid and Barcelona Match”.

For an adventurist:

Answer: “Bull Run”.

3.3 Existing System


The existing Question Answering solution works for the specific domain of tourism, which is a
global and routine leisure activity. Users have to struggle to navigate through overloaded
tourism sites to find a short piece of information of interest. The crawler developed in the
system gathers web page information, which is processed using Natural Language Processing and
procedural programming for a specific keyword.


Figure 3.2. Work Flow Diagram of Existing Question Answering System

The question expansion emphasizes the comparison of multiple tokens for a particular
keyword. The results show that the expansions are feasible and efficient for finding more
similar documents and sentences, as well as for retrieving answers to user questions from web
resources.

A straightforward Question Answering system was implemented using a technique called
Semantic Role Labeling. The system consists of three phases: Query Processing,
Document Processing, and Answer Processing. In each of these phases, several language-processing
components are used. The system is web based and therefore uses a search engine to extract
information from the web.


Figure 3.3. Existing System Work Flow Diagram


3.4 Summary:
In this chapter we identified the problems in the existing system. This in turn gives a brief
categorization of the various data mining approaches to Question Answering that have been
suggested over the last few years.


CHAPTER 4
PROPOSED WORK

4.1 Proposed Work


The architecture of the proposed system is explained below. It consists of the following
steps.

1. Query Section:- The user enters a question in this section.


2. Preprocessing:- In this step the question entered by the user undergoes three operations:
i. Tokenization- The question is split into tokens (single words).
ii. Stop word removal- All stop words, such as is, am, are, etc., are removed in this
process.
iii. Stemming- Each word is reduced to its root by stripping its prefix and suffix.
3. Token Identification:- The next step after preprocessing is token identification. This is
an important step, in which the tokens present in the question are identified for an
efficient answer extraction process.
4. Question Analysis:- In this phase the question is classified into one of three categories:
i. Definition Type:- A definition type of question requires one or two sentences as an
answer.
ii. Descriptive Type:- A descriptive type of question requires a few sentences or a
paragraph as an answer.
iii. Factoid Type:- A factoid type of question requires a one- or two-word answer.
For example, Why, How and Explain questions are asked for descriptive answers. Who,
When, Where, What and Which questions are generally asked for factoid answers; here Who
signifies the name of a person, When signifies a time or date, and Where signifies a
place or location.
5. Head Word Selection:- After question analysis and token generation, the next phase
is head word selection. Here one of the tokens generated in the third phase (Token
Identification) is chosen as the head word. This head word is used in the next
phase, clustering.
6. Clustering:- Here a clustering technique is applied to the Wikipedia data set for the answer
extraction process. Clustering is the task of grouping a set of objects in such a way that
objects in the same group are more similar to each other than to those in other groups. The
clustering technique used in this work is K-means, a method of vector quantization. It aims
to partition n observations into k clusters in which each observation belongs to the
cluster with the nearest mean.
The head word selected in the previous phase is used in this phase to form clusters. To
apply K-means, the data set is represented using TF-IDF. Three clusters are formed using the
K-means algorithm, and the answers are retrieved from them.
7. Template Matching:- Templates are predefined formats of the answer that will be
presented to the user. Templates are formed from the head word selected in the 5th
phase and the question format decided in the Question Analysis phase.
The answer extraction process matches the following templates against its database to give
an exact answer:
i. “Head Word” is . . . . . . .
ii. “Head Word” means . . . . . . .
iii. “Head Word” is known as . . . . . .
iv. “Head Word” is called . . . . . . .
v. “Head Word” can be defined as . . . . . . .
8. Answer Extraction:- In this phase the templates generated in the previous phase are
matched against the clusters formed in the clustering phase.
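The clustering step (step 6) can be sketched as follows. This is a minimal illustration of K-means, not the K-Mean class of the implemented system: the class name KMeansSketch, the choice of the first k points as initial centroids, and the toy 2-D points are all assumptions made for the example. In the described system the vectors would be TF-IDF vectors of documents and k would be three.

```java
import java.util.Arrays;

// Hypothetical sketch of K-means on small numeric vectors.
class KMeansSketch {
    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return sum;
    }

    // Returns the cluster index assigned to each point.
    // Assumption: the first k points are used as initial centroids (deterministic).
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each point goes to the nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c])
                            < squaredDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (labels[p] != best) {
                    labels[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the centroids are too
            }
            // Update step: each centroid becomes the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[labels[p]]++;
                for (int d = 0; d < dim; d++) {
                    sums[labels[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {
            {0, 0}, {10, 10}, {0, 1}, {1, 0}, {10, 11}, {11, 10}
        };
        System.out.println(Arrays.toString(cluster(points, 2, 10)));
        // prints [0, 1, 0, 0, 1, 1]
    }
}
```

Each observation ends up in the cluster with the nearest mean, which is the property the step description relies on.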

4.1.1 Proposed Architecture

The proposed system is shown in Figure 4.1. It performs the following operations:

1. Question Processing: In this module the given question is processed to extract important
information from it. The steps of the Question Processing module are:
a. Find the type of the given question using the Wh word.
b. Find the expected type of answer.
c. Get the keywords from the question.
d. Find the focus of the question.

The first step in the QA system is the Question Processing or Question Classification module.
The information obtained through this module comprises the type of question, the expected
answer type, the focus or head word of the question, and the question keywords.

Table 4.1 Identification of Question

Question Type       WH words
Factoid Type        Who, When, What, Where, Which
Definition Type     How, What
Descriptive Type    What, Why

2. Document Processing: Once the question has been processed, we move to the
Document Processing module. In this module the documents relevant to the given
question are retrieved and processed. The following steps are used in document processing:
a. Take the question in hand and search for relevant documents using a reliable search
engine.
b. Take the top relevant documents.
c. Extract the content from these documents.
d. Save the content to a file.
3. Answer Processing: This module presents algorithms for extracting the potential answer
for all three categories of questions, that is, definition, descriptive and factoid
questions.
4. Dataset Clustering: The data set is clustered using the fuzzy c-means algorithm before
question and answer processing.

The architecture of the proposed system is shown below. The question asked by the user
goes through different stages, from preprocessing to question identification and then
clustering of the data sets; finally, template matching is performed to get the required
answer.
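The question identification stage can be sketched in Java as shown here. The sketch follows the rule of thumb given in the Question Analysis step (Why, How and Explain for descriptive answers; Who, When, Where and Which for factoid answers). The class name QuestionTypeSketch, the treatment of "What is/are ..." as a definition question, and the returned label strings are assumptions made for illustration, not the behaviour of the implemented Question Analysis class.

```java
import java.util.Locale;

// Hypothetical sketch: classify a question by its leading Wh word.
class QuestionTypeSketch {
    enum Type { FACTOID, DEFINITION, DESCRIPTIVE, UNKNOWN }

    static Type classify(String question) {
        String[] words = question.trim().toLowerCase(Locale.ROOT).split("\\s+");
        String first = words[0];
        String second = words.length > 1 ? words[1] : "";
        switch (first) {
            case "why":
            case "how":
            case "explain":
                return Type.DESCRIPTIVE;
            case "what":
                // Assumption: "What is/are X?" asks for a definition of X.
                return (second.equals("is") || second.equals("are"))
                        ? Type.DEFINITION : Type.FACTOID;
            case "who":
            case "when":
            case "where":
            case "which":
                return Type.FACTOID;
            default:
                return Type.UNKNOWN;
        }
    }

    // Expected answer type for the Wh words named in the Question Analysis step.
    static String expectedAnswer(String question) {
        String first = question.trim().toLowerCase(Locale.ROOT).split("\\s+")[0];
        switch (first) {
            case "who":   return "Person";
            case "when":  return "Time/Date";
            case "where": return "Place/Location";
            default:      return "Unspecified";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("Who is eldest brother among Pandavas?")); // FACTOID
        System.out.println(classify("What is stemming?"));                     // DEFINITION
        System.out.println(classify("Why is the sky blue?"));                  // DESCRIPTIVE
        System.out.println(expectedAnswer("Where was Gandhi born?"));          // Place/Location
    }
}
```

A rule keyed only on the first one or two words is crude: under the stated assumption, "What is the capital city of India?" would be labelled a definition question although it expects a factoid answer, so a real classifier would need more context than the Wh word.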


Figure 4.1: Proposed System Architecture

The flow chart of the proposed system is shown below. It describes the way in which the
user gets an answer to a query. If the question entered by the user is not valid, the system
returns an error. If the question is valid, the system performs the steps shown to get an exact answer.


Figure 4.2: Flow Chart of Proposed System

4.2 Recommendation Algorithm

Algorithm Question_Answering (question) {
    stopword[]                                   // string array of stop words
    delimiter = " "
    tokens[] = split_string(question, delimiter)
    // stop word removal
    for i = 1 to tokens.length {
        for j = 1 to stopword.length
            if (tokens[i] == stopword[j])
                remove tokens[i]
    }
    // stemming
    for i = 1 to tokens.length {
        tokens[i] = stemming(tokens[i])
    }
    type_of_question = questionIdentification(question)
    print type_of_question
    HeadWord = select head word from tokens
    template = generate template from HeadWord and type_of_question
    K-mean(wikipedia_database)
    ans[] = search_wikipedia(HeadWord)
    for i = 1 to ans.length
        if matchTemplate(template, ans[i])
            add ans[i] to answer
    return answer
}
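The matchTemplate step of the algorithm can be illustrated with a minimal Java sketch. It assumes that the candidate sentences have already been retrieved and that a sentence matches when it begins with the head word followed by one of the five template phrases listed in Section 4.1. The class name TemplateMatcherSketch, the method match and the sample sentences are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: keep sentences of the form "<Head Word> is/means/... ...".
class TemplateMatcherSketch {
    static List<String> match(String headWord, List<String> sentences) {
        // Longer phrases come first so that "is known as" is tried before plain "is".
        Pattern template = Pattern.compile(
                "^\\s*" + Pattern.quote(headWord)
                        + "\\s+(is known as|is called|can be defined as|means|is)\\b.*",
                Pattern.CASE_INSENSITIVE);
        List<String> answers = new ArrayList<>();
        for (String sentence : sentences) {
            if (template.matcher(sentence).matches()) {
                answers.add(sentence);
            }
        }
        return answers;
    }

    public static void main(String[] args) {
        List<String> candidates = Arrays.asList(
                "Stemming is the process of reducing a word to its root.",
                "Tokenization splits text into words.",
                "The stemming step follows tokenization.",
                "stemming can be defined as suffix stripping.");
        for (String answer : match("Stemming", candidates)) {
            System.out.println(answer);
        }
        // Prints the first and the fourth candidate sentence.
    }
}
```

Anchoring the pattern at the start of the sentence keeps sentences that merely mention the head word, such as the third candidate, out of the answer list.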

4.3 Implementation Details


Software Requirement
Operating System (Windows or Linux): The web-based execution supports several operating
systems that have browsers. On the client side any such OS is supported, but on the server side
a server edition is required.

NetBeans 8.0.1

Fast & Smart Code Editing: An IDE is much more than a text editor. The NetBeans Editor
indents lines, matches words and brackets, and highlights source code syntactically and
semantically. It also provides code templates, coding tips, and refactoring tools. The
editor supports many languages, including PHP, Java, C/C++, HTML, Groovy, JavaScript, and JSP
and Servlets. Since the editor is extensible, plugins are available for many other languages.

Easy & Efficient Project Management

Keeping a clear overview of large applications, with many folders and files and millions of
lines of code, can be a daunting task. NetBeans IDE provides different views of your data,
from multiple project windows to helpful tools for setting up your applications and managing
them efficiently, letting you drill down into your data quickly, while providing versioning
tools via Subversion, Mercurial, and Git integration out of the box.

Rapid User Interface Development

Design GUIs for Java SE, Java EE, HTML, PHP and Java ME applications quickly and smoothly
by using editors and drag-and-drop tools in the IDE. For Java SE applications, the
NetBeans GUI Builder automatically takes care of correct spacing and alignment, while also
supporting in-place editing. The GUI builder is so easy to use and intuitive that it has been
used to prototype GUIs live at customer presentations.

Bug Free Code

The cost of buggy code increases the longer it remains unfixed. NetBeans provides static
analysis tools, especially integration with the widely used FindBugs tool, for identifying
and fixing common problems in Java code. In addition, the NetBeans Debugger lets you place
breakpoints in your source code, add field watches, step through your code, run into
methods, take snapshots and monitor execution as it occurs. The NetBeans Profiler provides
expert assistance for optimizing your application's speed and memory usage, and makes it easier
to build reliable and scalable Java SE, JavaFX and Java EE applications. NetBeans IDE
includes a visual debugger for Java SE applications, letting you debug user interfaces
without looking into source code. Take GUI snapshots of your applications and click on
user interface elements to jump back into the related source code.

Java

• Java is easy to program: Java was designed to be easy to use and is therefore easier to
write, compile, debug, and learn than many other programming languages.
• Java is an object-oriented language.
• Java is platform-independent: One of the most significant advantages of Java is its ability
to move easily from one computer system to another. The ability to run the same program on
many different operating systems is crucial to the World Wide Web, and Java succeeds at this
by being platform-independent at both the source and binary levels.

JDK 8

JDK 8 (Java Development Kit 8) is the version of Java used in this work. It is released as
part of Java Standard Edition.

Hardware Requirement (Minimum)

(i) Client Requirement


a. 2.0 GHz Processor
b. 2 GB RAM
c. 25 GB hard disk
(ii) Server Requirement
a. 2.0 GHz Multicore Processor
b. 5 GB RAM DDR 3 Dual Slot
c. 25-50 GB Single or Double Swap Hard Disk

4.4 Tools Used
Considering each feature of the real implementation and the results of existing
approaches, some improvements have to be made in later stages of this work. This requires
tools that facilitate the development and demonstrate the results as required. The category is
divided into two basic areas:

Implementation supporting tools:

• JDK 1.7
• NetBeans 8.0


4.5 Implemented and Reference Classes

The implementation of the proposed Question Answering system needs to incorporate the
following standard and third-party class libraries to support the entire
implementation.

Table 4.2 Reference Classes

S. No.   Classes                       Description
1        java.util.ArrayList           Provides a high-level data structure that stores data temporarily for use during program execution.
2        java.sql                      Provides the API for retrieving and processing data kept in a data source (usually a relational database) using the Java programming language.
3        javax.swing.filechooser       Swing GUI component used to select a file or directory as user input.
4        java.io.FileInputStream       Obtains input bytes from a file in the file system.
5        org.jfree.chart.ChartFactory  Provides the classes and methods to implement the chart and performance graph.
6        org.jfree.chart.plot.XYPlot   Provides methods to take the input data set as X and Y coordinates and show it on the graphical plane.
7        java.io.File                  Presents an abstract, system-independent view of hierarchical pathnames.

4.5.1 Implemented classes

This section describes the classes that were implemented to develop the
components of the proposed working model.


Table 4.3 Implemented Classes

S. No.   Classes              Description
1        Tokenizes            Implements the functions and methods of the string tokenizing model.
2        Question Analysis    Analyzes the question and returns the type of question.
3        K-Mean               Contains the methods and functions used for clustering.
4        Stop Words Remover   Removes stop words from the question.
5        Connect              Contains a method used to connect the front-end design and the back-end database.
6        Graph                Contains the JFreeChart library implementation for visualizing the performance graph of the system.
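The source of the tokenizing and stop-word classes listed in Table 4.3 is not reproduced in this report. As a rough illustration of the preprocessing they perform, the following Java sketch tokenizes a question, removes stop words and applies a naive suffix-stripping stemmer. The class name PreprocessSketch, the short stop-word list and the suffix rules are assumptions made for the example; a real system would use a fuller stop-word list and an established stemmer such as Porter's.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the preprocessing step: tokenize, remove stop words, stem.
class PreprocessSketch {
    // Assumption: a tiny illustrative stop-word list.
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("is", "am", "are", "the", "a", "an", "of"));

    // Splits the question into lower-case tokens on any non-alphanumeric character.
    static List<String> tokenize(String question) {
        List<String> tokens = new ArrayList<>();
        for (String t : question.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    static List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!STOP_WORDS.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // Naive suffix stripping; this is not the Porter algorithm.
    static String stem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.length() > suffix.length() + 2 && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    static List<String> preprocess(String question) {
        List<String> result = new ArrayList<>();
        for (String t : removeStopWords(tokenize(question))) {
            result.add(stem(t));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(preprocess("What is the capital city of India?"));
        // prints [what, capital, city, india]
    }
}
```

Stop words are removed before stemming, in the same order as the preprocessing step of Section 4.1.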

4.6 GUI Design

This section provides the details of the implemented user interface design and its navigational
model. Figure 4.3 shows the initial project screen.

Figure 4.3: Project Screen


4.7 Summary

After studying the different existing mechanisms, this chapter presented the proposed system.
It gives a clear understanding of the algorithm and its steps, and suggests an architectural
solution to the problems identified in the existing system. The chapter also described the
implementation of the proposed system: the platform on which it runs and the tools and
approach that were followed.


CHAPTER 5
RESULT ANALYSIS
5.1 Result Analysis
The Question Answering System was developed in this research with the help of Java (JDK 1.8)
and NetBeans IDE 8.0.2 on the Windows 7 operating system. All forms of the Question Answering
System are designed in Swing. Graphs of computation time, question type and memory usage are
plotted using the JFreeChart library. In the result analysis, the proposed Question Answering
system is compared with the existing Question Answering system in terms of computation time and memory.

Each type of question (factoid, descriptive and definition) was used in the experiments.
Wikipedia was used as the data set for searching answers. The figure below shows the home
screen of the project.

Figure 5.1 Home screen of project.


5.1.1 Evaluation Parameters


The evaluation of the Question Answering system focuses on the following parameters:
• Question type
• Computation time
• Memory management

5.1.1.1 Question Types


The type of question is determined for each entered question. The question type is then used
to design a template that helps find a more accurate answer to the entered
question.

Figure 5.2 Question Types

Table 5.1 Number of Question in Types

S.No. Question Type No. Of Questions

1 Definition Type 6

2 Description Type 2

3 Factoid Type 2


Table 5.2 Question Type of each Question


S.No. Question Number Question Type
1 Question Number 1 Description
2 Question Number 2 Definition
3 Question Number 3 Definition
4 Question Number 4 Definition
5 Question Number 5 Description
6 Question Number 6 Definition
7 Question Number 7 Factoid
8 Question Number 8 Description
9 Question Number 9 Factoid
10 Question Number 10 Definition

5.1.1.2 Computation Time

We measured the computation time of the existing and the proposed Question Answering
systems; the results are shown in the graph. The experiments show that the proposed
Question Answering system takes less computation time than the existing Question Answering
system for every question tested (Table 5.3).


Figure 5.3 Computation time for Existing and Proposed System.


Table 5.3 Computation time for Existing and Proposed System.

S.No.   Question Number     Computation Time of Existing            Computation Time of Proposed
                            Question Answering System (ms)          Question Answering System (ms)
1       Question Number 1   13203                                   7215
2       Question Number 2   9853                                    4978
3       Question Number 3   10340                                   9734
4       Question Number 4   22565                                   5123
5       Question Number 5   11287                                   7460

5.1.1.3 Computation Memory

We measured the memory used by the existing and the proposed Question Answering systems;
the results are shown in the graph. The experiments show that the proposed Question
Answering system uses less memory than the existing system for four of the five questions
(Table 5.4); for Question 5 it uses slightly more (1816 MB against 1798 MB).

Figure 5.4 Computation Memory for Existing and Proposed System.


Table 5.4 Computation memory for Existing and Proposed System

S.No.   Question Number     Computation Memory of Existing          Computation Memory of Proposed
                            Question Answering System (MB)          Question Answering System (MB)
1       Question Number 1   2183                                    1940
2       Question Number 2   2434                                    2015
3       Question Number 3   2507                                    2473
4       Question Number 4   2164                                    1946
5       Question Number 5   1798                                    1816
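The report does not state how the time and memory figures in Tables 5.3 and 5.4 were obtained. One common way to take such measurements in Java is sketched below, assuming System.nanoTime for elapsed time and the Runtime memory counters for heap usage. The class name MeasureSketch and the method measure are hypothetical, and the dummy task in main only stands in for answering one question.

```java
// Hypothetical sketch: measure elapsed time and used heap memory around a task.
class MeasureSketch {
    // Returns { elapsed time in milliseconds, used heap in megabytes }.
    static long[] measure(Runnable task) {
        Runtime runtime = Runtime.getRuntime();
        long start = System.nanoTime();
        task.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // Heap currently in use by the JVM; this is affected by garbage collection.
        long usedMb = (runtime.totalMemory() - runtime.freeMemory()) / (1024 * 1024);
        return new long[] { elapsedMs, usedMb };
    }

    public static void main(String[] args) {
        long[] result = measure(() -> {
            // Placeholder workload standing in for one question-answering run.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("time (ms): " + result[0] + ", memory (MB): " + result[1]);
    }
}
```

Figures obtained this way vary from run to run, so repeating each question several times and reporting the average gives a fairer comparison between two systems.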

5.2 Output Screen

Figure 5.5 Initial screens for Existing and Proposed System.


Figure 5.6 Existing Question Answering System

Figure 5.7 Proposed Question Answering System


Figure 5.8 Select Head Word of Question

Figure 5.9 Answer of Proposed Question Answering System


5.3 Summary

Developing a solution is only one part of the work. Proving its results is a more complicated
task, because each step of the solution must be measured and compared with the existing
mechanisms. To do that effectively, this chapter gave a detailed result analysis to
demonstrate the effectiveness of the suggested mechanism.


CHAPTER 6
CONCLUSION & FUTURE WORK
6.1 Conclusion
In this thesis we have proposed a framework for a restricted-domain Question Answering System
using data mining tools and software. This framework can be used to develop a Question
Answering System for extracting definite and exact answers from a restricted-domain textual
data set. The proposed framework not only provides a simple and implementable structure for
developing a Question Answering System but also provides a proper flow of data for answer
extraction. Since the proposed model works over keywords and the head word and is independent
of the question or sentence structure, it reduces the overhead of question normalization.

6.2 Future Work

Moreover, since the framework is designed for a restricted domain, it also handles the issue
of word sense disambiguation. The major problem with the proposed framework is that its
performance depends on the performance of the search engine and of the data mining tools used.


REFERENCES
[1] Sreelakshmi V, Sangeetha Jamal, "Survey Paper: Question Answering Systems", National Conference on
Computing and Communication (NCCC), GEC Idukki, March 2014.

[2] M Ramprasath, S Hariharan, "Improved Question Answering System by semantic reformulation", IEEE Fourth
International Conference on Advanced Computing, 2012.

[3] Ali Mohamed Nabil Allam, Mohamed Hassan Haggag, "The Question Answering Systems: A Survey",
International Journal of Research and Reviews in Information Sciences (IJRRIS), Science Academy
Publisher, United Kingdom, September 2012.

[4] Molla D., Vicedo J., "Question answering in restricted domains: An overview", Computational Linguistics,
pp. 41-61, 2007.

[5] Moreda P., Llorens H., Saquete E., Palomar M., "Combining semantic information in question answering
systems", Information Processing & Management, pp. 870-885, 2011.

[6] Svetlana Stoyanchev, Young Chol Song, William Lahti, "Exact Phrases in Information Retrieval for
Question Answering", Coling 2008: Proceedings of the 2nd Workshop on Information Retrieval for Question
Answering (IR4QA), pp. 9-16, Manchester, UK, August 2008.

[7] Woods W.A., Kaplan R.A., Nash-Webber B., "The lunar sciences natural language information system", Final
report: BBN Report #2378, Technical report, Bolt Beranek and Newman Inc., Cambridge, MA, June 1972.

[8] Green B.F., Wolf A.K., Chomsky C., Laughery K., "BASEBALL: An automatic question answerer", Proceedings
of Western Computing Conference, vol. 19, pp. 219-224, 1961.

[9] Ittycheriah A., Franz M., Zhu W.J., Ratnaparkhi A., Mammone R.J., "IBM's statistical question answering
system", Proceedings of the Text Retrieval Conference TREC-9, 2000.

[10] Athira P. M., Sreeja M., P. C. Reghuraj, "Architecture of an Ontology-Based Domain-Specific Natural
Language Question Answering System", Department of Computer Science and Engineering, Government
Engineering College, Sreekrishnapuram, Palakkad, Kerala, India, 678633.

[11] Pragisha K., "Design and implementation of a QA system in Malayalam".

[12] Sanjay K Dwivedi, Vaishali Singh, "Research and reviews in question answering system", Department of
Computer Science, B. B. A. University (A Central University), Lucknow, Uttar Pradesh, 226025, India.

[13] Poonam Gupta, Vishal Gupta, Assistant Professor, Computer Science & Engineering Department, University
Institute of Engineering & Technology, Panjab University, Chandigarh.

[14] Kolomiyets, Oleksander, and Moens, Marie-Francine, "A survey on question answering technology from an
information retrieval perspective", Information Sciences 181, 2011, pp. 5412-5434. DOI:
10.1016/j.ins.2011.07.047. Elsevier.

[15] Moreda, Paloma, Llorens, Hector, Saquete, Estela, and Palomar, Manuel, "Combining semantic information in
question answering systems", Information Processing and Management 47, 2011, pp. 870-885. DOI:
10.1016/j.ipm.2010.03.008. Elsevier.

[16] Ko, Jeongwoo, Si, Luo, and Nyberg, Eric, "Combining evidence with a probabilistic framework for answer
ranking and answer merging in question answering", Information Processing and Management 46, 2010,
pp. 541-554. DOI: 10.1016/j.ipm.2009.11.004. Elsevier.

[17] Pachpind Priyanka P, Bornare Harshita N, Kshirsagar Rutumbhara B, Malve Ashish D, "An Automatic
Answering System Using Template Matching For Natural Language Questions", BE Comp, S.N.D COE & RC, Yeola.


PLAGIARISM
Self Certificate
This is to certify that on “Question & Answering System Using
Natural Language Processing” for the partial fulfillment of M.Tech
(C.S) YDC program is original and well checked by plagiarism detector
software (http://smallseotools.com/plagiarism-checker).

Chapter wise detail as follows:

Chapter % of Unique
Number Plagiarism Checked In Content

1 Introduction 100 %
2 Literature Review 93 %
3 Problem Identification 89%
4 Proposed Work 95 %
5 Result Analysis 100%
6 Conclusion & Future work 100%

Nilam Chourasiya Dipshikha Sharma Dr. Suresh Batni


(0837CS20MT03) Supervisor Principal
SIMS, Indore SIMS, Indore



https://Plagiarism-Detector.com
Plagiarism Detector v. 1092 - Originality Report:

Analyzed document: 19/02/20223 1:30:45 PM

"NC.docx"
Licensed to: Sachin Choudhary

Comparison Preset: Word-to-Word. Detected language: English

% 0.9 wrds: 16

% 0.5 wrds: 9

% 0.4 wrds: 7 https://www.ijcsmc.com/docs/papers/February2018/V7I2201812.pdf


Processed resources details:

347 - Ok / 42 - Failed



International Journal of Research in Science & Engineering
Vol X Issue IX Jan 2023
P-ISSN:2394-8280 E-ISSN:2394-8299

“Survey on Question and Answering Retrieval System Using Hadoop”


Nilam Chourasiya

Mtech Scholar

Department of Computer Science


Sanghvi Institute of Management & Science,Indore

Abstract:
Question answering is one of the major research areas in natural language processing. The main
challenge for a question answering system is to give the exact answer to the question posed by
the user. Question answering systems can be classified into three categories: open domain,
closed domain and restricted domain. Using advanced natural language processing tools, we
develop a framework for a question answering system. In this paper we work on a restricted
domain question answering system. The proposed system works on keyword and question matching
and returns a precise answer to the question.

Keywords:- Natural Language Processing, information retrieval, semantic similarity, restricted
domain, answer extraction, answer ranking

1. INTRODUCTION:
Although the set of documents retrieved by a search engine contains a lot of information about
the search topic, it may or may not contain exactly the information the user is looking for
[1]. The basic idea behind a question answering system is that users just have to enter the
question, and the system will retrieve the most appropriate and precise answer for that
question and return it to the user. Hence, in cases where the user is looking for a short and
precise answer, a question answering system is more useful than a search engine, which usually
provides a large set of links to web pages that might contain the answer to that question. A
typical Question Answering system can be divided into 3 modules, namely: the Question
Processing module, the Document Processing or Information Retrieval module and the Answer
Processing module. Each Processing and Information Retrieval module contains several sub
modules, and these modules use several Natural Language Processing techniques in order to
extract the proper answer. The usual Question Answering system is designed to answer simple
wh-questions like “who”, “what”, “when”, “where”, etc. But recent QA research focuses on
extending the system to answer complex questions, summary questions, opinion questions etc.
The paper proposes a Question Answering system that answers simple factoid wh-questions by
using a technique called Semantic Role Labeling.

Figure 1. Block Diagram Question Answering System

The rest of the paper is organized as follows. The next section describes the general
architecture of a Question Answering System. Section 3 discusses some of the related works in
this area. The proposed system architecture is described in section 4. The paper concludes
with the experimental setup and results.

2. ARCHITECTURE OF A QUESTION ANSWERING SYSTEM

In this section we describe the architecture of our system. The overall architecture of the
system can be subdivided into three main modules: (1) pre-processing, (2) question
template matching, and (3) answering. Each module is
International Journal of Research in Science & Engineering
Vol X Issue IX Jan 2023
P-ISSN:2394-8280 E-ISSN:2394-8299

described in detail in the following subsections.

Question answering systems can be classified on the basis of the domain over which they are constructed:

• Open Domain Question Answering
• Closed Domain Question Answering
• Restricted Domain Question Answering

Open-domain question answering systems are domain independent. They rely on general ontologies and world knowledge. Usually these systems have a large collection of data from which the required answer is to be found. Since the information content in open-domain question answering is not tied to a particular domain, such systems can answer questions from various fields; however, deep reasoning is not possible [3].

Closed-domain question answering systems deal with questions in a specific domain [3]. LUNAR and BASEBALL are examples of closed-domain QA systems. In this case the data set contains a very limited amount of focused and structured information. Hence deep reasoning is possible in closed-domain systems, but because of the very small size of the data set they are no more than "toy systems" [4].

Research in restricted-domain question answering (RDQA) addresses problems related to the incorporation of domain-specific information into current state-of-the-art QA technology, with the hope of achieving deep reasoning capabilities and reliable accuracy in real-world applications.

3. LITERATURE SURVEY:

Most research papers [4, 5, 6] discuss LUNAR [7] and BASEBALL [8] as the earliest question answering systems. However, various question answering systems based on different concepts have been developed since the idea of a QA system was coined.

Athira P. M. et al. [10] presented an architecture for ontology-based, domain-specific natural language question answering that applies semantics and domain knowledge to improve both query construction and answer extraction.

Pragisha K. et al. [11] developed a system that receives Malayalam natural language questions from the user and extracts the most appropriate response by analyzing a collection of Malayalam documents. The system handles four types of questions.

Sanjay K. Dwivedi et al. [12] propose a taxonomy for characterizing question answering (QA) systems, survey the major QA systems described in the literature, and provide a qualitative analysis of them.

Table I compares different types of question answering systems [22].

Table I
S. No | Type of Question and Answering System | Question and Answering System Methods
1 | Multilingual Question/Answering | Tokenization and POS tagging, word sense disambiguation, answer type identification, keyword expansion, semantic disambiguation
2 | Analysis of the Asks Question-Answering System | Query reformulation, N-gram mining, N-gram filtering, N-gram tiling
3 | Multilinguality, spatial-temporal context awareness, textual entailment | Answering architecture
4 | A Question Answering System based on Information Retrieval and Validation | Expected answer type, named entities presence
5 | A Hybrid Question Answering System based on Information Retrieval and Answer Validation | Module, hypothesis generation module, document processing and indexing
6 | A specifiable domain multilingual Question | Answering architecture

Poonam Gupta et al. [13], in "A Survey of Text Question Answering Techniques", describe question answering as a difficult form of information retrieval characterized by information needs that are at least somewhat expressed as natural language. Pachpind Priyanka et al. [17] proposed an automatic answering system for natural language questions based on template matching: a frequently-asked-question (FAQ) system that replies with pre-stored answers to user questions asked in regular English, rather than relying on keyword or sentence-structure-based retrieval mechanisms.

4. PROPOSED SYSTEM:

Since open-domain and closed-domain QA systems both have their own pros and cons, a new concept of question answering, the restricted-domain QA system, was introduced by Molla and Vicedo [4]; it lies midway between these two domains.

As a not-too-long-term vision, we are convinced that research in restricted domains will drive the convergence between structured knowledge-based and free text-based question answering.

Figure 2. Proposed System

Our proposed system performs the following operations:

1. Question Processing: In this module the given question is processed to extract important information from it. The steps of the question processing module are:
a. Find the type of the given question using the wh-word.
b. Find the expected type of answer.
c. Get the keywords from the question.
d. Find the focus of the question.

The first step in the QA system is the question processing, or question classification, module. The information obtained from this module is the type of question, the expected answer type, the focus or head word of the question, and the question keywords.

Question Type | Wh-words
Factoid Type | Who, When, What, Where, Which
Definition Type and Descriptive Type | How, What, Why

2. Document Processing: Once the question has been processed, we move to the document processing module, in which the documents relevant to the given question are retrieved and processed. The steps of document processing are:
a. Take the question and search for relevant documents using a reliable search engine.
b. Take the top relevant documents.
c. Extract the content from these documents.
d. Save the content to a file.

3. Answer Processing: This module presents algorithms for extracting the potential answer for all three categories of question: definition type, descriptive type, and factoid type.

4. Dataset Clustering: Cluster the dataset using the fuzzy c-means algorithm, then proceed to question and answer processing.

5. CONCLUSION:

In this paper we have proposed a framework for a restricted-domain question answering system using advanced NLP tools and software. The framework can be used to develop a question answering system that extracts exact and precise answers from a restricted-domain textual data set. It provides not only a simple and implementable framework for developing a question answering system but also a proper flow of data for answer extraction.

Since the proposed model works on keywords and the headword and is independent of the question or sentence structure, it reduces the overhead of question normalization. Moreover, since the framework targets a restricted domain, it also handles the issue of word sense disambiguation. The major problem with the proposed framework is that its performance depends on the performance of the search engine and of the NLP tools used.

6. REFERENCES:

[1] Sreelakshmi V., Sangeetha Jamal, "Survey Paper: Question Answering Systems", National Conference on Computing and Communication (NCCC), GEC Idukki, March 2014.
[2] M. Ramprasath, S. Hariharan, "Improved Question Answering System by Semantic Reformulation", IEEE Fourth International Conference on Advanced Computing, 2012.
[3] Ali Mohamed Nabil Allam, Mohamed Hassan Haggag, "The Question Answering Systems: A Survey", International Journal of Research and Reviews in Information Sciences (IJRRIS), Science Academy Publisher, United Kingdom, September 2012.
[4] Molla D., Vicedo J., "Question answering in restricted domains: An overview", Computational Linguistics, pp. 41-61, 2007.
[5] Moreda P., Llorens H., Saquete E., Palomar M., "Combining semantic information in question answering systems", Information Processing & Management, pp. 870-885, 2011.
[6] Svetlana Stoyanchev, Young Chol Song, William Lahti, "Exact Phrases in Information Retrieval for Question Answering", Coling 2008: Proceedings of the 2nd Workshop on Information Retrieval for Question Answering (IR4QA), pp. 9-16, Manchester, UK, August 2008.
[7] Woods W.A., Kaplan R.A., Nash-Webber B., "The lunar sciences natural language information system", Final report: BBN Report #2378, Technical report, Bolt Beranek and Newman Inc., Cambridge, MA, June 1972.
[8] Green B.F., Wolf A.K., Chomsky C., Laughery K., "BASEBALL: An automatic question answerer", Proceedings of Western Computing Conference, vol. 19, pp. 219-224, 1961.
[9] Ittycheriah A., Franz M., Zhu W.J., Ratnaparkhi A., Mammone R.J., "IBM's statistical question answering system", Proceedings of the Text Retrieval Conference TREC-9, 2000.
[10] Athira P. M., Sreeja M., P. C. Reghuraj, "Architecture of an Ontology-Based Domain-Specific Natural Language Question Answering System", Department of Computer Science and Engineering, Government Engineering College, Sreekrishnapuram, Palakkad, Kerala, India, 678633.
[11] Pragisha K., "Design and implementation of a QA system in Malayalam".
[12] Sanjay K. Dwivedi, Vaishali Singh, "Research and reviews in question answering system", Department of Computer Science, B. B. A. University (A Central University), Lucknow, Uttar Pradesh, 226025, India.
[13] Poonam Gupta, Vishal Gupta, "A Survey of Text Question Answering Techniques", Assistant Professor, Computer Science & Engineering Department, University Institute of Engineering & Technology, Panjab University, Chandigarh.
[14] Kolomiyets, Oleksander, and Moens, Marie-Francine, "A survey on question answering technology from an information retrieval perspective", Information Sciences 181, 2011, pp. 5412-5434. DOI: 10.1016/j.ins.2011.07.047. Elsevier.
[15] Moreda, Paloma, Llorens, Hector, Saquete, Estela, and Palomar, Manuel, "Combining semantic information in question answering systems", Information Processing and Management 47, 2011, pp. 870-885. DOI: 10.1016/j.ipm.2010.03.008. Elsevier.
[16] Ko, Jeongwoo, Si, Luo, and Nyberg, Eric, "Combining evidence with a probabilistic framework for answer ranking and answer merging in question answering", Information Processing and Management 46, 2010, pp. 541-554. DOI: 10.1016/j.ipm.2009.11.004. Elsevier.
[17] Pachpind Priyanka P., Bornare Harshita N., Kshirsagar Rutumbhara B., Malve Ashish D., "An Automatic Answering System Using Template Matching For Natural Language Questions", BE Comp, S.N.D COE & RC, Yeola.

V. Result Analysis

The question answering system in this research was developed in Java (JDK 1.8) with the NetBeans IDE 8.0.2 on the Windows 7 operating system. All forms of the system are designed in Swing. Graphs of computation time, question type, and memory usage are plotted using the JFreeChart library. The result analysis compares the proposed question answering system with an existing question answering system in terms of computation time and memory.

The experiments use each type of question: factoid, descriptive, and definition. Wikipedia is used as the dataset in which answers are searched. Figure 5.1 shows the home screen of the project.
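The text does not say how computation time and memory were recorded in the Java implementation. As an illustration only, the following Python sketch shows one way per-question time and peak memory might be captured for any answering function. The names `measure` and `answer_fn` are hypothetical and do not come from the paper, and the figures such a harness produces would not be comparable to the Java measurements reported here.

```python
import time
import tracemalloc


def measure(answer_fn, question):
    """Run answer_fn(question) once; return (answer, elapsed ms, peak MB).

    answer_fn is a hypothetical stand-in for either QA system under test.
    """
    tracemalloc.start()                      # begin tracking Python allocations
    start = time.perf_counter()              # high-resolution wall-clock timer
    answer = answer_fn(question)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    _, peak_bytes = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    return answer, elapsed_ms, peak_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Toy answering function used only to exercise the harness.
    def echo_system(question):
        return question.upper()

    result, ms, mb = measure(echo_system, "what is a question answering system?")
    print(result, f"{ms:.3f} ms", f"{mb:.6f} MB")
```

Running the same harness over both systems with an identical question list would give paired measurements of the kind tabulated in this section.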



Figure 5.1 Home screen of project.

Evaluation Parameters
The evaluation of the question answering system focuses on the following parameters:
• Question type
• Computation time
• Memory management

Question Types
The type of the question entered by the user is determined first. The question type is then used to design a template that helps find a more accurate answer to the entered question.

Figure 5.2 Question Types

Table 5.1 Number of Questions of Each Type

S.No. | Question Type | No. of Questions
1 | Definition Type | 6
2 | Description Type | 2
3 | Factoid Type | 2

Table 5.2 Question Type of Each Question

S.No. | Question Number | Question Type
1 | Question Number 1 | Description
2 | Question Number 2 | Definition
3 | Question Number 3 | Definition
4 | Question Number 4 | Definition
5 | Question Number 5 | Description
6 | Question Number 6 | Definition
7 | Question Number 7 | Factoid
8 | Question Number 8 | Description
9 | Question Number 9 | Factoid
10 | Question Number 10 | Definition

Computation Time
We measured the computation time of the existing and the proposed question answering systems and plotted the results as a graph. In the experiments, the proposed system required less computation time than the existing system for all five questions listed in Table 5.3.

Figure 5.3 Computation time for Existing and Proposed System.

Table 5.3 Computation time for Existing and Proposed System.

S.No. | Question Number | Computation Time of Existing Question Answering System (ms) | Computation Time of Proposed Question Answering System (ms)
1 | Question Number 1 | 13203 | 7215
2 | Question Number 2 | 9853 | 4978
3 | Question Number 3 | 10340 | 9734
4 | Question Number 4 | 22565 | 5123
5 | Question Number 5 | 11287 | 7460

Computation Memory
We measured the memory used by the existing and the proposed question answering systems and plotted the results as a graph. In the experiments, the proposed system used less memory than the existing system for four of the five questions listed in Table 5.4; the exception is Question Number 5.

Figure 5.4 Computation memory for Existing and Proposed System.

Output Screen

Figure 5.5 Initial screens for Existing and Proposed System.

Figure 5.6 Existing Question Answering System

Figure 5.7 Proposed Question Answering System

International Journal of Research in Science & Engineering
Vol IV Issue IV Aug 2018
P-ISSN:2394-8280 E-ISSN:2394-8299

Figure 5.8 Select Head Word of Question

Figure 5.9 Answer of Proposed Question Answering System

Table 5.4 Computation memory for Existing and Proposed System

S.No. | Question Number | Computation Memory of Existing Question Answering System (MB) | Computation Memory of Proposed Question Answering System (MB)
1 | Question Number 1 | 2183 | 1940
2 | Question Number 2 | 2434 | 2015
3 | Question Number 3 | 2507 | 2473
4 | Question Number 4 | 2164 | 1946
5 | Question Number 5 | 1798 | 1816
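To make the comparison in Tables 5.3 and 5.4 easier to read, the reported values can be turned into relative reductions. The short Python sketch below does only that arithmetic; the numbers are copied from the two tables, while the helper names `percent_reduction` and `summarize` are introduced here for illustration and are not part of the original system.

```python
# (existing, proposed) per question, copied from Table 5.3 (ms) and Table 5.4 (MB).
TIME_MS = {1: (13203, 7215), 2: (9853, 4978), 3: (10340, 9734),
           4: (22565, 5123), 5: (11287, 7460)}
MEMORY_MB = {1: (2183, 1940), 2: (2434, 2015), 3: (2507, 2473),
             4: (2164, 1946), 5: (1798, 1816)}


def percent_reduction(existing, proposed):
    """Reduction of the proposed value relative to the existing one, in percent.

    A negative result means the proposed system used more than the existing one.
    """
    return 100.0 * (existing - proposed) / existing


def summarize(table):
    """Return per-question reductions plus the totals over all questions."""
    per_question = {q: percent_reduction(e, p) for q, (e, p) in table.items()}
    total_existing = sum(e for e, _ in table.values())
    total_proposed = sum(p for _, p in table.values())
    overall = percent_reduction(total_existing, total_proposed)
    return per_question, total_existing, total_proposed, overall


if __name__ == "__main__":
    for name, table in (("time", TIME_MS), ("memory", MEMORY_MB)):
        per_q, tot_e, tot_p, overall = summarize(table)
        print(name, {q: round(v, 1) for q, v in per_q.items()},
              tot_e, tot_p, round(overall, 1))
```

Summed over the five questions, the computation time falls from 67248 ms to 34510 ms (about 48.7%), while memory falls from 11086 MB to 10190 MB (about 8.1%). Question Number 5 is the one case where the proposed system uses slightly more memory than the existing one.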

International Journal for Rapid Research in Engineering Technology & Applied Science
Vol X Issue X Feb 2023
ISSN:2455-4723

Implementation of Question and Answering Retrieval System in Natural Language Processing
Nilam Chourasiya
Department of Computer Science
Sanghvi Institute of Management & Science Indore

Abstract: Question answering (QA) is one of the major research areas in natural language processing. The main challenge for a QA system is to return the exact answer to the question posed by the user. QA systems can be classified into three categories: open domain, closed domain, and restricted domain. Using advanced natural language processing tools, we develop a framework for a question answering system. In this paper we work on a restricted-domain question answering system. The proposed system works on keyword and question matching and returns a precise answer to the question.

Keywords: Natural language processing, information retrieval, semantic similarity, restricted domain, answer extraction, answer ranking

I Introduction
Although the set of documents retrieved by a search engine contains a lot of information about the search topic, it may or may not contain exactly the information the user is looking for [1]. The basic idea behind a question answering system is that users just enter a question and the system retrieves the most appropriate and precise answer and returns it to the user. Hence, where the user is looking for a short and precise answer, a question answering system serves better than a search engine, which usually returns a large set of links to web pages that might contain the answer. A typical question answering system can be divided into three modules: the Question Processing module, the Document Processing or Information Retrieval module, and the Answer Processing module. Each of these modules contains several submodules, which use several natural language processing techniques to extract the proper answer. The usual question answering system is designed to answer simple wh-questions such as “who”, “what”, “when”, and “where”. Recent QA research, however, focuses on extending such systems to answer complex questions, summary questions, opinion questions, and so on. This paper proposes a question answering system that answers simple factoid wh-questions using a technique called Semantic Role Labeling.

Figure 1. Block Diagram of Question Answering System

The rest of the paper is organized as follows. The next section describes the general architecture of a question answering system. Section 3 discusses some of the related work in this area. The proposed system architecture is described in Section 4. The paper concludes with the experimental setup and results.

II Architecture of a Question Answering System

In this section we describe the architecture of our system. The overall architecture of the system can be subdivided into three main modules: (1) pre-processing, (2) question template matching, and (3) answering. Each module is described in detail in the following subsections.
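The three-module architecture can be illustrated with a very small Python sketch. It is not the implementation described in this paper: the stop-word list, the regular-expression templates, the mapping from wh-words to question types, and the keyword-overlap answering step are all simplifying assumptions made here, and every function name is hypothetical.

```python
import re

# Assumed stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "to", "do", "does",
             "what", "who", "when", "where", "which", "how", "why"}

# Hypothetical question templates, tried in order: (pattern, question type).
TEMPLATES = [
    (re.compile(r"^what (is|are) (an? |the )?(?P<focus>.+)$"), "definition"),
    (re.compile(r"^(why|how) .+$"), "descriptive"),
    (re.compile(r"^(who|when|where|which) .+$"), "factoid"),
]


def preprocess(text):
    """Module 1: lowercase, strip punctuation, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def match_template(clean_question):
    """Module 2: return the type of the first template that matches."""
    for pattern, question_type in TEMPLATES:
        if pattern.match(clean_question):
            return question_type
    return "unknown"


def keywords(clean_question):
    """Content words of the question, with stop words removed."""
    return [w for w in clean_question.split() if w not in STOPWORDS]


def answer(question, sentences):
    """Module 3: pick the stored sentence sharing the most keywords."""
    clean = preprocess(question)
    question_type = match_template(clean)
    wanted = set(keywords(clean))
    best, best_score = None, 0
    for sentence in sentences:
        score = len(wanted & set(preprocess(sentence).split()))
        if score > best_score:
            best, best_score = sentence, score
    return question_type, best


if __name__ == "__main__":
    corpus = [
        "Natural language processing is a field of computer science.",
        "LUNAR answered questions about moon rocks.",
    ]
    print(answer("What is natural language processing?", corpus))
    print(answer("Who developed LUNAR?", corpus))
```

In a restricted domain the template list and the sentence store would both be built from the domain documents. The keyword-overlap step stands in for the answering module only to keep the sketch self-contained.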

Paper ID: 2018/IJRRETAS/8/2018/37664 1



Question Answering Systems can be classified on the basis of the domains over which they are constructed:

• Open Domain Question Answering

• Closed Domain Question Answering

• Restricted Domain Question Answering

Open domain question answering systems are domain independent. They rely on a general ontology and world knowledge, and usually have a large collection of data in which the required answer is to be found. Since the information content is not tied to a particular domain, such a system can answer questions from various fields; however, deep reasoning is not possible [3].

Closed domain question answering systems deal with questions in a specific domain [3]. LUNAR and BASEBALL are examples of closed domain QA systems. In this case the data set contains a very limited amount of focused and structured information. Deep reasoning is therefore possible, but because of the very small size of the data set these systems are no more than "toy systems" [4].

Research in restricted-domain question answering (RDQA) addresses problems related to the incorporation of domain-specific information into current state-of-the-art QA technology, with the hope of achieving deep reasoning capabilities and reliable accuracy in real-world applications. This is seen as a not-too-long-term vision.

III Literature Survey

In most of the research papers [4, 5, 6], LUNAR [7] and BASEBALL [8] are discussed as the earliest question answering systems. However, various question answering systems based on different concepts have been developed since the idea of a QA system was coined.

Athira P. M. et al. [10] present an architecture for ontology-based, domain-specific natural language question answering that applies semantics and domain knowledge to improve both query construction and answer extraction.

Another system, developed by Pragisha K. et al. [11], is a QA system for Malayalam. It receives Malayalam natural language questions from the user and extracts the most appropriate response by analyzing a collection of Malayalam documents. The system handles four types of questions.

Sanjay K. Dwivedi et al. [12] propose a taxonomy for characterizing Question Answering (QA) systems, survey the major QA systems described in the literature, and provide a qualitative analysis of them.

S. No.  Type of Question and Answering System                                                     Question and Answering System Methods
1       Multilingual Question/Answering                                                           Tokenization and POS tagging, word sense disambiguation, answer type identification, keyword expansion, semantic disambiguation
2       Analysis of the AskMSR Question-Answering System                                          Query reformulation, n-gram mining, n-gram filtering, n-gram tiling
3       Multilinguality, spatial-temporal context awareness, textual entailment                   Answering architecture
4       A Question Answering System based on Information Retrieval and Validation                 Expected answer type, named entities presence,
5       A Hybrid Question Answering System based on Information Retrieval and Answer Validation   Module, hypothesis generation module, document processing and indexing
6       A specifiable domain multilingual Question                                                Answering architecture


Poonam Gupta et al. [13] present a survey of text question answering techniques. Question answering is a difficult form of information retrieval, characterized by information needs that are at least partly expressed in natural language.

Pachpind Priyanka et al. [17] propose an automatic answering system for natural language questions based on template matching: a Frequently Asked Question (FAQ) answering system that replies with pre-stored answers to user questions asked in regular English, rather than relying on keyword or sentence-structure based retrieval mechanisms.

IV Proposed System

Since both Open Domain and Closed Domain QA systems have their own pros and cons, a new concept of question answering, the RESTRICTED DOMAIN QA SYSTEM, was coined by Molla and Vicedo [4]; it lies midway between these two domains.

We are convinced that research in restricted domains will drive the convergence between structured knowledge-based and free text-based question answering.

Figure 2. Proposed System

Our proposed system performs the following operations:

• Question Processing: In this module the given question is processed to extract important information from it. The steps in the Question Processing module are:

    • Find the type of the given question using its WH word.

    • Find the expected type of answer.

    • Get the keywords from the question.

    • Find the focus of the question.

  The first step in the QA system is the Question Processing or Question Classification module. The information obtained from this module is the type of question, the expected answer type, the focus or head word of the question, and the question keywords.

  Question Type

  WH word     Factoid Type    Definition Type    Descriptive Type
  Question    Who             How                What
              When            What
              What            Why
              Where
              Which

• Document Processing: Once the question has been processed, we move on to the document processing module. In this module the documents relevant to the given question are retrieved and processed. The following steps are used in document processing:

    • Take the question in hand and search for relevant documents using a reliable search engine.

    • Take the top relevant documents.

    • Extract the content from these documents.

    • Save the content to a file.

• Answer Processing: This module presents algorithms for extracting the potential answer for all three categories of questions, that is, Definition Type, Descriptive Type and Factoid Type questions.


• Dataset Clustering: The dataset is clustered using the fuzzy c-means algorithm and is then passed on to question and answer processing.
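The clustering step names fuzzy c-means but gives no detail, so the following is only a minimal sketch of the textbook algorithm, not the authors' implementation. It assumes the documents have already been turned into numeric feature vectors (for example term weights); the class name `FuzzyCMeansSketch`, the parameter choices and the seeded random initialisation are all illustrative assumptions.

```java
import java.util.Random;

/** Minimal fuzzy c-means sketch; illustrative only, not the paper's code. */
class FuzzyCMeansSketch {

    /**
     * Clusters points into c fuzzy clusters with fuzzifier m (m > 1).
     * Returns the membership matrix u, where u[i][j] is the degree to which
     * point i belongs to cluster j; each row sums to 1.
     */
    static double[][] cluster(double[][] points, int c, double m, int iterations, long seed) {
        int n = points.length;
        int dim = points[0].length;
        Random rnd = new Random(seed);

        // Random initial memberships, normalised so that each row sums to 1.
        double[][] u = new double[n][c];
        for (int i = 0; i < n; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < c; j++) {
                u[i][j] = rnd.nextDouble() + 1e-6;
                rowSum += u[i][j];
            }
            for (int j = 0; j < c; j++) {
                u[i][j] /= rowSum;
            }
        }

        double[][] centers = new double[c][dim];
        for (int iter = 0; iter < iterations; iter++) {
            // Step 1: each center is the mean of all points weighted by u^m.
            for (int j = 0; j < c; j++) {
                double weightSum = 0.0;
                double[] acc = new double[dim];
                for (int i = 0; i < n; i++) {
                    double w = Math.pow(u[i][j], m);
                    weightSum += w;
                    for (int d = 0; d < dim; d++) {
                        acc[d] += w * points[i][d];
                    }
                }
                for (int d = 0; d < dim; d++) {
                    centers[j][d] = acc[d] / weightSum;
                }
            }
            // Step 2: recompute memberships from the relative distances to the centers.
            for (int i = 0; i < n; i++) {
                double[] dist = new double[c];
                for (int j = 0; j < c; j++) {
                    // Clamp to avoid division by zero when a point sits on a center.
                    dist[j] = Math.max(distance(points[i], centers[j]), 1e-12);
                }
                for (int j = 0; j < c; j++) {
                    double sum = 0.0;
                    for (int k = 0; k < c; k++) {
                        sum += Math.pow(dist[j] / dist[k], 2.0 / (m - 1.0));
                    }
                    u[i][j] = 1.0 / sum;
                }
            }
        }
        return u;
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

A call such as `FuzzyCMeansSketch.cluster(vectors, 2, 2.0, 100, 42L)` returns soft memberships rather than hard labels, so a document that touches two topics can contribute to both clusters when answers are searched later.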

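The question-processing steps listed earlier in this section (find the WH word, derive the expected answer type, extract the keywords) can be sketched as follows. This is a hedged illustration rather than the authors' code: the class `QuestionProcessorSketch`, the tiny stop-word list, and the mapping from WH word to expected answer type (`PERSON`, `DATE`, `LOCATION`) are assumptions made for the example, and focus or head-word detection is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the question-processing step; not the paper's code. */
class QuestionProcessorSketch {

    private static final List<String> WH_WORDS =
            Arrays.asList("who", "when", "where", "which", "what", "why", "how");

    // Assumed, deliberately small stop-word list; a real system would use a fuller one.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "the", "is", "are", "was", "were", "of", "in", "on", "to",
            "do", "does", "did", "who", "when", "where", "which", "what", "why", "how"));

    /** Lower-cases the text and splits it into alphanumeric tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Returns the first WH word in the question, or "" if there is none. */
    static String whWord(String question) {
        for (String token : tokenize(question)) {
            if (WH_WORDS.contains(token)) {
                return token;
            }
        }
        return "";
    }

    /** Maps the WH word to a coarse expected answer type (assumed mapping). */
    static String expectedAnswerType(String question) {
        switch (whWord(question)) {
            case "who":   return "PERSON";
            case "when":  return "DATE";
            case "where": return "LOCATION";
            default:      return "OTHER";
        }
    }

    /** Returns the question's tokens without stop words, in order, without repeats. */
    static List<String> keywords(String question) {
        Set<String> kept = new LinkedHashSet<>();
        for (String token : tokenize(question)) {
            if (!STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return new ArrayList<>(kept);
    }
}
```

The keywords returned here are what the document-processing step would pass to the search engine, while the expected answer type can be used to filter candidate answers.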
V Result Analysis

The Question Answering System in this research was developed in Java (JDK 1.8) with the NetBeans IDE 8.0.2 on the Windows 7 operating system. All forms of the Question Answering System are designed in Swing. Graphs of computation time, question type and memory usage are plotted using the JFreeChart library. In the result analysis, the proposed Question Answering system is compared with the existing Question Answering system in terms of computation time and memory.

Each type of question (Factoid, Descriptive and Definition) was used in the experiments. Wikipedia was used as the dataset for finding answers to the questions. Figure 5.1 shows the home screen of the project.

Figure 5.1 Home screen of project.

Evaluation Parameters

The evaluation of the Question Answering system focuses on the following parameters:
• Question type
• Computation time
• Memory management

Question Types

The type of each entered question is determined first. The question type is then used to select a template that helps find a more accurate answer for the entered question.

Figure 5.2 Question Types

Table 5.1 Number of Questions of Each Type

S.No.   Question Type       No. of Questions
1       Definition Type     6
2       Description Type    2
3       Factoid Type        2

Table 5.2 Question Type of Each Question

S.No.   Question Number       Question Type
1       Question Number 1     Description
2       Question Number 2     Definition
3       Question Number 3     Definition
4       Question Number 4     Definition
5       Question Number 5     Description
6       Question Number 6     Definition
7       Question Number 7     Factoid
8       Question Number 8     Description
9       Question Number 9     Factoid
10      Question Number 10    Definition

Computation Time

We measured the computation time of the existing and the proposed Question Answering systems; the results are shown in a graph. The experiments show that the proposed Question Answering system takes less computation time than the existing one for each of the five questions reported in Table 5.3.
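The paper does not say how computation time and memory were recorded, so the helper below is only one plausible way to collect both values around a single query in Java. The class name `QueryMeasurementSketch` and the stand-in workload are hypothetical; `System.nanoTime` and `Runtime.totalMemory`/`freeMemory` are standard JDK calls.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical measurement helper; not the harness used in the paper. */
class QueryMeasurementSketch {

    /** Elapsed wall-clock time of one run of the task, in milliseconds. */
    static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** JVM heap currently in use, in megabytes. */
    static double usedHeapMegabytes() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Stand-in workload; a real run would call the QA system's answer routine here.
        Runnable query = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        };
        long ms = elapsedMillis(query);
        System.out.println("time (ms): " + ms + ", heap in use (MB): " + usedHeapMegabytes());
    }
}
```

Heap readings taken this way depend on when the garbage collector last ran, so single measurements are noisy; repeating each question several times and averaging would give steadier numbers.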


Figure 5.3 Computation time for Existing and Proposed System.

Table 5.3 Computation time for Existing and Proposed System.

S.No.   Question Number      Computation Time of Existing Question Answering System (ms)   Computation Time of Proposed Question Answering System (ms)
1       Question Number 1    13203                                                          7215
2       Question Number 2    9853                                                           4978
3       Question Number 3    10340                                                          9734
4       Question Number 4    22565                                                          5123
5       Question Number 5    11287                                                          7460

Computation Memory

We measured the memory used by the existing and the proposed Question Answering systems; the results are shown in a graph. The experiments show that the proposed Question Answering system uses less memory than the existing one for four of the five questions reported in Table 5.4.

Figure 5.4 Computation memory for Existing and Proposed System.

Output Screen

Figure 5.5 Initial screens for Existing and Proposed System.


Figure 5.6 Existing Question Answering System

Figure 5.7 Proposed Question Answering System

Table 5.4 Computation memory for Existing and Proposed System

S.No.   Question Number      Computation Memory of Existing Question Answering System (MB)   Computation Memory of Proposed Question Answering System (MB)
1       Question Number 1    2183                                                             1940
2       Question Number 2    2434                                                             2015
3       Question Number 3    2507                                                             2473
4       Question Number 4    2164                                                             1946
5       Question Number 5    1798                                                             1816
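The per-question values in Tables 5.3 and 5.4 can be condensed into an average and a relative reduction. The sketch below hard-codes the figures from the two tables; the class `ResultSummarySketch` and the choice of a mean-based reduction are illustrative assumptions, not a summary the authors report.

```java
/** Hypothetical summary of Tables 5.3 and 5.4; the metric choice is an assumption. */
class ResultSummarySketch {

    // Values copied from Table 5.3 (milliseconds) and Table 5.4 (megabytes).
    static final double[] EXISTING_MS = {13203, 9853, 10340, 22565, 11287};
    static final double[] PROPOSED_MS = {7215, 4978, 9734, 5123, 7460};
    static final double[] EXISTING_MB = {2183, 2434, 2507, 2164, 1798};
    static final double[] PROPOSED_MB = {1940, 2015, 2473, 1946, 1816};

    /** Arithmetic mean of the values. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Relative reduction of the proposed mean against the existing mean, in percent. */
    static double reductionPercent(double[] existing, double[] proposed) {
        return 100.0 * (1.0 - mean(proposed) / mean(existing));
    }

    public static void main(String[] args) {
        System.out.printf("mean time: %.1f ms vs %.1f ms, reduction %.1f%%%n",
                mean(EXISTING_MS), mean(PROPOSED_MS),
                reductionPercent(EXISTING_MS, PROPOSED_MS));
        System.out.printf("mean memory: %.1f MB vs %.1f MB, reduction %.1f%%%n",
                mean(EXISTING_MB), mean(PROPOSED_MB),
                reductionPercent(EXISTING_MB, PROPOSED_MB));
    }
}
```

On these five questions the mean computation time falls from 13449.6 ms to 6902.0 ms (about 48.7%), while the mean memory falls from 2217.2 MB to 2038.0 MB (about 8.1%), with Question Number 5 being the one case where the proposed system uses slightly more memory.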

Figure 5.8 Select Head Word of Question

Figure 5.9 Answer of Proposed Question Answering System

VI Conclusion

In this paper we have proposed a framework for a restricted-domain Question Answering System using advanced NLP tools and software. This framework can be used to develop a Question Answering System that extracts exact and precise answers from a restricted-domain textual data set. The proposed framework not only provides a simple and implementable basis for developing a question answering system but also provides a proper flow of data for answer extraction.

Since the proposed model works over keywords and the head word and is independent of the question or sentence structure, it reduces the overhead of question normalization. Moreover, since the framework is designed for a restricted domain, it also handles the issue of word sense disambiguation. The major problem with the proposed framework is that its performance depends on the performance of the search engine and of the NLP tools used.

References
[1] Sreelakshmi V., Sangeetha Jamal, "Survey Paper: Question Answering Systems", in National Conference on Computing and Communication (NCCC), March 2014, GEC Idukki.


[2] M. Ramprasath, S. Hariharan, "Improved Question Answering System by semantic reformulation", IEEE Fourth International Conference on Advanced Computing, 2012.
[3] Ali Mohamed Nabil Allam and Mohamed Hassan Haggag, "The Question Answering Systems: A Survey", International Journal of Research and Reviews in Information Sciences (IJRRIS), September 2012, Science Academy Publisher, United Kingdom.
[4] Molla D. and Vicedo J., "Question answering in restricted domains: An overview", Computational Linguistics, pp. 41-61, 2007.
[5] Moreda P., Llorens H., Saquete E. and Palomar M., "Combining semantic information in question answering systems", Information Processing & Management, pp. 870-885, 2011.
[6] Svetlana Stoyanchev, Young Chol Song and William Lahti, "Exact Phrases in Information Retrieval for Question Answering", Coling 2008: Proceedings of the 2nd Workshop on Information Retrieval for Question Answering (IR4QA), pp. 9-16, Manchester, UK, August 2008.
[7] Woods W.A., Kaplan R.A., Nash-Webber B., "The lunar sciences natural language information system", Final report: BBN Report #2378, Technical report, Bolt Beranek and Newman Inc., Cambridge, MA, June 1972.
[8] Green B.F., Wolf A.K., Chomsky C., Laughery K., "BASEBALL: An automatic question answerer", in Proceedings of Western Computing Conference, vol. 19, pp. 219-224, 1961.
[9] Ittycheriah A., Franz M., Zhu W.J., Ratnaparkhi A. and Mammone R.J., "IBM's statistical question answering system", in Proceedings of the Text Retrieval Conference TREC-9, 2000.
[10] Athira P. M., Sreeja M. and P. C. Reghuraj, "Architecture of an Ontology-Based Domain-Specific Natural Language Question Answering System", Department of Computer Science and Engineering, Government Engineering College, Sreekrishnapuram, Palakkad, Kerala, India, 678633.
[11] Pragisha K., "Design and implementation of a QA system in Malayalam".
[12] Sanjay K. Dwivedi, Vaishali Singh, "Research and reviews in question answering system", Department of Computer Science, B. B. A. University (A Central University), Lucknow, Uttar Pradesh, 226025, India.
[13] Poonam Gupta, Vishal Gupta, "A Survey of Text Question Answering Techniques", Computer Science & Engineering Department, University Institute of Engineering & Technology, Panjab University, Chandigarh.
[14] Kolomiyets, Oleksander and Moens, Marie-Francine, "A survey on question answering technology from an information retrieval perspective", Information Sciences 181, 2011, pp. 5412-5434. DOI: 10.1016/j.ins.2011.07.047. Elsevier.
[15] Moreda, Paloma, Llorens, Hector, Saquete, Estela and Palomar, Manuel, "Combining semantic information in question answering systems", Information Processing and Management 47, 2011, pp. 870-885. DOI: 10.1016/j.ipm.2010.03.008. Elsevier.
[16] Ko, Jeongwoo, Si, Luo and Nyberg, Eric, "Combining evidence with a probabilistic framework for answer ranking and answer merging in question answering", Information Processing and Management 46, 2010, pp. 541-554. DOI: 10.1016/j.ipm.2009.11.004. Elsevier.
[17] Pachpind Priyanka P., Bornare Harshita N., Kshirsagar Rutumbhara B., Malve Ashish D., "An Automatic Answering System Using Template Matching For Natural Language Questions", BE Comp, S.N.D COE & RC, Yeola.