Professional Documents
Culture Documents
2ete Research
2ete Research
USING NLP
Vedant Sharma, Vishal kumar Bachelor of Technology
in Computer Science and Engineering,
Galgotias University,Greater Noida
(vedant.20scse1010881@galgotiasuniversity.edu.in,
vishal.20scse1010856@galgotiasuniversity.edu.in)
ABSTRACT
There is a vast amount of textual content the original document Summaries cut down
available, and it continues to develop every on reading time and make the selection
day. Consider the internet, which is made up process easier when investigating
of web pages, news stories, status updates, documents. The efficacy of indexing is
blogs, and many other things. Because the improved by automatic summarization.
data is unstructured, the best we can do is Human summarizers are more prejudiced
perform search and glance through the than automatic summarising techniques.
results. Much of this text material needs to Flask, Python 3, web development (HTML,
be reduced to shorter, focused summaries CSS, BOOTSTRAP), and the Text
that capture the important elements, both so Summarizing API are the tools and
we can explore it more easily and to ensure technologies utilised in text summarization
that the larger papers include the (which uses NLP). One of the next goals
information we need. The proposed solution could be to apply the topic-focused
for this problem is a web application that summarization framework to news articles
uses abstractive text summarization to or blogs, as well as to expand the machine
provide a paraphrase of the essential points. learning research.
text, using a vocabulary set not the same as
1.1 INTRODUCTION
The Internet is a data warehouse. The characters, text data is more difficult to
internet provides access to news, movies, interpret. Because of the massive amount of
education, medicine, health, nations, data available, a system must be in place to
weather, geology, and other topics. This ensure that only the most important aspects
could be data in the form of statistics, of the data are accessed. This can be
numerical, mathematical, or text[1]. accomplished by text summarization. Text
Because of the increased number of summary has been the subject of decades of
inquiry and study. To generate brief summarise material from many sources.
summaries, various methods have been
developed and tested on various datasets. EXTRACTIVE SUMMARIZATION
Different comparison scores are used to To generate a summary, extractive text
compare them. Extractive or astractive text summarization selects a subset of existing
summarising, single or multi-document words, phrases, or sentences from the
summary are all options. Extractive text original text. Extractive summarization
summarising is a technique for creating selects essential phrases or keywords from a
summaries from documents that use the document using a statistical approach.
same language as the original. ABS is more Extractive summarization selects essential
generic and concentrates on the document's phrases or keywords from a document using
main ideas. Single document summarising a statistical approach.
approaches produce summaries of a single
document's text, while multi document ABSTRACTIVE SUMMARIZATION
interaction. The process of designing a this could include words that aren't in the
system that can process and produce original. It entails comprehending the
language as well as a human can is known original content and retelling it in a more
According to this strategy [3] [4], sentences in words. Lexical chains are created by grouping
the title are deemed more important and are (chaining) semantically similar groupings of
more likely to be included in the summary. words in a source document. The relationships
The number of words usually used between a between words that may cause them to be
sentence and the title determines the phrase's placed into the same lexical chain include
where it comes in a paragraph: first, middle, are collections of synonyms. Word Net
or last, or in a prominent area of the work like additionally gives a short definition for each
the conclusion or opening. The first few sys-net as well as a semantic relationship
sentences of a document, as well as the last between them. Word-net also functions as a
few phrases or the conclusion, are deemed thesaurus and an online dictionary, and many
more important and are included in the systems use it to determine word
summary. relationships.
The most common words, according to The use of graph theory [16] is another way
Luhn[10] convey the text's most essential for summarization. To generate a semantic
topic. His plan was to assign a score to each network of the original content, the author
sentence depending on the number of times presented an approach based on subject-
the terms appeared, and then chose the one object-predicate (SOP) triples from individual
with the greatest score. phrases. The relevant concepts, which contain
the meaning, are dispersed among the
sentences. Identifying and exploiting links summarised by using Word-net to generate a
among them, according to the author [16], sub-graph. The Word Net is used to apply
could be useful for extracting pertinent text. weights to nodes in the sub-graph in relation
to the synsnet. The most widely used text
One of the researchers, IIT Bombay's Pushpak summarising approaches employ either a
Bhattacharyya [17], proposed a Word Net- statistical or linguistic approach, or a
based summarising method. The document is combination of both.
Easy to
It consists of Suffers from
Extractive compute [19]
selecting inconsistency
important
sentences
3) No. of input Single- Can accept Less overhead Cannot summarize [19]
document document only one input multiple related
document documents
[11] Goldstein, J., Kantrowitz, M., Mittal, V., [20] Martin Hassel, Nima Mazdak, “A Persian
Carbonell, J., 1999.Summarizing text text summarizer”, International Conference on
documents: Sentence selection and evaluation Computational Linguistics, 2004.
metrics. In: Proc. ACM-SIGIR’99, pp. 121–
128.