
Text Summarization

For Review and Feedback

By: Aman Sadhwani

Monday, May 18, 2015

What is Text Summarization?
And why do we need it?

• A summary is a text that reflects the main and most important sentences of the original text. In text summarization, the summary is generated by a computer.
• In recent years, the amount of textual information has been growing rapidly, day by day. This makes it increasingly difficult for users to read it all, and it also leads to loss of interest. Text summarization emerged to address this problem.

Types of Text Summarization

1) Extraction: In extractive text summarization, the summary is generated by selecting a set of words, phrases, paragraphs or sentences from the original document.

2) Abstraction: Abstractive methods build a semantic representation of the text and then use natural language processing techniques to generate a summary that is closer to one written manually by a person. Such a summary may contain words that are not found in the original document. Research on this method is ongoing, and it is increasingly in demand.
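
To make the extractive idea concrete, here is a minimal Python 3 sketch (an illustration, not the system described in these slides): each sentence is scored by a caller-supplied function, and the top `k` sentences are copied verbatim into the summary in their original order. The function names, the naive regex sentence splitter and the word-count scorer in the usage line are all assumptions.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extractive_summary(text: str, score: Callable[[str], float], k: int) -> List[str]:
    """Select the k highest-scoring sentences, copied verbatim from the input."""
    sentences = split_sentences(text)
    # Rank sentence indices by score, highest first; sorted() is stable,
    # so sentences with equal scores keep their original relative order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:k])  # restore original document order
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = "Cats sleep. Dogs bark loudly at night. Birds sing."
    # Hypothetical scorer: sentences with more words score higher.
    print(extractive_summary(doc, lambda s: len(s.split()), k=2))
    # prints ['Cats sleep.', 'Dogs bark loudly at night.']
```

Because the selected sentences are copied unchanged, every word in the output appears in the source, which is exactly what separates extraction from abstraction.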

Proposed System

• We have developed and compared two text summarization techniques:

1) Reduction based
2) Intersection based

How the Reduction Algorithm Works

Step 1 - It takes a text as input.
Step 2 - It splits the text into one or more paragraphs.
Step 3 - It splits each paragraph into one or more sentences.
Step 4 - It splits each sentence into one or more words.
Step 5 - It gives each sentence a weight (a floating-point value) by comparing its words to a predefined dictionary called "stopWords.txt". If a word in the sentence matches any word in this dictionary, that word is considered low-weighted.
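
Steps 1 to 5 could be sketched in Python 3 as below. This is only an illustration under stated assumptions: paragraphs are taken to be separated by blank lines, sentences end in `.`, `!` or `?`, the contents of stopWords.txt are replaced by a small inline set, and because the slides do not give the weighting formula, the hypothetical constants `STOP_WORD_WEIGHT` and `CONTENT_WORD_WEIGHT` and the averaging rule are placeholders.

```python
import re
from typing import List, Set

# Small stand-in for stopWords.txt (assumption; the real file's contents are not shown).
DEFAULT_STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "on"}

STOP_WORD_WEIGHT = 0.1     # hypothetical weight for words found in the stop-word list
CONTENT_WORD_WEIGHT = 1.0  # hypothetical weight for all other words


def load_stop_words(path: str) -> Set[str]:
    """Read a stop-word file with one word per line (file format assumed)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def split_paragraphs(text: str) -> List[str]:
    # Step 2: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> List[str]:
    # Step 3: naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def split_words(sentence: str) -> List[str]:
    # Step 4: lowercase alphabetic tokens (apostrophes kept, e.g. "don't").
    return re.findall(r"[a-z']+", sentence.lower())


def sentence_weight(sentence: str, stop_words: Set[str]) -> float:
    # Step 5: average word weight; stop words pull the sentence weight down.
    words = split_words(sentence)
    if not words:
        return 0.0
    total = sum(STOP_WORD_WEIGHT if w in stop_words else CONTENT_WORD_WEIGHT for w in words)
    return total / len(words)


if __name__ == "__main__":
    text = "The cat is on the mat.\n\nSummarization saves reading time."
    for paragraph in split_paragraphs(text):  # Step 1: the whole text is the input
        for sentence in split_sentences(paragraph):
            print(round(sentence_weight(sentence, DEFAULT_STOP_WORDS), 2), sentence)
```

Averaging rather than summing keeps long sentences from winning purely on length; with `DEFAULT_STOP_WORDS`, "The cat is on the mat." scores lower than "Summarization saves reading time." because four of its six words are stop words.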

Cont…

Step 6 - An ordered list of weighted sentences is then prepared (sentences with relatively high weight come first and low-weighted sentences come last).
Step 7 - Using this ordered list, it stores each sentence, in order, in the output variable (a list) until the reduction ratio is reached (a formula determines the maximum number of sentences to put in the output list).
Step 8 - The output list is then returned.
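
Steps 6 to 8 might look like the following Python 3 sketch, which takes `(sentence, weight)` pairs such as those produced in Step 5. The slides do not state the reduction-ratio formula, so `max(1, int(len(weighted) * ratio))` is a hypothetical stand-in, as is the function name `reduce_sentences`.

```python
from typing import List, Tuple


def reduce_sentences(weighted: List[Tuple[str, float]], ratio: float) -> List[str]:
    """Order sentences by weight and keep the top share given by `ratio`.

    `weighted` holds (sentence, weight) pairs. The limit formula
    max(1, int(len(weighted) * ratio)) is an assumed stand-in for the
    unspecified reduction-ratio formula.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if not weighted:
        return []
    # Step 6: highest weight first; sorted() is stable, so ties keep input order.
    ordered = sorted(weighted, key=lambda pair: pair[1], reverse=True)
    # Hypothetical reduction-ratio formula: maximum number of output sentences.
    limit = max(1, int(len(weighted) * ratio))
    output: List[str] = []
    # Step 7: copy sentences into the output list until the limit is reached.
    for sentence, _weight in ordered:
        if len(output) >= limit:
            break
        output.append(sentence)
    # Step 8: return the output list.
    return output


if __name__ == "__main__":
    pairs = [("The cat is on the mat.", 0.4), ("Summaries save time.", 1.0),
             ("It is what it is.", 0.1), ("Search engines rank pages.", 0.9)]
    print(reduce_sentences(pairs, 0.5))
    # prints ['Summaries save time.', 'Search engines rank pages.']
```

As in Step 7, sentences are returned in weight order; re-sorting the result by original position would be a small change if a more readable summary is wanted.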

How the Intersection Algorithm Works

1. Split the input text into paragraphs.
2. Split each paragraph into sentences.
3. Split sentences into words.
4. Calculate the intersection between two sentences.
5. Remove non-alphabetic characters from each sentence.
6. Convert the content into a dictionary.
7. Build the sentence dictionary.
8. Return the best sentence(s) in each paragraph.
9. Get the best sentences according to the dictionary.
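
The intersection steps can be sketched in Python 3 as follows. The scoring formula (shared words divided by the average word count of the two sentences), the blank-line paragraph split and all function names are assumptions for illustration; the slides do not specify them. Each formatted sentence is scored by summing its intersections with every other sentence, and the best-scoring sentence of each paragraph is returned.

```python
import re
from typing import Dict, List


def split_paragraphs(text: str) -> List[str]:
    # Step 1: paragraphs are separated by blank lines (assumption).
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def split_sentences(paragraph: str) -> List[str]:
    # Step 2: naive split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def format_sentence(sentence: str) -> str:
    # Step 5: keep only letters and spaces so punctuation cannot break dictionary keys.
    return re.sub(r"[^A-Za-z ]", "", sentence)


def intersection(s1: str, s2: str) -> float:
    # Steps 3-4: compare the word sets of two sentences, normalised by
    # their average size (assumed scoring formula).
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / ((len(w1) + len(w2)) / 2)


def sentence_ranks(text: str) -> Dict[str, float]:
    # Steps 6-7: map each formatted sentence to the sum of its
    # intersections with every other sentence in the text.
    sentences = [format_sentence(s) for p in split_paragraphs(text) for s in split_sentences(p)]
    ranks: Dict[str, float] = {}
    for i, s in enumerate(sentences):
        ranks[s] = sum(intersection(s, t) for j, t in enumerate(sentences) if i != j)
    return ranks


def summarize(text: str) -> List[str]:
    # Steps 8-9: pick the highest-ranked sentence from each paragraph.
    ranks = sentence_ranks(text)
    summary = []
    for paragraph in split_paragraphs(text):
        candidates = split_sentences(paragraph)
        if candidates:
            summary.append(max(candidates, key=lambda s: ranks.get(format_sentence(s), 0.0)))
    return summary


if __name__ == "__main__":
    sample = "Cats chase mice. Mice fear cats.\n\nDogs chase cats. Birds fly."
    print(summarize(sample))  # prints ['Cats chase mice.', 'Dogs chase cats.']
```

Because this sketch keeps only one sentence per paragraph, a text with one or two paragraphs yields a one- or two-sentence summary.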

Flow Chart

[Figure: flow chart]

Screen shots

[Figures: screen shots]

Conclusion

Cont…

• Based on the last table, intersection is faster than reduction.
• However, reduction produces better summaries than intersection.
• Intersection works well on some documents but generates only one or two lines of summary on others.
• This is because intersection is the most basic text summarization algorithm; unlike reduction, it does not use any NLP libraries.

Hardware & Software Requirements

• Minimum Hardware Requirements
  • Processor: Intel Pentium II or higher
  • RAM: 128 MB or higher
  • Monitor, keyboard, mouse
  • Printer (optional)
  • Hard disk: 20 GB or higher

• Software Requirements
  • OS: Windows XP or higher
  • Java installed on the machine
  • Python 2.7 installed on the machine

Tools Used

• NetBeans
• Python 2.7 IDLE

Future Enhancements

• Support for summarizing multiple file types
• User-wise document management
• Multi-document summarization
• Improved summarization algorithms

THANK YOU
