Final Updated PPTX 22jan PDF
SUPERVISOR
PRESENTED BY
Main document
Summary
HYBRID BANGLA TEXT SUMMARIZATION 4
▪ Saves reading time on large documents by discarding irrelevant and redundant content
AUTOMATIC TEXT SUMMARIZATION
Generic Summarization
A generic summary gives users the overall sense of a document. It typically contains the core information of the document.
Topic Centric Summarization
Topic-centric summaries are generally restricted to one topic. The topic is typically based on keywords supplied by a human.
LITERATURE REVIEW 7
Significance on:
▪ Stop-word removal, stemming and tokenization were done as preprocessing.
▪ Sentences were ranked using thematic terms and sentence position.
Resource: Document "Ananda Bazar Patrika"
Evaluation (unigram based): Precision: 53.80%, Recall: 55.60%, F-Score: 54.60%
Recall score: 0.4122; the score for the LEAD baseline: 0.3991
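The reviewed work ranks sentences by thematic terms and sentence position, but the slides give no formulas. The following is only a minimal sketch under assumed definitions: thematic terms are the `top_k` most frequent non-stop words, a sentence's thematic score is the fraction of its tokens that are thematic terms, and earlier sentences get a position bonus of 1 / (index + 1). The function name `rank_sentences` is hypothetical.

```python
from collections import Counter


def rank_sentences(sentences, stop_words, top_k=5):
    """Rank tokenized sentences by thematic-term overlap plus a position bonus.

    Assumed definitions (not taken from the reviewed paper):
    - thematic terms = the top_k most frequent words that are not stop words
    - position score = 1 / (index + 1), so earlier sentences score higher
    Returns sentence indices, best first.
    """
    # Count content words across the whole document.
    counts = Counter(w for s in sentences for w in s if w not in stop_words)
    thematic = {w for w, _ in counts.most_common(top_k)}

    scores = []
    for i, sent in enumerate(sentences):
        thematic_score = sum(1 for w in sent if w in thematic) / max(len(sent), 1)
        position_score = 1.0 / (i + 1)
        scores.append(thematic_score + position_score)

    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

Both components are simply added here; a real system would likely weight them.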
LITERATURE REVIEW 8
Significance on:
▪ Sentence ranking
▪ Clustering
Flowchart: Start → Input Document → Pre-Processing → … → End
PROPOSED MODEL 14
INPUT DOCUMENT 15
** The summaries used as gold (reference) summaries were written by randomly chosen people.
INPUT DOCUMENT 16
PREPROCESSING 18
We removed stop words from the documents using a list of stop words.
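The preprocessing step can be sketched as below. This is a minimal illustration, not the authors' code: the stop-word set is a small hypothetical sample (the actual list is not shown), sentences are split on the Bangla danda (।) plus `?` and `!`, and tokens are split on whitespace.

```python
import re

# Hypothetical sample of Bangla stop words; the list used in the project is not shown.
STOP_WORDS = {"এবং", "ও", "কিন্তু", "এই", "যে", "না"}


def preprocess(text, stop_words=STOP_WORDS):
    """Split text into sentences and return each sentence's tokens minus stop words."""
    # Bangla sentences usually end with the danda (।); '?' and '!' are also treated as ends.
    sentences = [s.strip() for s in re.split(r"[।?!]", text) if s.strip()]
    result = []
    for sent in sentences:
        tokens = sent.split()  # simple whitespace tokenization
        result.append([t for t in tokens if t not in stop_words])
    return result


print(preprocess("আমি ভাত খাই এবং পানি পান করি। সে বাড়ি যায়।"))
```

Whitespace splitting keeps punctuation such as commas attached to words; a fuller tokenizer would strip it.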
Here we calculate the total score of each sentence based on its keywords.
ACTUAL NEWS
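A minimal sketch of keyword-based sentence scoring follows. The slides do not say how keywords are chosen or weighted, so this assumes the keywords are the `top_k` most frequent tokens of the (already preprocessed) document and that a keyword's weight is its relative frequency. `keyword_scores` is a hypothetical name.

```python
from collections import Counter


def keyword_scores(sentences, top_k=10):
    """Score each tokenized sentence by summing the weights of the keywords it contains.

    Assumptions: keywords are the top_k most frequent tokens in the document,
    and a keyword's weight is its count divided by the total token count.
    """
    counts = Counter(t for s in sentences for t in s)
    total = sum(counts.values()) or 1
    keywords = {w: c / total for w, c in counts.most_common(top_k)}
    # A sentence's keyword score is the sum over its tokens (non-keywords add 0).
    return [sum(keywords.get(t, 0.0) for t in s) for s in sentences]
```

Summing rather than averaging favours longer sentences; dividing by sentence length is a common alternative.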
Here we calculate the total score of each sentence based on the sentiment values of its words.
ACTUAL NEWS
SSCORE = 0.0001 + 0.0001 + 0.0001 + 0.0001 + 0.0001 + 0.0001 + 0.0001 + 0.0001 + 0.0001 = 0.0009
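The SSCORE example adds one sentiment value per word: nine words of 0.0001 each give 9 × 0.0001 = 0.0009. A minimal sketch, assuming a lexicon that maps each word to a sentiment value (the slides do not say where the 0.0001 values come from, so `lexicon` and both function names are hypothetical):

```python
def sentiment_score(tokens, lexicon, default=0.0):
    """Return SSCORE for one tokenized sentence: the sum of its words' sentiment values.

    `lexicon` is a hypothetical word -> sentiment value mapping; words missing
    from it contribute `default`.
    """
    return sum(lexicon.get(token, default) for token in tokens)


def score_sentences(sentences, lexicon):
    """Apply sentiment_score to every tokenized sentence of a document."""
    return [sentiment_score(s, lexicon) for s in sentences]


# Nine words that each carry 0.0001, as in the SSCORE example.
lexicon = {f"w{i}": 0.0001 for i in range(9)}
print(round(sentiment_score(list(lexicon), lexicon), 4))  # 0.0009
```

The `round` call only hides floating-point noise in the printed sum.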
▪ This method tokenizes each sentence in the training data set and converts it into a sentence vector. To do so, we used “word2vec” imported from “gensim.models”.
▪ To score sentences with the TextRank method, we need the similarities between sentences. First, the vector model was loaded and each sentence was converted to a vector using “sentence2vec”.
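The slides do not show the code behind “sentence2vec” or the TextRank step. Below is a minimal sketch, assuming that a sentence vector is the average of its word vectors and that TextRank is PageRank run on a graph whose edge weights are the cosine similarities between sentence vectors. `sentence_vector` and `textrank` are hypothetical names; `word_vectors` can be any word-to-vector mapping.

```python
import numpy as np


def sentence_vector(tokens, word_vectors, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def textrank(sentences, word_vectors, dim, damping=0.85, iters=100, tol=1e-6):
    """Score tokenized sentences with PageRank over a cosine-similarity graph."""
    n = len(sentences)
    vecs = np.array([sentence_vector(s, word_vectors, dim) for s in sentences])

    # Cosine similarity between every pair of sentence vectors.
    norms = np.linalg.norm(vecs, axis=1)
    norms[norms == 0] = 1.0                 # avoid dividing by zero
    unit = vecs / norms[:, None]
    sim = unit @ unit.T
    np.fill_diagonal(sim, 0.0)              # no self-loops
    sim = np.clip(sim, 0.0, None)           # assumption: drop negative similarities

    # Row-normalize into a transition matrix; isolated sentences link to all uniformly.
    row_sums = sim.sum(axis=1, keepdims=True)
    safe = np.where(row_sums == 0, 1.0, row_sums)
    trans = np.where(row_sums > 0, sim / safe, 1.0 / n)

    # Power iteration for PageRank; the scores always sum to 1.
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1 - damping) / n + damping * (trans.T @ scores)
        converged = np.abs(new - scores).sum() < tol
        scores = new
        if converged:
            break
    return scores
```

With gensim 4.x, `word_vectors` could be the `.wv` attribute of `Word2Vec(sentences=tokenized_sentences, vector_size=100, min_count=1)`, with `dim=100`.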
PROPOSED MODEL 31
We assign a weight to each of the three scores, and the final ranking is based on their weighted combination.
KR = Keyword Ranking
SS = Sentiment Scoring
TR = Text Ranking
W1 = A percentage of the total Keyword score
W2 = A percentage of the total Sentiment score
W3 = A percentage of the total Text Ranked score
HYBRID SCORING 33
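The weighted combination of KR, SS and TR can be sketched as below. The slides do not give the formula or the weight values, so this assumes a linear combination W1·KR + W2·SS + W3·TR after min-max normalizing each score list (the raw scales differ widely, e.g. SSCORE is around 0.0009). The function names and the example weights are hypothetical.

```python
def min_max(values):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(kr, ss, tr, w1, w2, w3, n_select):
    """Combine per-sentence Keyword (KR), Sentiment (SS) and TextRank (TR) scores.

    Assumption: final = W1*KR + W2*SS + W3*TR on min-max normalized scores.
    Returns the indices of the n_select best sentences in document order.
    """
    kr, ss, tr = min_max(kr), min_max(ss), min_max(tr)
    final = [w1 * k + w2 * s + w3 * t for k, s, t in zip(kr, ss, tr)]
    best = sorted(range(len(final)), key=lambda i: final[i], reverse=True)[:n_select]
    return sorted(best)  # keep the original sentence order in the summary


# Hypothetical scores for three sentences and hypothetical weights.
print(hybrid_rank([3, 1, 2], [0.0001, 0.0009, 0.0005], [0.5, 0.2, 0.3], 0.4, 0.2, 0.4, 2))
```

Returning the selected sentences in document order keeps the generated summary readable.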
SAMPLE GENERATED OUTPUT 35
Document Category | Number of Documents | Testing Docs (1st setup) | Training Docs (1st setup) | Testing Docs (2nd setup) | Training Docs (2nd setup)
Politics | 100 | 50 | 50 | 70 | 30
Economics | 100 | 50 | 50 | 70 | 30
Entertainment | 100 | 50 | 50 | 70 | 30
Accidents | 100 | 50 | 50 | 70 | 30
Dataset-1 100 29 35 12 24
Dataset-2 100 18 19 6 57
CLASS SPECIFIC RESULT 40
Average ROUGE 1 Score of the system for Category Accidents (1st Setup)
0.6456 0.6606
Average ROUGE 2 Score of the system for Category Accidents (1st Setup)
0.5762 0.5902
CLASS SPECIFIC RESULT 41
Average ROUGE 1 Score of the system for Category Economics (1st Setup)
0.5619 0.5656
Average ROUGE 2 Score of the system for Category Economics (1st Setup)
0.4705 0.4756
CLASS SPECIFIC RESULT 42
Average ROUGE 1 Score of the system for Category Entertainment (1st Setup)
0.4893 0.4841
Average ROUGE 2 Score of the system for Category Entertainment (1st Setup)
0.3981 0.3839
CLASS SPECIFIC RESULT 43
Average ROUGE 1 Score of the system for Category Politics (1st Setup)
0.6708 0.6667
Average ROUGE 2 Score of the system for Category Politics (1st Setup)
0.5645 0.5585
CLASS SPECIFIC RESULT 44
Average ROUGE 1 Score of the system for Category Accidents (2nd Setup)
0.6280 0.6132
Average ROUGE 2 Score of the system for Category Accidents (2nd Setup)
0.5470 0.5280
CLASS SPECIFIC RESULT 45
Average ROUGE 1 Score of the system for Category Economics (2nd Setup)
0.5853 0.5854
Average ROUGE 2 Score of the system for Category Economics (2nd Setup)
0.4921 0.4934
CLASS SPECIFIC RESULT 46
Average ROUGE 1 Score of the system for Category Entertainment (2nd Setup)
0.4702 0.4509
Average ROUGE 2 Score of the system for Category Entertainment (2nd Setup)
0.3695 0.3538
CLASS SPECIFIC RESULT 47
Average ROUGE 1 Score of the system for Category Politics (2nd Setup)
0.6601 0.6519
Average ROUGE 2 Score of the system for Category Politics (2nd Setup)
0.5525 0.5422
EXPERIMENTAL RESULT 48
Metric | KeyWord | Sentiment | Text Rank | New Hybrid
Precision | 0.542631685 | 0.508663132 | 0.508635158 | 0.547183881
Recall | 0.606858059 | 0.693553635 | 0.692635234 | 0.656746838
F-measure | 0.55973558 | 0.57365212 | 0.57410977 | 0.584185785
Average ROUGE 1 Scores of the system for different methods (1st Setup)
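The tables report ROUGE precision, recall and F-measure. The slides do not name the ROUGE implementation used, so the sketch below is a minimal re-implementation of the standard ROUGE-N definition for one system summary against one reference summary (whitespace tokens, no stemming). `ngrams` and `rouge_n` are hypothetical names.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (precision, recall, f_measure) of ROUGE-N for two token lists."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # matches clipped to the smaller count
    p = overlap / sum(cand.values()) if cand else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


print(rouge_n("the cat sat on the mat".split(), "the cat was on the mat".split(), 2))
```

When scores are averaged over many documents, the mean of per-document F-measures generally differs from the F-measure computed from the mean precision and mean recall.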
EXPERIMENTAL RESULT 49
Metric | KeyWord | Text Rank | New Hybrid
Precision | 0.536450081 | 0.500673054 | 0.537122573
Recall | 0.621884818 | 0.691011787 | 0.664016871
F-measure | 0.562766 | 0.568768895 | 0.593866852
Average ROUGE 1 Scores of the system for different methods (2nd Setup)
EXPERIMENTAL RESULT 50
Metric | KeyWord | Sentiment | Text Rank | New Hybrid
Precision | 0.447528188 | 0.421271345 | 0.410866401 | 0.459600427
Recall | 0.505991455 | 0.581238746 | 0.590964907 | 0.557536232
F-measure | 0.463635094 | 0.476486973 | 0.473836208 | 0.492378553
Average ROUGE 2 Scores of the system for different methods (1st Setup)
EXPERIMENTAL RESULT 51
Metric | KeyWord | Text Rank | New Hybrid
Precision | 0.439922462 | 0.400916393 | 0.447010528
Recall | 0.519289684 | 0.587950582 | 0.562046722
F-measure | 0.464852314 | 0.466303582 | 0.486369813
Average ROUGE 2 Scores of the system for different methods (2nd Setup)
EXPERIMENTAL RESULT 52
0.7071 0.6348
Classification and ROUGE Scores of summaries for different categories (for BNLPC data set 2)
0.6517 0.5842
EXPERIMENTAL RESULT 53
0.6543 0.5828
Classification and ROUGE Scores of summaries for different categories (for BNLPC data set 2)
0.6526 0.5836
EXPERIMENTAL RESULT 55
0.6712 0.6016
Classification and ROUGE Scores of summaries for different categories (for BNLPC data set 2)
0.6737 0.6048
EXPERIMENTAL RESULT 57
Average ROUGE Scores of the proposed system summaries for BNLPC data sets
COMPARISON WITH EXISTING MODELS 59
Average ROUGE Scores of our proposed system and other existing systems for BNLPC data sets
Average ROUGE Scores of summaries from our proposed system and other existing web-based systems for our data set
❑ 8 GB RAM
❑ 256 GB SSD
TIME ANALYSIS 61
ANY QUESTIONS?