
Papers summary and approach

1. Learning based Word Search and Visualisation for Historical Manuscript Images
This PhD thesis compares segmentation-based and segmentation-free methods for word spotting. It also
compares Query by Example and Query by String as ways of finding similar words. The author proposes
a new model of their own, Ctrl-F-Net.

2. Bag of Visual Words for Word Spotting in Handwritten Documents Based on Curvature Features
This paper performs word spotting with a Bag of Visual Words model, extracting SIFT and SURF features
at each key point. Spotting is done by Query by Example: features are extracted from local patches with
the Scale Invariant Feature Transform (SIFT), and the patches are represented with the bag of visual words model.
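
The bag of visual words step can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: it assumes the local descriptors (e.g. SIFT) and a codebook of visual words are already available, and the names `nearest_codeword` and `bovw_histogram`, as well as the toy 2-D values, are made up for the example.

```python
from math import dist

def nearest_codeword(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: dist(descriptor, codebook[i]))

def bovw_histogram(descriptors, codebook):
    """L1-normalised histogram of visual-word counts for one word image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_codeword(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Toy 2-D values for readability; real SIFT descriptors are 128-dimensional
# and the codebook would normally come from clustering training descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]
hist = bovw_histogram(descriptors, codebook)
```

In a Query by Example setting, the histogram of the query image would then be compared with the histograms of the candidate word images.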

3. Semantic and Verbatim Word Spotting using Deep Neural Networks


In this approach, a triplet CNN and a word embedding are used to perform both Query by String and
Query by Example word spotting. The word embedding used is PHOC (Pyramidal Histogram of Characters),
which represents a word as a high-dimensional binary histogram of character occurrences.
A cosine embedding loss is used to train the network, and the same cosine measure is used to retrieve the results.
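
To make the PHOC idea concrete, here is a minimal sketch of how such a binary vector could be built from a string. The alphabet, the pyramid levels (1, 2, 3) and the function name `phoc` are assumptions chosen for illustration; published PHOC variants use their own level sets and may add bigram entries. A character is assigned to a region when at least half of its span falls inside that region.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed alphabet

def phoc(word, levels=(1, 2, 3), alphabet=ALPHABET):
    """Binary pyramidal histogram of characters for `word`."""
    word = word.lower()
    n = len(word)
    vec = []
    for level in levels:
        for region in range(level):
            bits = [0] * len(alphabet)
            for i, ch in enumerate(word):
                if ch not in alphabet:
                    continue
                # Work in integer units of 1/(n*level) to avoid float rounding:
                # character i spans [i*level, (i+1)*level],
                # region r spans [r*n, (r+1)*n].
                overlap = min((i + 1) * level, (region + 1) * n) - max(i * level, region * n)
                if 2 * overlap >= level:  # at least 50% of the character lies in the region
                    bits[alphabet.index(ch)] = 1
            vec.extend(bits)
    return vec

v = phoc("ab", levels=(1, 2))
```

With levels (1, 2) the vector has three blocks of 36 entries: the whole word, its left half and its right half. For "ab", the letter a is set in the first and second blocks and b in the first and third.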

4. Learning Deep Representations for Word Spotting Under Weak Supervision


This paper proposes a way to avoid the need for a large annotated dataset. The model used is
PHOCNet, and a synthetic dataset is created for training. The paper compares the model's results
across various training datasets.

5. PHOCNet: A Deep Convolutional Neural Network for Word Spotting in Handwritten Documents
This paper explains the architecture of PHOCNet and how it works. PHOCNet is a deep CNN model
that performs word spotting with both the Query by Example and Query by String methods.

6. A Novel Word Spotting Method Based on Recurrent Neural Networks


This paper proposes a method for word spotting using an LSTM model in combination with the connectionist
token passing algorithm. For similarity matching between the query word and candidate words, the models
used here are the Hidden Markov Model (HMM) and Dynamic Time Warping (DTW).
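
As a reminder of what DTW matching does, here is a minimal sketch under simplifying assumptions: each word image is reduced to a 1-D sequence of numbers (for instance one value per pixel column), and the local cost is the absolute difference. Real systems typically use a feature vector per column; the function name `dtw_distance` is just for this example.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # step both
    return cost[n][m]

query = [1, 2, 3]
same_shape_stretched = [1, 2, 2, 3]
different_shape = [2, 3, 4]
```

Because the alignment may stretch either sequence, `query` and `same_shape_stretched` match at zero cost even though their lengths differ, while `different_shape` gets a larger distance.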

7. Keyword Spotting in Document Images through Word Shape Coding


In this paper, similarity between the user's input word and the word images in the document is measured
by Word Shape Coding. It uses a page segmentation approach to segment words and expects a word
bounding box before feature extraction. Features such as character ascenders, descenders,
deep eastward and westward concavities, holes, i-dot connectors and horizontal-line intersections are used and
compared.

8. Query by String word spotting based on character bi-gram indexing


This paper implements a segmentation-free Query-by-String approach to word spotting.
The word representation is based on PHOC, and the attributes are learnt by an SVM classifier using
PHOC labels obtained from the text transcriptions of all training words. At query time
the queried text is also encoded with the PHOC representation, and the word images whose attribute
representation is closest to that of the query are retrieved.

9. R-PHOC: Segmentation-Free Word Spotting using CNN
This paper proposes a network that takes a query word (QbE or QbS) and a set of candidate words, and
embeds all the candidate bounding boxes in an embedding space. The PHOC representation is then
produced by the R-PHOC network. A CNN is used to get all the connected
components (words) in the document, and PHOC attribute representations are then used to perform spotting
based on the closest match.

10. Word Spotting and Recognition using Deep Embedding


This paper proposes an end-to-end embedding framework that represents both images
and labels in the same embedding space. It develops an image descriptor using the PHOC embedding and
HWNet to extract features and perform word spotting and recognition.

11. Cloud Document Understanding AI


It works through the following steps: converting images to text (OCR), classifying documents, analysing and
extracting entities, parsing tables and extracting key-value pairs from images, then creating a knowledge base.
Google uses its own built-in solutions for all the steps.

12. OCR Implementation – Using Tesseract to parse the text and extract the value.

13. Deep Learning Method – Using an object detection method to detect a specific area in the image, then
extract the values.
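
For the OCR route, the "extract the value" step could look roughly like the sketch below once the page text is available. It assumes the value sits on the same line as its key, separated by ":" or "-". The function name `extract_value` and the sample text are made up for illustration, and the OCR call itself is only indicated in a comment so that the snippet runs without Tesseract installed.

```python
import re

def extract_value(ocr_text, key):
    """Return the text after `key` and a ':' or '-' separator on the same line, or None."""
    pattern = re.compile(
        rf"^\s*{re.escape(key)}\s*[:\-]\s*(.+?)\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(ocr_text)
    return match.group(1) if match else None

# Made-up stand-in for OCR output. With pytesseract installed, the string
# would instead come from pytesseract.image_to_string(PIL.Image.open(path)).
sample_text = "Invoice No: 12345\nDate - 2020-01-31\nTotal : 99.50\n"

invoice_no = extract_value(sample_text, "Invoice No")
total = extract_value(sample_text, "total")
```

A plain regular expression like this only works when the OCR output keeps key and value on one line. Detecting the region first, as in the deep learning route, is one way around layouts where it does not.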

Finalised Models:

1. PHOCNet model – from GitHub (Keras implementation) – versioning error.

2. Original PHOCNet model by Sudholt – Caffe implementation, also has a PyTorch implementation.
It outputs an attribute representation in a numpy file; need to work out how to use
this attribute representation for spotting.

3. Ctrl-F-Net PyTorch implementation – model from the PhD thesis, from GitHub.
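
One possible way to use the attribute representation from item 2 is to rank candidate word images by cosine similarity to the query's vector. This is a sketch under assumptions, not the repository's own procedure: it assumes the vectors from the numpy file have already been loaded into plain Python lists (for example with `numpy.load(...).tolist()`), and the names `cosine_similarity` and `rank_candidates` are made up for the example.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def rank_candidates(query_vec, candidate_vecs):
    """Indices of candidates, most similar to the query first."""
    scores = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(candidate_vecs)]
    return [i for _, i in sorted(scores, key=lambda s: -s[0])]

# Toy 4-dimensional attribute vectors; real PHOC vectors are much longer.
query_vec = [1, 0, 1, 0]
candidate_vecs = [[0, 1, 0, 1], [1, 0, 1, 0], [1, 0, 0, 0]]
ranking = rank_candidates(query_vec, candidate_vecs)
```

For Query by String, the query vector would be the PHOC of the typed word. For Query by Example, it would be the network's output for the query image.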

Later, need to decide which next step to take, in terms of both classification and input.
