A Business Card Reader Application For iOS Devices Based On Tesseract
Authorized licensed use limited to: GITAM University. Downloaded on February 29,2024 at 12:35:01 UTC from IEEE Xplore. Restrictions apply.
978-1-7281-0257-3/18/$31.00 ©2018 IEEE
2018 International Conference on Signal Processing and Information Security (ICSPIS)
Result = Convex Area / Total Area

The paper uses three fonts (Arial, Lucida Fax and Berlin Sans) as the training data set, whose features are used to recognize the extracted characters. The experimental results are: Arial = 87.14% accuracy, Berlin Sans = 94.28%, Lucida Fax = 64.28%, Cambria = 14.28% and Times New Roman = 21.42% (results for a case using 70 characters).

The accuracy for fonts not in the training data set appears to be very poor. To achieve higher accuracy, the fonts have to be included in the training data set, which makes the proposed method not very scalable.

Reference [2] uses fonts as a training data set for character recognition, which is not an effective method to implement on a mobile phone due to memory limitations. [2] also indicates the use of texture and topological features, such as the corner points of the image, to enhance performance in the extraction stage. Comparing [2] to [1], both papers seemingly use Optical Character Recognition for character extraction. However, it can be argued that [1] is more effective for mobile phone implementation, as it requires less memory in the image processing stage.

[3] and [4] suggest different methods for reading business cards. [3] proposes the use of augmented reality in business card readers: the business card images are used as image markers to retrieve business card information from a cloud database. [4] proposes the use of the Mobile Multi-Colour Composite (MMCC) barcode, which stores the digital version of the business card; the card is retrieved by scanning the barcode. These two methods are debatably not the best approaches for business card readers due to their lack of flexibility and their limitations. Hence, this project will focus on using Optical Character Recognition for the Business Card Reader to be developed.

[5] uses a similar approach to [2] in using the four corner points of the business card in the extraction stage for better performance. Conversely, [2] appears to provide more detailed information on the processes taken to achieve its extraction. The four-corner-points technique will be considered for possible implementation in this project.

The proposed method in [6] presents a new noise and picture region filtering technique using Connected Components (CC).

[7] proposes an Android application that allows mobile phones to capture a text-based image, extract the text from the image, translate it into English and finally read it out loud. The application is developed with the Tesseract OCR Engine, Bing Translator and the smartphone's built-in text-to-speech technology.

As seen in [8], the binarization process used may be better than the techniques used in the other papers. A neural network was used for character classification, which also improved the classification accuracy. Following the above review, an image processing module shall be added to the system to enhance the accuracy of the OCR engine.

III. METHODOLOGY

Implementation of the system functionalities followed the Agile (Scrum) methodology. System requirements and functionalities were accomplished in sprints, i.e. iterative increments. After each sprint was completed, the functionality was tested and the required improvements were made.

The development of the application followed object-oriented programming principles, with classes built to be easily extended or reused.

A. Technologies used

1) UIKit Framework
The UIKit framework provides the architecture for implementing interfaces, event handling, etc. Interaction between the user, the system and the application is also managed by this framework. Development started by connecting and defining all the components in the UI to the View Controller classes for implementation. These include the flash button, guide button, labels, the camera capture button, text fields, etc.

2) AV Foundation Framework
AV Foundation is a framework that allows configuration of the built-in iPhone cameras for capturing, recording and processing photos, videos and audio. We explicitly set up this framework to capture photos (scan business cards), with additional support for rear-camera flash usage only.

3) Core Image Framework
Core Image is an image processing framework offered by Apple that delivers high performance for processing still images and video. It provides numerous built-in features for image enhancement and processing. This framework is used as the image pre-processing module in the application.

4) Tesseract OCR Engine – Text Recognition
Ray Smith [10] from Google Inc. provided an overview of the open-source Tesseract OCR Engine, focusing on the unique aspects of the engine: line finding, the classification methods and an adaptive classifier.

A step-by-step pipeline process is followed in the Tesseract OCR Engine (Fig. 1). The first step is connected component analysis: it inspects nested outlines, together with the number of child and grandchild outlines, and stores the outlines of the components. This makes it easy to detect inverse text, i.e. white text on a black background. Stored outlines are grouped into blobs; blobs are ordered into text lines, and the regions are then examined for proportional text. Text lines are further broken into words based on character spacing.

The second step starts by attempting to recognize each word in turn. Acceptably recognized words are passed to the adaptive classifier to be used as training data. The adaptive classifier uses the accepted words to improve accuracy in recognizing words in the lower part of the page. Since the adaptive classifier may only have learned useful information partway down the page, a second pass is run to improve recognition of the words near the top of the page. A final step removes fuzzy spaces.
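The Core Image pre-processing module described above can be sketched as follows. The filter choices are assumptions for illustration, since the paper does not name the specific built-in filters used, and the sketch compiles only against the Apple SDKs:

```swift
import CoreImage

// Illustrative pre-processing before OCR (assumed filters, not the
// paper's exact configuration): grayscale + contrast, then sharpening.
func preprocess(_ image: CIImage) -> CIImage {
    // Remove colour and boost contrast so characters stand out.
    let mono = image.applyingFilter("CIColorControls", parameters: [
        kCIInputSaturationKey: 0.0,  // grayscale
        kCIInputContrastKey: 1.5     // higher contrast
    ])
    // Sharpen character edges for the OCR engine.
    return mono.applyingFilter("CISharpenLuminance", parameters: [
        kCIInputSharpnessKey: 0.7
    ])
}
```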
Fig. 1. Tesseract OCR Architecture (Simplified)
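The two-pass recognition flow can be illustrated with a small, purely illustrative Swift sketch. It only simulates the idea of feeding accepted words back as training data for a second pass; it is not Tesseract's actual code, and the classifiers are toy stand-ins:

```swift
// Toy adaptive classifier: remembers accepted words (case-insensitively).
struct AdaptiveClassifier {
    private var learned: Set<String> = []

    // Accepted words from the first pass become training data.
    mutating func train(on word: String) { learned.insert(word.lowercased()) }

    // Later, it can accept variants of words it has already seen.
    func recognize(_ word: String) -> Bool { learned.contains(word.lowercased()) }
}

func twoPassRecognize(words: [String], staticDictionary: Set<String>) -> [String] {
    var adaptive = AdaptiveClassifier()
    var results = [String?](repeating: nil, count: words.count)

    // Pass 1: recognize each word in turn (toy static classifier is an
    // exact match); accepted words train the adaptive classifier.
    for (i, word) in words.enumerated() where staticDictionary.contains(word) {
        results[i] = word
        adaptive.train(on: word)
    }
    // Pass 2: retry words that failed earlier (e.g. near the top of the
    // page) with the now-trained adaptive classifier.
    for (i, word) in words.enumerated() where results[i] == nil {
        if adaptive.recognize(word) { results[i] = word }
    }
    return results.compactMap { $0 }
}
```

Here "TESSERACT" would fail the first pass but succeed on the second once "Tesseract" has been accepted and learned.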
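The Input → Capture Session → Output relation of Fig. 2 can be sketched as below. This is illustrative only: the class and method names other than the AVFoundation types are assumptions, and it compiles only against the iOS SDK:

```swift
import AVFoundation

// Sketch of the Input → Capture Session → Output wiring shown in Fig. 2.
final class CardScannerSession {
    let session = AVCaptureSession()
    let photoOutput = AVCapturePhotoOutput()

    func configure() {
        session.beginConfiguration()
        session.sessionPreset = .photo

        // Input: the rear camera, matching the paper's rear-camera-only setup.
        if let camera = AVCaptureDevice.default(.builtInWideAngleCamera,
                                                for: .video, position: .back),
           let input = try? AVCaptureDeviceInput(device: camera),
           session.canAddInput(input) {
            session.addInput(input)
        }

        // Output: still photos, later handed to the pre-processing module.
        if session.canAddOutput(photoOutput) {
            session.addOutput(photoOutput)
        }
        session.commitConfiguration()
        session.startRunning()
    }

    func capture(flashOn: Bool) {
        let settings = AVCapturePhotoSettings()
        settings.flashMode = flashOn ? .on : .off
        // A delegate conforming to AVCapturePhotoCaptureDelegate (not shown)
        // would receive the captured photo:
        // photoOutput.capturePhoto(with: settings, delegate: delegate)
    }
}
```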
Fig. 2. Relations between Components of AV Foundation (Camera Usage): Input → Capture Session → Output

D. Adding to Contact

The Contacts framework is used in this function to save contact information into the phone contacts. The business card information filled into the text fields is automatically added to the phone contacts. The function is connected to a UI button which, on click, triggers a CNSaveRequest and stores the new contact.
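The save step can be sketched as follows, with hypothetical field names standing in for the parsed business-card values (compiles only against the Apple SDKs):

```swift
import Contacts

// Sketch of saving parsed business-card fields into the phone contacts.
// Field names and labels are illustrative, not the paper's exact code.
func saveContact(givenName: String, familyName: String,
                 phone: String, email: String) throws {
    let contact = CNMutableContact()
    contact.givenName = givenName
    contact.familyName = familyName
    contact.phoneNumbers = [CNLabeledValue(label: CNLabelWork,
                                           value: CNPhoneNumber(stringValue: phone))]
    contact.emailAddresses = [CNLabeledValue(label: CNLabelWork,
                                             value: email as NSString)]

    // CNSaveRequest batches the add; the contact store executes it.
    let request = CNSaveRequest()
    request.add(contact, toContainerWithIdentifier: nil)
    try CNContactStore().execute(request)
}
```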