
2018 International Conference on Signal Processing and Information Security (ICSPIS)

A Business Card Reader Application for iOS Devices Based on Tesseract

Bello Ahmed Dangiwa
School of Mathematical and Computer Science, Heriot Watt University
bd2@hw.ac.uk

Smitha S Kumar
School of Mathematical and Computer Science, Heriot Watt University
Smitha.kumar@hw.ac.uk

Abstract—As the accessibility of high-resolution smartphone cameras has increased and computational speed has improved, it is now convenient to build Business Card Readers on mobile phones. This project aims to design and develop a Business Card Reader (BCR) application for iOS devices, using an open-source OCR engine, Tesseract. The system accuracy was tested and evaluated using a dataset of 55 digital business cards obtained from an online repository. The accuracy of the system was up to 74% in terms of both text recognition and data detection. A comparative analysis was carried out against a commercial business card reader application, and our application performed reasonably well.

Keywords—Optical Character Recognition, iOS, Image Processing, Tesseract OCR, Natural Language Processing

I. INTRODUCTION
Business cards are widely used on a daily basis by professionals. In professional meetings and gatherings, business cards are exchanged as formal greetings and as a way of building a network. The number of business cards grows excessively, which makes managing them, or communicating with the card owners, an issue. The best way to manage business cards and ensure communication is to digitize them. Numerous researchers have proposed and built systems using flatbed card scanners to manage and digitize business cards; a downside to this approach is the lack of portability of flatbed scanners. As high-resolution cameras, low prices and better computational speed in smartphones no longer pose concerns, and with the significant growth in mobile phone usage, it has become more convenient to use smartphones for business card digitization.

The project largely used iOS frameworks and libraries to achieve a fully functioning application. The project aims to provide a BCR application that resolves the challenges of business card usage by providing a digitized solution. The application offers users the ability to scan business cards, extract their details and save them to the phone contacts. It is built for iOS devices in Xcode v9.2 with the Swift programming language.

II. LITERATURE REVIEW

L. Xi-Ping et al. [1] present the design and execution of a business card reader application using a built-in mobile phone camera. A technique is proposed that centers on multi-resolution analysis of document images, with the aim of addressing the restricted resources of mobile devices. The proposed technique improves computational speed and lessens memory usage in the image processing stage by spotting the text regions in downscaled images and then analyzing the spotted regions in the original image.

The proposed method starts with a business card image captured using a mobile phone camera. In the image processing stage, characters, words and lines are spotted in the image. Then the optical character recognition unit is used to recognize the identified characters. Finally, the characters and texts are classified, and important information, such as name, address and phone number, is extracted from the business card.

To assess the application, experiments were carried out using a database of 400 images from 100 different business cards. The average time taken to recognize a single image was approximately 2.5 seconds on a 200 MHz ARM CPU. The recognition rate of symbols and single characters was approximately 98.4%. In conclusion, the proposed system still faces challenges in segmenting characters in blurred images, detecting text regions against complex backgrounds remains difficult, and the time taken to recognize one image should be considered poor.

R. Jana et al. [2] explore another method of optical character recognition, based on texture and topological features of the text image. For improved performance, texture and topological features of the text image, such as corner points, features of different regions and the ratio of character area to convex area, are calculated. Character verification is carried out by comparing the extracted character with the template of every character for a similarity check.

Two datasets are considered: the training dataset and the test dataset. Preprocessing and feature extraction are carried out in both cases. The features extracted from the test data are compared with the training data features to obtain the desired outputs.

In the preprocessing stage, the text image is converted into a binary image, since it is easier to work with the pixel values 0 and 1 of a binary image. Binary 1 (one) represents the letters and binary 0 (zero) represents the background. Individual text lines are then separated from the binary image by calculating the sum of all values in each row: where the sum is zero, a new text line boundary is recognized and separation is done. After all text lines have been extracted, individual letters are extracted by calculating the sum of all values in each column; where the sum is 0, individual characters are identified and separated. In this manner, all individual characters (including digits, letters and punctuation) are separated, as sketched below.
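To make the row- and column-sum idea concrete, here is a minimal Swift sketch (our illustration, not code from [2]) that finds text-line boundaries in a binary image stored as a 2D array:

import Foundation

// Sketch of the segmentation idea in [2]: a text line ends wherever an
// entire pixel row sums to zero. `binary` holds 1 = letter, 0 = background.
func lineRanges(in binary: [[Int]]) -> [Range<Int>] {
    var ranges: [Range<Int>] = []
    var start: Int?
    for (y, row) in binary.enumerated() {
        let rowSum = row.reduce(0, +)
        if rowSum > 0 && start == nil {
            start = y                      // first non-empty row: a line begins
        } else if rowSum == 0, let s = start {
            ranges.append(s..<y)           // empty row: the current line ends
            start = nil
        }
    }
    if let s = start { ranges.append(s..<binary.count) }
    return ranges
}

Running the same scan over the columns of a single line image separates the individual characters.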
In the feature extraction stage, the Harris corner method is used. The corner points of the character image are calculated. Then the total area of the extracted character image is calculated as the exact number of pixels in the character image. Using the number of pixels, the convex area of the character is computed. The result is the ratio of convex area to total area, and all individual characters are extracted:

Result = Convex Area / Total Area


The paper uses three fonts (Arial, Lucida Fax and Berlin Sans) as the training dataset, whose features are used to recognize the extracted characters. The experimental results are: Arial = 87.14% accuracy, Berlin Sans = 94.28% and Lucida Fax = 64.28% for fonts in the training dataset, and Cambria = 14.28% and Times New Roman = 21.42% for fonts outside it (results for a case using 70 characters).

The accuracy for fonts not in the training dataset appears to be very poor. To achieve higher accuracy, the fonts have to be included in the training dataset, which makes the proposed method not very scalable.

Reference [2] uses fonts as a training dataset for character recognition, which is not an effective method to implement on a mobile phone due to memory limitations. It also indicates the use of texture and topological features, such as the corner points of the image, to enhance performance in the extraction stage. Comparing [2] to [1], both papers seemingly use optical character recognition for character extraction. However, it can be argued that [1] will be more effective for mobile phone implementation, as it requires less memory in the image processing stage.

[3] and [4] suggest different methods for reading business cards. [3] proposes the use of augmented reality in business card readers: the business card images are used as image markers in order to retrieve business card information from a cloud database. [4] proposes the use of a Multi-Color Composite (MMCC) barcode; the barcode is used to store the digital version of the business card, which can be retrieved by scanning the barcode. These two proposed methods are debatably not the best approaches for business card readers due to their lack of flexibility and their limitations. Hence, this project will focus on using Optical Character Recognition for the Business Card Reader to be developed.

[5] uses an approach similar to [2], using the four corner points of the business cards in the extraction stage for better performance. However, [2] appears to provide more detailed information on the processes taken to achieve its extraction. The four-corner-points technique will be looked into for possible implementation in this project.

The proposed method in [6] presents a new noise and picture region filtering technique using Connected Components (CC).

[7] proposes an Android application that allows mobile phones to capture a text-based image, extract the text from the image, translate it into English and finally speak it out. The application is developed with the Tesseract OCR Engine, the Bing translator and the smartphone's built-in text-to-speech technology.

As seen in [8], the binarization process used may be better than the techniques used in other papers. A neural network was used in character classification, which also improved the classification accuracy. Following the above review, an image processing module shall be added to the system to enhance the accuracy of the OCR engine.

III. METHODOLOGY

Implementation of the system functionalities was completed following the Agile Scrum methodology. System requirements and functionalities were accomplished in sprints, in iterative increments. After each sprint was completed, the functionality was tested and the required improvements were made. Development of the application followed object-oriented programming principles, with classes built to be easily extended or reused.

A. Technologies Used

1) UIKit Framework
The UIKit framework provides the architecture for implementing interfaces, event handling and more; interaction between the user, the system and the application is also managed by this framework. Development started by connecting all the components in the UI to the View Controller classes and defining them for implementation. These include the flash button, guide button, labels, the camera capture button, text fields, etc.
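A minimal sketch of this wiring follows; the class, outlet and action names are hypothetical, since the paper does not list them:

import UIKit

// Hypothetical view controller wiring: storyboard components connected
// to outlets and actions as described above.
final class ScanViewController: UIViewController {
    @IBOutlet weak var flashButton: UIButton!
    @IBOutlet weak var guideButton: UIButton!
    @IBOutlet weak var captureButton: UIButton!
    @IBOutlet weak var statusLabel: UILabel!

    @IBAction func captureTapped(_ sender: UIButton) {
        // Hands off to the AVFoundation capture session (Section IV).
    }
}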
2) AVFoundation Framework
AVFoundation is a framework that allows configuration of the built-in iPhone cameras for capturing, recording and processing photos, video and audio. We set this framework up specifically to capture photos (scan business cards), with additional support for the rear camera flash only.

3) Core Image Framework
Core Image is an image processing framework offered by Apple that delivers high performance for processing still images and video. It provides numerous built-in features for image enhancement and processing. This framework is used as the image pre-processing module in the application.

4) Tesseract OCR Engine – Text Recognition
Ray Smith [10] from Google Inc. provided an overview of the open-source Tesseract OCR Engine. He focused on the unique aspects of the OCR engine, which include line finding, the classification methods and an adaptive classifier.

A step-by-step pipeline process is followed in the Tesseract OCR Engine (Fig. 1). The first step is connected component analysis: it inspects nested outlines, together with the number of child and grandchild outlines, and stores the outlines of the components. This makes it easy to detect inverse text, that is, white text on a black background. Stored outlines are grouped into blobs, blobs are ordered into text lines, and the regions are then examined for proportional text. Text lines are further broken into words based on character spacing.

The second step of the process starts by attempting to recognize single words in turn. Acceptably recognized words are passed to the adaptive classifier to be used as training data. The adaptive classifier uses the accepted words to improve accuracy in recognizing words in the lower part of the page. Since the adaptive classifier may have learned and improved while recognizing words on the page, a second pass is run to improve recognition of the words near the top of the page. A final step removes fuzzy spaces.

Fig. 1. Tesseract OCR Architecture (Simplified)


5) Natural Language Processing API
The Natural Language Processing API uses machine learning to understand text, with features such as language identification, named entity recognition and lemmatization. The API was used for processing the recognized text from business cards.
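As an illustration of these features, the sketch below tags named entities in a sample string with NSLinguisticTagger; the project's exact calls are not shown in the paper, so this is an assumption:

import Foundation

// Named-entity recognition over a sample string (illustrative only).
let sample = "John Smith, Acme Corp, Dubai"
let tagger = NSLinguisticTagger(tagSchemes: [.nameType], options: 0)
tagger.string = sample
let tagOptions: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let wholeRange = NSRange(location: 0, length: sample.utf16.count)
tagger.enumerateTags(in: wholeRange, unit: .word, scheme: .nameType,
                     options: tagOptions) { tag, tokenRange, _ in
    guard let tag = tag,
          [.personalName, .placeName, .organizationName].contains(tag) else { return }
    print((sample as NSString).substring(with: tokenRange), "->", tag.rawValue)
}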
6) Contacts Framework
The Contacts framework is used to access the user's contacts. We used the framework to add new contact information into the user's contacts.

IV. IMPLEMENTATION
This section covers the detailed processes, as well as the decisions taken, to complete the development of the proposed application. The application is developed in the Xcode IDE v9.2 with Apple's programming language, Swift 4.

The main camera screen was completed first, as it is the first functional requirement of the system. This functionality allows the user to scan a business card; the scanned card is then passed to the next controller (screen) to be processed.

The capture session (Fig. 2) is the most essential component of AVFoundation for capturing photos or videos. In AVFoundation, a capture session object is used to handle sessions and to manage the flow of data from physical devices as inputs. The capture session was set up with the photo preset in order to acquire the highest-resolution photo output. Using a discovery session, the device was configured to use the back-facing camera with the built-in wide-angle camera settings. A JPEG codec type was used to maintain high resolution with low memory usage.

After successfully setting up the camera capture session, the capture button is set to use the AVCapturePhotoSettings object. When a picture is taken, the output is passed to the AVCapturePhotoOutput delegate method to be processed further down the application. The diagram below depicts the flow and the relations between the components, followed by a short sketch of this setup:
Devices → Input → Capture Session → Output

Fig. 2. Relations between Components of AV Foundation (Camera Usage)
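A minimal sketch of this configuration, assuming standard AVFoundation APIs (the class and helper names are illustrative, not the project's source):

import AVFoundation

// Illustrative sketch of the capture setup described above: photo preset,
// back-facing wide-angle camera, JPEG codec, photo output delegate.
final class CameraController: NSObject, AVCapturePhotoCaptureDelegate {
    let session = AVCaptureSession()
    let photoOutput = AVCapturePhotoOutput()

    func configure() throws {
        session.sessionPreset = .photo    // highest-resolution photo output

        // Discovery session: find the built-in wide-angle back camera.
        let discovery = AVCaptureDevice.DiscoverySession(
            deviceTypes: [.builtInWideAngleCamera], mediaType: .video, position: .back)
        guard let camera = discovery.devices.first else { return }

        let input = try AVCaptureDeviceInput(device: camera)
        if session.canAddInput(input) { session.addInput(input) }
        if session.canAddOutput(photoOutput) { session.addOutput(photoOutput) }
        session.startRunning()
    }

    // Wired to the capture button; JPEG keeps resolution high and memory low.
    func capturePhoto() {
        let settings = AVCapturePhotoSettings(
            format: [AVVideoCodecKey: AVVideoCodecType.jpeg])
        photoOutput.capturePhoto(with: settings, delegate: self)
    }

    // The captured output is handed on to the image pre-processing module.
    func photoOutput(_ output: AVCapturePhotoOutput,
                     didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
        guard let data = photo.fileDataRepresentation() else { return }
        _ = data // pass `data` to the Core Image pre-processing module
    }
}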

A. Image Pre-processing Module

After receiving the captured image from the view controller (camera), it is passed to the Core Image framework to be processed and enhanced. The first step taken in the pre-processing module is to detect rectangular features in the image. Using the CIRectangleFeature object, the image is scanned for rectangular areas, and information about the identified regions is returned. Once the features are acquired, perspective correction is applied; this uses the identified rectangular features to place a CGPoint on each corner of the detected rectangle (bottom right, bottom left, top left and top right). CGPoint structures store x and y value properties. The image orientation is then set and the image is scaled. The final phase is cropping the image to keep only the rectangular card, as sketched below.
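A minimal sketch of this step, assuming a CIDetector rectangle detector feeding the CIPerspectiveCorrection filter (the paper names CIRectangleFeature and CGPoint but not the full pipeline):

import CoreImage

// Illustrative sketch: detect the card rectangle, then perspective-correct
// using the four detected corner points, leaving only the card region.
func cropToCard(_ image: CIImage) -> CIImage? {
    let detector = CIDetector(ofType: CIDetectorTypeRectangle, context: nil,
                              options: [CIDetectorAccuracy: CIDetectorAccuracyHigh])
    guard let card = detector?.features(in: image).first as? CIRectangleFeature,
          let filter = CIFilter(name: "CIPerspectiveCorrection") else { return nil }

    filter.setValue(image, forKey: kCIInputImageKey)
    filter.setValue(CIVector(cgPoint: card.topLeft), forKey: "inputTopLeft")
    filter.setValue(CIVector(cgPoint: card.topRight), forKey: "inputTopRight")
    filter.setValue(CIVector(cgPoint: card.bottomLeft), forKey: "inputBottomLeft")
    filter.setValue(CIVector(cgPoint: card.bottomRight), forKey: "inputBottomRight")
    return filter.outputImage
}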
B. Text Recognition – Tesseract OCR Engine

The pre-processed (final) image is then passed into the OCR engine, Tesseract, for text recognition. Tesseract OCR is added to the project as a framework. The framework uses the English trained datasets provided by Tesseract for text recognition, with the tesseractCubeCombined engine mode, as it delivers the best accuracy in Tesseract. The passed image is further processed using Tesseract's built-in image processing module: the image is segmented and binarized (black and white), and the binarized image is then handed over for text recognition. The recognized text is stored in an NSString, which is used in the next phase of the application. NSString is a plain-text Unicode object that supports comparison, searching and modification of strings.
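A minimal sketch of this step is shown below. It assumes the TesseractOCRiOS wrapper (G8Tesseract); the paper says only that Tesseract is added as a framework, so the wrapper choice is our assumption:

import UIKit
import TesseractOCR   // TesseractOCRiOS wrapper: an assumed dependency

// Illustrative sketch of the recognition step: English traineddata,
// tesseractCubeCombined engine mode, built-in black-and-white pre-processing.
func recognizeText(on cardImage: UIImage) -> String? {
    guard let tesseract = G8Tesseract(language: "eng") else { return nil }
    tesseract.engineMode = .tesseractCubeCombined   // most accurate (and slowest) mode
    tesseract.image = cardImage.g8_blackAndWhite()  // wrapper's binarization helper
    tesseract.recognize()
    return tesseract.recognizedText                 // plain text for the next phase
}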
C. Natural Language Processing

After Tesseract OCR recognizes the text, the first and last names need to be identified; since the Natural Language Processing API has no support for this, a separate method was created for checking first and last names, and the recognized text is passed to this method. A character set was initialized containing all the English letters (both lower and upper case); this set of characters was used to eliminate unwanted characters, such as numbers or signs, as they are not commonly found in names. Afterwards, lines containing two words separated by a single space were checked; the reason behind this is that a lot of business cards contain only the first and last name in this manner, and the aim is for a larger set of business cards to be fully recognized. Finally, the white space is trimmed and the retrieved names are added to their respective text fields. To check for an email address, regular expressions were used to identify email strings in the recognized text and add the email to its text field.
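A minimal sketch of the two checks, with hypothetical function names:

import Foundation

// Illustrative sketch: a line of exactly two purely alphabetic words
// (per the paper's English-letters character set) is treated as "First Last".
func extractName(from recognizedText: String) -> String? {
    let english = CharacterSet(
        charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
    for line in recognizedText.components(separatedBy: .newlines) {
        let words = line.trimmingCharacters(in: .whitespaces)
                        .components(separatedBy: " ")
        if words.count == 2 && !words.contains(where: { word in
            word.isEmpty || word.unicodeScalars.contains { !english.contains($0) }
        }) {
            return words.joined(separator: " ")
        }
    }
    return nil
}

// Email detection with a regular expression over the recognized text.
func extractEmail(from recognizedText: String) -> String? {
    let pattern = "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}"
    guard let match = recognizedText.range(of: pattern, options: .regularExpression)
    else { return nil }
    return String(recognizedText[match])
}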
The Natural Language Processing API was then used to identify the phone number, link and address. This uses the NSDataDetector class to match the recognized text against predefined data patterns for phone numbers, links and addresses. Finally, all the identified entities are automatically populated into their respective text fields for the user.
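A minimal sketch of this detection step, with printing standing in for populating the text fields:

import Foundation

// Illustrative sketch: match the recognized text against the built-in
// phone-number, link and address patterns of NSDataDetector.
func detectEntities(in recognizedText: String) {
    let types: NSTextCheckingResult.CheckingType = [.phoneNumber, .link, .address]
    guard let detector = try? NSDataDetector(types: types.rawValue) else { return }
    let range = NSRange(recognizedText.startIndex..., in: recognizedText)
    detector.enumerateMatches(in: recognizedText, options: [], range: range) { match, _, _ in
        guard let match = match else { return }
        if match.resultType == .phoneNumber {
            print("Phone:", match.phoneNumber ?? "")
        } else if match.resultType == .link {
            print("Link:", match.url?.absoluteString ?? "")
        } else if match.resultType == .address {
            print("Address:", match.addressComponents ?? [:])
        }
    }
}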
D. Adding to Contacts

The Contacts framework is used in this function to save contact information into the phone contacts. The business card information filled into the text fields is automatically added to the phone contacts. The function is connected to a UI button which, on a tap, triggers a CNSaveRequest and stores the new contact.
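A minimal sketch of the save step (the function name and labels are illustrative, and contacts permission is assumed to be granted):

import Contacts

// Illustrative sketch: build a CNMutableContact from the text-field values
// and execute a CNSaveRequest against the default container.
func saveContact(givenName: String, familyName: String,
                 phone: String, email: String) throws {
    let contact = CNMutableContact()
    contact.givenName = givenName
    contact.familyName = familyName
    contact.phoneNumbers = [CNLabeledValue(label: CNLabelWork,
                                           value: CNPhoneNumber(stringValue: phone))]
    contact.emailAddresses = [CNLabeledValue(label: CNLabelWork,
                                             value: email as NSString)]

    // Triggered by the UI button: save into the default container.
    let request = CNSaveRequest()
    request.add(contact, toContainerWithIdentifier: nil)
    try CNContactStore().execute(request)
}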
E. Search for Social Network Accounts

This was implemented by connecting each social network search function to a UI button. The user can search for the full name of the card owner on Facebook, LinkedIn and Twitter by tapping the corresponding button. It uses an NSURL class object to access the URL of the social network, with the full name used as the search query.
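A minimal sketch follows; the search URL formats are assumptions for illustration, as the paper does not list them:

import UIKit

// Illustrative sketch: percent-encode the full name and open the social
// network's search URL (assumed formats shown in the usage comments).
func searchSocialNetwork(for fullName: String, baseURL: String) {
    guard let query = fullName.addingPercentEncoding(withAllowedCharacters: .urlQueryAllowed),
          let url = URL(string: baseURL + query) else { return }
    UIApplication.shared.open(url)
}

// Hypothetical usage, one call per social network button:
// searchSocialNetwork(for: name, baseURL: "https://twitter.com/search?q=")
// searchSocialNetwork(for: name, baseURL: "https://www.linkedin.com/search/results/all/?keywords=")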

The diagram below (Fig. 3) depicts the stages and modules of the system, from scanning a business card to successfully saving the retrieved contact details. These stages collectively make up the whole system.

Fig. 3. System Architecture (Simplified)


V. EVALUATION

The system was tested and evaluated for efficiency, efficacy and performance. The system was evaluated in two different phases: theoretical and experimental assessments. The tests and evaluations conducted were: user interface and usability, system accuracy testing, and a comparative analysis with a commercial BCR application.

Overall, the usability feedback suggests the application is very easy to use; all functionalities worked accurately and users were satisfied. The system accuracy was tested and evaluated using a dataset of 55 digital business cards obtained from an online repository [9]. The accuracy of the system was up to 74% in terms of text recognition and data detection combined. However, this application is built for best use with physical business cards, and the quality of some of the pictures in the dataset appeared to be poor.
In a comparative analysis with a commercial BCR application, we tested both applications under the same office lighting conditions against 15 physical business cards. We checked the accuracy of each data retrieval, that is: name, phone, email, address and URL. Table 1 below shows the results (in %) acquired for each of the data items.

Data                This Project    ABBY BCR
Name                53.8%           61.5%
Phone               100%            92.3%
Email               83.3%           100%
Address             9%              81.8%
URL                 87.5%           75%
Overall Accuracy    66.52%          81.92%

Table 1. Comparative Analysis Result

This project achieved an overall accuracy of 66.52%, and the ABBY BCR application achieved a higher overall accuracy of 81.92%. Our application performed reasonably well against a commercial application.

Figure 4 below shows the accuracy (in %) of each retrieved element (name, email, phone, etc.) for this project.

Fig. 4. AETHER BCR (This Project) (Comparative Analysis)

Figure 5 displays the accuracy of each retrieved element for ABBY BCR (Commercial Application).

Fig. 5. ABBY BCR (Commercial Application) (Comparative Analysis)

REFERENCES

[1] L. Xi-Ping, L. Jun, and Z. Li-Xin, "Design and implementation of a card reader based on built-in camera," vol. 1, USA, 2004, pp. 417-420.
[2] R. Jana, A. R. Chowdhury, and M. Islam, "Optical Character Recognition from Text Image," International Journal of Computer Applications Technology and Research, vol. 3, no. 4, pp. 240-244, 2014.
[3] V. Hing and H. K. Khoo, "Business card reader with augmented reality engine integration," vol. 398, 2017, pp. 219-227.
[4] S. K. Ong, D. Chai, and A. Rassau, "A robust mobile business card reader using MMCC barcode," 2011, pp. 656-661.
[5] G. Hua, Z. Liu, Z. Zhang, and Y. Wu, "Automatic Business Card Scanning with a Camera," 2006, pp. 373-376.
[6] A. F. Mollah, S. Basu, N. Das, R. Sarkar, M. Nasipuri, and M. Kundu, "Text Region Extraction from Business Card Images for Mobile Devices," 2010.
[7] S. Ramiah, T. Y. Liong, and M. Jayabalan, "Detecting text based image with optical character recognition for English translation and speech using Android," 2015, pp. 272-277.
[8] K. S. Bae, K. K. Kim, Y. G. Chung, and W. P. Yu, "Character recognition system for cellular phone with camera," vol. 1, USA, 2005, pp. 539-544.
[9] Purl.stanford.edu. (2011). Dataset: Stanford Mobile Visual Search Dataset. [Online]. Available: https://purl.stanford.edu/rb470rw0983 [Accessed 4 Apr. 2018].
[10] R. Smith, "An overview of the Tesseract OCR engine," in Document Analysis and Recognition (ICDAR 2007), Ninth International Conference on, Sep. 2007, vol. 2, pp. 629-633.

