Data Extraction from Hand-Filled Forms Using OCR

Jamuna J (20BCC016) and Pavithrani S (20BCC027)
II B.Sc. Computer Science with Cognitive Systems

Parvathi P
Assistant Professor, Department of Computer Science with Cognitive Systems
Contents
 OCR
 RPA OCR
 Flowchart of Proposed Work
 Proposed Methodology
 Limitations
 Future Work
 References
Department of Cognitive Systems, PSGR Krishnammal College for Women

Optical Character Recognition
 A literate person can glance at a piece of paper and read its contents,
but getting a computer to do the same is far more difficult than most
people believe.
 To identify each individual letter, the computer first needs a digital image
of the text, must process it to remove extraneous information, and must then
locate and segment the characters.
 Only then can it generate a sequence of machine-readable characters
as output[1].
 This procedure is known as optical character recognition (OCR).
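The preprocessing and segmentation stages described above can be illustrated with a minimal Python sketch. It assumes a tiny grayscale image given as nested lists and splits characters wherever a blank column appears; the function names, the threshold of 128, and the toy image are illustrative assumptions, not part of the presented workflow, and real OCR engines use far more robust methods.

```python
from typing import List, Tuple


def binarize(image: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map grayscale pixels (0-255) to 1 for ink (dark) and 0 for background."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def segment_columns(binary: List[List[int]]) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges containing ink; end is exclusive."""
    if not binary:
        return []
    width = len(binary[0])
    # Count ink pixels in every column (a vertical projection profile).
    ink_per_column = [sum(row[x] for row in binary) for x in range(width)]
    segments = []
    start = None
    for x, ink in enumerate(ink_per_column):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            segments.append((start, x))    # a blank column ends it
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


# Toy 3x7 image (assumed data): two dark strokes on a light background.
toy_image = [
    [255, 0, 0, 255, 255, 10, 255],
    [255, 0, 255, 255, 255, 10, 255],
    [255, 0, 0, 255, 255, 10, 255],
]
segments = segment_columns(binarize(toy_image))
print(segments)  # [(1, 3), (5, 6)]
```

Each returned range marks one candidate character, which a recognizer would then classify to produce the machine-readable output.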

RPA OCR
 Robotic Process Automation (RPA) is a technology in the field of
computer science that uses software robots (softbots) to automate manual tasks.
 RPA tools are pieces of software that let you set up the jobs to be automated.
The leading tools are UiPath, Automation Anywhere, and Blue Prism[2].
 Optical character recognition (OCR) is a key feature of any good robotic process
automation (RPA) solution[3].
 The time-consuming work of manually turning scanned documents such as invoices
into usable data can be automated using an OCR engine that works alongside and
within the RPA platform.
 A practical example of an RPA OCR use case is extracting information from
a scanned customer insurance claim form.
Flowchart of Proposed Work
Figure 1. Flowchart of Proposed Work

Proposed Methodology
Figure 2. Handwritten Form

STEPS
 Step 1: Open UiPath Studio and create a project with a name and
description.
 Step 2: Click Open Main Workflow.
 Step 3: Import both the UiPath.DocumentUnderstanding.ML.Activities and
UiPath.IntelligentOCR.Activities packages from Manage Packages.
 Step 4: Drag and drop the Sequence and Load Taxonomy activities into the
main workflow window and create a Taxonomy variable.

 Step 5: Open Taxonomy Manager and fill in the names of the group, category,
document type, and fields you want to create, as shown in Figure 3.
Figure 3. Taxonomy Manager

 Step 6: Drag in the Digitize Document activity and create the appropriate
variables. Also add UiPath Document OCR to it.
 Step 7: In the UiPath Document OCR properties panel, add the API key
copied from your account on the UiPath website.
 Step 8: Drag in the Data Extraction Scope activity, copy the Document
Type Id from the file explorer in the Project tab, and paste it into the activity.
 Step 9: Also add the Intelligent Form Extractor activity and paste the same
API key into it.

 Step 10: Click Manage Template, then click Create Template, and enter all the
details in the Create Template wizard. Click Configure.
Figure 4. Workflow

 Step 11: Add both the Export Extraction Results and For Each activities to
the main workflow.
 Step 12: Within the For Each activity, add the Write Range activity and give
the output file a name that includes its extension.
 Step 13: Set the path of your handwritten form as the default value of the
Document Path variable.
 Step 14: Click Debug File and then click Run.
 Step 15: In the Project tab, under Document Processing, you can find the
generated Excel sheet.
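
Outside UiPath, the export stage at the end of the steps above (writing each extracted field into a spreadsheet) can be approximated in a few lines of Python. This is only a sketch under stated assumptions: it writes CSV rather than a native Excel file, it does not call any UiPath API, and the field names and sample values are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO


def export_results(documents: List[Dict[str, str]],
                   fields: List[str],
                   out: TextIO) -> None:
    """Write one row per document, one column per taxonomy field."""
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for doc in documents:
        # Fields the extractor could not find are written as empty cells.
        writer.writerow({name: doc.get(name, "") for name in fields})


# Hypothetical field names and values, for illustration only.
fields = ["Name", "Policy Number", "Date"]
extracted = [
    {"Name": "A. Example", "Policy Number": "AB1234Z", "Date": "2022-01-15"},
]

buffer = io.StringIO()          # stands in for an output file
export_results(extracted, fields, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Replacing the in-memory buffer with an opened file (using `newline=""`, as the `csv` module documentation recommends) would produce a file that spreadsheet software can open.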

VALIDATION STATION
Figure 5. Validation Station

Result
Figure 6. Excel Sheet

LIMITATIONS
 OCR fails to understand three crucial aspects of data processing:
 Formatting
 Content
 Context
 More importantly, if the original document is of poor quality or the
handwriting is difficult to read, more mistakes will occur[4].
 For instance, the handwritten form that we scanned produced errors such as
the last character of the policy number being read as the digit 2 instead of the letter Z.
 These errors have to be corrected manually.
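
Errors like the Z/2 confusion above can at least be flagged automatically before a person corrects them. The sketch below checks a field against an expected pattern and lists the fields that need manual review. The policy number format (two letters, four digits, one trailing letter) is purely an assumption for illustration; the actual format of the form is not stated here.

```python
import re
from typing import Dict, List

# Hypothetical format: two letters, four digits, one trailing letter.
POLICY_PATTERN = re.compile(r"[A-Z]{2}\d{4}[A-Z]")


def needs_review(fields: Dict[str, str]) -> List[str]:
    """Return the names of fields whose values do not match the expected format."""
    flagged = []
    policy = fields.get("Policy Number", "")
    if not POLICY_PATTERN.fullmatch(policy):
        flagged.append("Policy Number")
    return flagged


misread = needs_review({"Policy Number": "AB12342"})  # trailing Z read as 2
correct = needs_review({"Policy Number": "AB1234Z"})
print(misread)  # ['Policy Number']
print(correct)  # []
```

A rule like this does not fix the value; it only narrows down which fields a reviewer should look at.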

FUTURE WORK
 Artificial Intelligence (AI) can form a spatial understanding of a
given document: it understands what to look for in the document
and where exactly to find it, even if the position changes, much as a
human would.
 As a result, AI works irrespective of the template and delivers a more accurate
result.
 Combining AI with OCR is proving to be a successful method of data collection and
management.
 While AI-based OCR solutions may not be as flashy as other transformational
technologies, they will undoubtedly have a significant influence on the bottom
line of the businesses that use them[5].
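
The idea of finding a value regardless of where it sits on the page can be illustrated with a simple rule-based stand-in. This is not an AI model: it merely locates a field by its printed label rather than by fixed coordinates. The `Token` type, the `value_right_of` helper, the tolerance, and the sample layouts are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str   # recognized word
    x: float    # left edge of the word on the page
    y: float    # vertical position of the word on the page


def value_right_of(tokens: List[Token], label: str,
                   line_tolerance: float = 5.0) -> Optional[str]:
    """Return the nearest token to the right of `label` on the same line."""
    anchor = next((t for t in tokens if t.text.lower() == label.lower()), None)
    if anchor is None:
        return None
    candidates = [
        t for t in tokens
        if t is not anchor
        and abs(t.y - anchor.y) <= line_tolerance   # roughly the same line
        and t.x > anchor.x                          # to the right of the label
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.x).text


# Two assumed layouts of the same form with the fields in different places.
layout_a = [Token("Name:", 10, 20), Token("Asha", 80, 21),
            Token("Date:", 10, 60), Token("2022", 80, 60)]
layout_b = [Token("Date:", 200, 300), Token("2022", 260, 302),
            Token("Name:", 200, 340), Token("Asha", 270, 338)]

name_a = value_right_of(layout_a, "Name:")
name_b = value_right_of(layout_b, "Name:")
print(name_a, name_b)  # Asha Asha
```

A learned model would go further by coping with missing or reworded labels, which this fixed rule cannot do.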

REFERENCES
[1] K. A. Barchard and L. A. Pace, “Preventing human error: The impact of data entry
methods on data accuracy and statistical results,” Comput. Human Behav., vol. 27, no. 5, pp.
1834–1839, 2011, doi: 10.1016/j.chb.2011.04.004.
[2] https://www.edureka.co/blog/what-is-robotic-process-automation/
[3] https://www.nice.com/guide/rpa/rpa-ocr-elevating-process-automation
[4] https://medium.com/@CereLabs/the-technology-that-is-better-than-ocr-354e989cb270
[5] https://www.information-age.com/optical-character-recognition-tools-ocr-ai-123479324/

QUERIES?

