
SENIOR CAPSTONE

PROJECT DOCUMENTATION

Recognizing Text in Image

HANNA STROHM
ST. NORBERT COLLEGE | CSCI 460 | 2019
Table of Contents
Project Definition
How-To
    Install
    Use
Exceptions
Program Structure
Data Flow

Project Definition
Abstract

The purpose of this project was to create an application that analyzes an image and
displays the text within it. For example, it can find the text on a street sign or license
plate and display it as editable text, or it can read a screenshot that you would like to edit.
The application first transforms the image into a format that the computer can read more
easily, and then looks for and displays the characters found in the image.

Description & Requirements


Project Description: Write an optical character recognition (OCR) application that identifies
and recognizes printed text within an image.

General Description and Requirements:


1. Investigate existing algorithms and libraries.
2. Initially, try black text on a white background.
3. Design a uniform API so that you can plug in alternative OCR functions.
4. Evaluate the effectiveness of your OCR compared to existing algorithms.
5. Develop an application that employs augmented reality for text within an image
(e.g., geo-tag state park signs, license plates, campus building signs, ...).
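
Requirement 3 asks for a uniform API so that alternative OCR functions can be plugged in. The following is a minimal sketch of one way such an API could look, written in Python for illustration only; it is not taken from the project's source. The names `register_engine`, `recognize`, and `dummy_engine`, and the list-of-lists image representation, are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical image representation: rows of 0-255 grayscale intensities.
Image = List[List[int]]
# Any OCR backend is simply a function from an image to recognized text.
OcrEngine = Callable[[Image], str]

_ENGINES: Dict[str, OcrEngine] = {}


def register_engine(name: str, engine: OcrEngine) -> None:
    """Make an OCR backend available under a short name."""
    _ENGINES[name] = engine


def recognize(image: Image, engine: str) -> str:
    """Run the named backend; callers never depend on a concrete engine."""
    if engine not in _ENGINES:
        raise KeyError(f"unknown OCR engine: {engine!r}")
    return _ENGINES[engine](image)


def dummy_engine(image: Image) -> str:
    """Stand-in backend for illustration: it only reports the image size."""
    height = len(image)
    width = len(image[0]) if image else 0
    return f"{width}x{height}"


register_engine("dummy", dummy_engine)
```

With this shape, swapping in another backend would only mean registering a different function under a new name; the calling code stays the same.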

What it Became
Towards the beginning of the project, I was given approval to use Tesseract, an open-source
OCR engine. This means that I did not actually write my own OCR, which was originally one of
the main parts of the project. Instead, this became a project about processing the image
before handing it over to Tesseract. I worked on two of the main issues that Tesseract
suffers from: extra noise in the image, and color. This project became making Tesseract work
for me.

How To Install & Run
Download and Run
1. Download the Capstone.exe file.
2. Open the file to run the application.

OR

1. Download the zipped file named Capstone.
2. Extract all files.
3. Open the folder and go to Capstone/bin/Debug.
4. Open the Capstone.exe file.

Access Project Files
1. Download the zipped file named Capstone.
2. Extract all files.
3. Open the folder and select Capstone.sln.

How To Use Application
Directions for Use
1. Open the application by following the directions on the previous page.
2. For a short explanation of use, click the “Directions” button.
3. To start, upload an image by clicking the “Upload” button.
4. Crop the photo around the words, if needed.
    a. This can be done by clicking and dragging. Start at the top left corner and drag to
       the bottom right corner.
    b. The goal is to minimize the amount of noise in the image, so it helps to crop as
       tightly as possible around the text.
5. Choose between the “Black and White” and “Inverse Black and White” radio buttons.
    a. The goal is to get black text on a white background, so for text that is darker than
       its surroundings, choose “Black and White”. If the text is lighter than its
       surroundings, choose “Inverse Black and White”.
6. Click the “Analyze” button.
    a. This passes the image to Tesseract, which returns the text.
7. The text will appear in the textbox to the right of the image.
8. If desired or needed, edit the text or copy and paste it.
9. Repeat as desired.

Exceptions
1. The biggest exception was the requirement to write my own OCR; I used Tesseract instead.
This allowed me to focus on making Tesseract more accurate. Tesseract takes a black and
white image as input. It finds possible text and keeps track of it as blobs. It then finds
lines, and then words, by analyzing the spacing between the blobs. After it finds what it
thinks are possible words, it analyzes each character using a neural network. It makes two
passes so that it can learn from the document as it goes. The diagram below is an overview
of this process.

[Diagram: Thresholding -> Connected Component Analysis -> Find Text Lines/Words ->
Recognize Words (Pass One) -> Recognize Words (Pass Two)]
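
The blob-finding step described above can be illustrated with a small connected component labeling routine. This is only a sketch of the general idea, in Python, and not Tesseract's actual implementation, which is considerably more involved. It assumes a binary image given as rows of 0s and 1s, where 1 marks an ink pixel; the function name `find_blobs` is hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


def find_blobs(binary: List[List[int]]) -> List[Box]:
    """Return bounding boxes of 4-connected groups of 1-pixels,
    in the order they are first met when scanning row by row."""
    height = len(binary)
    width = len(binary[0]) if binary else 0
    seen = [[False] * width for _ in range(height)]
    boxes: List[Box] = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            left = right = x
            top = bottom = y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((left, top, right, bottom))
    return boxes
```

The horizontal gaps between the resulting boxes are the kind of spacing information that could then be used to group blobs into words and lines.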

2. The other requirement that I did not complete was integrating augmented reality (AR)
into my application, because I ran out of time. If I had had more time, I would have either
run a Google search on the text found or allowed the user to take a photo from within the
app. One application that I think is really cool is the Google Translate app. If I had done
my project as a mobile app, I would have tried to do something similar.

Program Structure

Main Window Class
    UploadImageClick()
    AnalyzeClick()
    DirectionsClick()
    MainWindow_PreviewMouseDown()
    MainWindow_PreviewMouseUp()
    BitmapImage2Bitmap()

Pixel Class
    Crop()
    Bitmap2BitmapImage()
    BitmapImage2Bitmap()
    BlackWhiteBitmap()
    BlackWhiteInvBitmap()
    GrayOut()

Pixel Class returns imgUpload as a BitmapSource.

UploadImageClick()
This uploads the image by opening a dialog box.

AnalyzeClick()
This analyzes the image in three steps. First it calls GrayOut(), which creates a grayscale
image. Then it calls BlackWhiteBitmap() or BlackWhiteInvBitmap(), depending on the selected
radio button, to turn the image black and white. Finally, it sends the image to Tesseract.

DirectionsClick()
This displays brief instructions for use.

MainWindow_PreviewMouseDown()
This captures the location of the mouse when it is clicked.

MainWindow_PreviewMouseUp()
This captures the location of the mouse when the mouse is released. Then it calls Crop(), which
crops the image to the rectangle between the mouse-down and mouse-up coordinates.

BitmapImage2Bitmap()
This turns a BitmapSource to a Bitmap.

Bitmap2BitmapImage()
This turns a Bitmap into a BitmapSource.

Crop()
This crops the image given the coordinates.
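
As a rough illustration of cropping from two mouse positions, the sketch below cuts a rectangle out of an image stored as a list of rows. It is written in Python for illustration only and is not taken from the project's source. Two choices here are assumptions that the documentation does not describe: the corners are sorted so the drag direction does not matter, and the rectangle is clamped to the image bounds.

```python
from typing import List, Tuple, TypeVar

T = TypeVar("T")
Point = Tuple[int, int]  # (x, y) in pixel coordinates


def crop(image: List[List[T]], start: Point, end: Point) -> List[List[T]]:
    """Return the part of the image between two corner points.

    Columns left..right-1 and rows top..bottom-1 are kept.
    """
    height = len(image)
    width = len(image[0]) if image else 0
    # Sort the corners so any drag direction gives the same rectangle.
    left, right = sorted((start[0], end[0]))
    top, bottom = sorted((start[1], end[1]))
    # Clamp to the image so a drag past the edge does not fail.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    return [row[left:right] for row in image[top:bottom]]
```

For example, on a 4 by 3 image, `crop(image, (1, 0), (3, 2))` keeps columns 1 and 2 of rows 0 and 1.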

BlackWhiteBitmap()
This turns the image black and white. Use it on an image with dark text on a light background.

BlackWhiteInvBitmap()
This turns the image black and white. Use it on an image with light text on a dark background.

GrayOut()
This turns the image grayscale.
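
The grayscale and black-and-white steps can be sketched as follows. This is an illustrative Python version, not the project's code. The luma weights are the common ITU-R BT.601 values, and the fixed threshold of 128 is an assumption, since the documentation does not say how the application picks its cutoff. The function names `gray_out` and `black_white` are hypothetical stand-ins for GrayOut() and the two BlackWhite functions.

```python
from typing import List, Tuple

Rgb = Tuple[int, int, int]


def gray_out(image: List[List[Rgb]]) -> List[List[int]]:
    """Convert RGB pixels to 0-255 intensities using BT.601 luma weights.

    Integer arithmetic with +500 rounds to the nearest whole value.
    """
    return [
        [(299 * r + 587 * g + 114 * b + 500) // 1000 for (r, g, b) in row]
        for row in image
    ]


def black_white(gray: List[List[int]], threshold: int = 128,
                invert: bool = False) -> List[List[int]]:
    """Map each intensity to 0 (black) or 255 (white).

    With invert=True, light pixels become black instead, so light text
    on a dark background ends up as black text on white.
    """
    result = []
    for row in gray:
        new_row = []
        for value in row:
            is_light = value >= threshold  # assumed fixed cutoff
            if invert:
                is_light = not is_light
            new_row.append(255 if is_light else 0)
        result.append(new_row)
    return result
```

Calling `black_white(gray_out(image))` corresponds to the “Black and White” option, and passing `invert=True` corresponds to “Inverse Black and White”.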

Data Flow

Step                                          Image Type
Image                                         Bitmap Source
Crop (optional)                               Cropped Bitmap
GrayOut()                                     Format Converted Bitmap
BlackWhiteBitmap() / BlackWhiteInvBitmap()    Bitmap Source
Tesseract                                     Bitmap
