Senior Capstone
PROJECT DOCUMENTATION
HANNA STROHM
ST. NORBERT COLLEGE | CSCI 460 | 2019
Table of Contents
Project Definition
How-To
    Install
    Use
Exceptions
Program Structure
Data Flow
Project Definition
Abstract
The purpose of this project was to create an application that is able to analyze an image and
display the text within the image. For example, it can find the text on a street sign or license
plate and display it as editable text, or it could read a screenshot that you would like to edit.
This application transforms any image into a format that is more easily read by the computer,
which then looks for and displays the characters found in the image.
What it Became
Towards the beginning of the project, I was given approval to use Tesseract, an open-source
OCR engine. This means that I did not actually write my own OCR, originally one of the main
parts of the project. Instead, this became a project about processing the image before handing
it over to Tesseract. I worked on two of the main issues that Tesseract suffers from: extra
noise in the image, and color. This project became about making Tesseract work for me.
How To Install & Run
Download and Run
1. Download the Capstone.exe file.
2. Open the file to run the application.
OR
How To Use Application
Directions for Use
1. Open the application following the directions on the previous page.
2. The “Directions” button displays a short explanation of use.
3. To start, upload an image by clicking the “Upload” button.
4. Crop the photo around the words, if needed.
a. This can be done by clicking and dragging. Start at the top left corner and drag to
the bottom right corner.
b. The goal of this is to minimize the amount of noise in the image, so it helps to get
as tight as possible around the text.
5. Choose between the “Black and White” and “Inverse Black and White” radio buttons.
a. The goal is to get black text on a white background, so for text that is darker than
its surroundings, “Black and White” should be chosen. If the text is lighter than
its surroundings, “Inverse Black and White” should be chosen.
6. Click the “Analyze” button.
a. This passes the image to Tesseract, which returns the text it finds.
7. The text will appear in the textbox to the right of the image.
8. If desired or needed, edit the text or copy and paste it.
9. Repeat as desired.
Exceptions
1. The biggest exception to my project requirements was writing my own OCR. I used Tesseract
instead, which allowed me to focus on making Tesseract more accurate. Tesseract works on a
black and white image. It finds possible text and keeps track of it as blobs. It is then able
to find lines, and then words, by analyzing the spacing between the blobs. After it finds
what it thinks are possible words, it analyzes each character using a neural network. It
makes two passes so that it can learn from the document as it goes through. The diagram
below is an overview of this process.
[Diagram: Thresholding -> Connected Component Analysis -> Find Text Lines/Words ->
Recognize Words (Pass One) -> Recognize Words (Pass Two)]
2. The other requirement that I did not complete was integrating VR into my application,
because I ran out of time. With more time, I would have either run a Google search on the
text found or allowed the user to take a photo from within the app. One thing I find
especially appealing is the Google Translate app. If I had done my project as a mobile app,
I would have tried to do something similar.
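To make the "blobs" mentioned in item 1 more concrete, here is a minimal sketch of connected component analysis on a tiny black and white image. It is only an illustration of the general idea, not Tesseract's actual implementation. The function name `find_blobs`, the grid representation (1 for black, 0 for white), and the choice of 4-way connectivity are all assumptions made for this example.

```python
from collections import deque


def find_blobs(image):
    """Group touching black pixels into blobs.

    `image` is a list of rows; 1 means black (ink), 0 means white.
    Pixels belong to the same blob when they touch horizontally or
    vertically (4-way connectivity, an assumption for this sketch).
    Returns a list of blobs, each a list of (row, col) positions.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search outward from this unvisited black pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    in_bounds = 0 <= ny < rows and 0 <= nx < cols
                    if in_bounds and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


# Three separate groups of black pixels give three blobs.
sample = [
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
print(len(find_blobs(sample)))  # 3
```

Stray black pixels left in the image each become their own blob, which is one way to see why cropping tightly and removing noise before analysis helps.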
Program Structure
Main Window Class
    UploadImageClick()
    AnalyzeClick()
    DirectionsClick()
    MainWindow_PreviewMouseDown()
    MainWindow_PreviewMouseUp()
    BitmapImage2Bitmap()
Pixel Class
    Crop()
    Bitmap2BitmapImage()
    BitmapImage2Bitmap()
    BlackWhiteBitmap()
    BlackWhiteInvBitmap()
    GrayOut()
UploadImageClick()
This uploads the image by opening a dialog box.
AnalyzeClick()
This analyzes the image in three steps. First it calls GrayOut(), which creates a grayscale
image. Then it calls BlackWhiteBitmap() or BlackWhiteInvBitmap(), depending on the selected
radio button, to turn the image black and white. Finally, it sends the image to Tesseract.
DirectionsClick()
This displays brief instructions for use.
MainWindow_PreviewMouseDown()
This captures the location of the mouse when it is clicked.
MainWindow_PreviewMouseUp()
This captures the location of the mouse when the mouse is released. Then it calls Crop(), which
crops the image to the captured coordinates.
BitmapImage2Bitmap()
This turns a BitmapSource into a Bitmap.
Bitmap2BitmapImage()
This turns a Bitmap into a BitmapSource.
Crop()
This crops the image given the coordinates.
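As a rough illustration of cropping from a mouse-down point and a mouse-up point, the sketch below builds a rectangle from the two points and slices the pixels inside it. This is not the application's Crop() code. The names `crop_box` and `crop`, the list-of-rows pixel layout, and the handling of drags in any direction are assumptions for this example; the application itself asks the user to drag from the top left to the bottom right.

```python
def crop_box(down, up, width, height):
    """Build a (left, top, right, bottom) box from two (x, y) mouse points.

    The points are sorted so the drag direction does not matter, and the
    box is clamped to the image bounds. `right` and `bottom` are exclusive.
    """
    left = max(0, min(down[0], up[0]))
    top = max(0, min(down[1], up[1]))
    right = min(width, max(down[0], up[0]))
    bottom = min(height, max(down[1], up[1]))
    return left, top, right, bottom


def crop(pixels, box):
    """Return the sub-image of `pixels` (a list of rows) inside `box`."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


# A 4x3 image whose pixel values are just their index, for easy checking.
image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
box = crop_box(down=(3, 2), up=(1, 0), width=4, height=3)
print(box)               # (1, 0, 3, 2)
print(crop(image, box))  # [[1, 2], [5, 6]]
```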
BlackWhiteBitmap()
This turns the image black and white (use on an image with dark text on a light background).
BlackWhiteInvBitmap()
This turns the image black and white (use on an image with light text on a dark background).
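The difference between the two black and white conversions can be sketched with a single thresholding function that takes an invert flag. This is an illustration under assumptions, not the application's code: the function name `to_black_white`, the fixed cutoff of 128, and the grayscale list-of-rows input are all hypothetical.

```python
def to_black_white(gray, threshold=128, invert=False):
    """Map grayscale values (0-255) to pure black (0) or white (255).

    With invert=False, light pixels become white and dark pixels black,
    which suits dark text on a light background. With invert=True the
    result is flipped, so light text on a dark background also ends up
    as black text on white. The threshold of 128 is an assumed default.
    """
    result = []
    for row in gray:
        new_row = []
        for value in row:
            is_light = value >= threshold
            if invert:
                is_light = not is_light
            new_row.append(255 if is_light else 0)
        result.append(new_row)
    return result


gray = [[10, 200], [128, 127]]
print(to_black_white(gray))               # [[0, 255], [255, 0]]
print(to_black_white(gray, invert=True))  # [[255, 0], [0, 255]]
```

Either way, the aim is the same as in the directions for use: black text on a white background before the image reaches Tesseract.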
GrayOut()
This turns the image grayscale.
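One common way to turn a color image grayscale is a weighted sum of the red, green, and blue channels. The sketch below uses the widely used 0.299 / 0.587 / 0.114 luminance weights. Whether GrayOut() uses these weights, a plain average, or something else is not stated here, so treat the weights, the name `gray_out`, and the list-of-rows pixel layout as assumptions.

```python
def gray_out(pixels):
    """Convert rows of (r, g, b) tuples (0-255 each) to grayscale values.

    Uses integer arithmetic for the weights 0.299, 0.587 and 0.114,
    adding 500 before dividing by 1000 so the result is rounded to the
    nearest whole number.
    """
    return [
        [(299 * r + 587 * g + 114 * b + 500) // 1000 for (r, g, b) in row]
        for row in pixels
    ]


colors = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (0, 0, 255)],
]
print(gray_out(colors))  # [[255, 0], [76, 29]]
```

Green contributes most to the result and blue least, which roughly matches how bright each color looks to the eye.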
Data Flow
[Diagram: each step with the image type it produces. Image Source (Bitmap) ->
Crop, optional (Cropped Bitmap) -> Format with GrayOut() (Converted Bitmap) ->
Tesseract (Bitmap)]