
MAJOR PROJECT PRESENTATION

Guided By- Prof. Vinod Jha
Submitted By-
Utkarsh-1704077
Kshitij Nishant-1704113
OPTICAL CHARACTER
RECOGNITION
AND
IMPLEMENTATION OF gTTS
CONTENT-
• INTRODUCTION
• SOFTWARE USED
• BLOCK DIAGRAM
• OCR
• TYPES
• TESSERACT
• GTTS
• LIMITATION
INTRODUCTION-
• Image acquisition, recognition and speech conversion using Optical Character
Recognition (OCR) and a Text-to-Speech (TTS) synthesizer.
• An image processing technology converts an image containing horizontal text
into a text document, and the extracted text is then converted into speech.
• The text-to-speech device consists of two main modules: the image processing
module and the voice processing module.
• The image processing module captures an image with a camera and converts the
image into text.
• The voice processing module turns the text into sound and processes it with
specific physical characteristics so that the sound can be understood.
• OCR, or Optical Character Recognition, is a technology that automatically recognizes characters
through an optical mechanism. It imitates the human sense of sight: the camera replaces the eye, and
image processing on the computer takes the place of the human brain.
• In this project we identify English letters. Before the image is fed to the OCR engine, it is
converted to a binary (black-and-white) image to increase recognition accuracy. The binary conversion
is done with ImageMagick, an open source tool for image manipulation.
• The output of the OCR is text, which is stored in a file (speech.txt). Captured images still have
defects such as distortion at the edges and dim-light effects, so it is still difficult for most OCR
engines to produce highly accurate text.
• gTTS is used to convert the text to speech. gTTS is an open source text-to-speech (TTS) tool that
supports many languages. In this project, the English TTS voice is used for reading the text.
• The following slides cover the software used, OCR and its types, Tesseract, and Google
Text-to-Speech.
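The binary conversion step mentioned above can be illustrated with a minimal sketch. The project itself used ImageMagick; the pure Python function below is only a stand-in that shows the idea. The name binarize, the list-of-rows image layout and the fixed threshold of 128 are assumptions made for this illustration.

```python
from typing import List

GrayImage = List[List[int]]  # rows of gray intensities in the range 0-255


def binarize(gray: GrayImage, threshold: int = 128) -> GrayImage:
    """Map every pixel to 0 (black) or 255 (white) using a fixed threshold.

    The threshold value is an assumption; real tools often pick it per image.
    """
    return [[255 if pixel >= threshold else 0 for pixel in row] for row in gray]


# A tiny 2x3 "page": bright paper with a few dark ink pixels.
page = [
    [250, 240, 30],
    [245, 20, 25],
]
print(binarize(page))  # [[255, 255, 0], [255, 0, 0]]
```

A fixed threshold works only for evenly lit images; the dim-light defects described above are one reason real preprocessing is usually more elaborate.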
Software Used-
• PYCHARM
• TESSERACT
• GTTS
BLOCK DIAGRAM-
• The figure shows the block diagram of the text-to-speech device. The 1st block is the
image processing module, where OCR converts the .jpg image to .txt form. The 2nd block is
the voice processing module, which converts the .txt file to speech.
OCR-

• Stands for Optical Character Recognition or Optical Character Reader (OCR).
• Optical character recognition is the electronic or mechanical conversion of images of
typed, handwritten or printed text into machine-encoded text, whether from a scanned
document, a photo of a document, a scene photo (for example the text on signs and
billboards) or subtitle text superimposed on an image.
• It is widely used as a form of data entry from printed paper records, such as passport
documents, invoices, bank statements, computerized receipts, business cards, mail,
printouts of static data, or any suitable documentation.
• It is a common method of digitizing printed texts so that they can be
electronically edited, searched, stored more compactly, displayed online,
and used in machine processes such as cognitive computing, machine
translation, (extracted) text-to-speech, key data extraction and text mining.
• OCR is a field of research in pattern recognition, artificial
intelligence and computer vision.
• Early versions needed to be trained with images of each character,
and worked on one font at a time.
• Advanced systems capable of producing a high degree of recognition
accuracy for most fonts are now common, and with support for a
variety of digital image file format inputs.
• Early optical character recognition may be traced
to technologies involving telegraphy and creating
reading devices for the blind.
• In 1914, Emanuel Goldberg developed a machine
that read characters and converted them into
standard telegraph code.
• Concurrently, Edmund Fournier d'Albe developed
the Optophone, a handheld scanner that when
moved across a printed page, produced tones that
corresponded to specific letters or characters.
Applications-
• Data entry for business documents such as
cheques, passports, invoices, bank statements
and receipts.
• In airports, for passport recognition
and information extraction.
• Automatic extraction of key information
from insurance documents.
• Extracting business card information into a
contact list.
• Making textual versions of printed
documents more quickly.
• Making electronic images of printed
documents searchable.
Some of The Types-

1. Optical word recognition – targets typewritten text, one word at a time (for
languages that use a space as a word divider). (Usually just called "OCR".)
2. Intelligent character recognition (ICR) – also targets
handwritten printscript or cursive text one glyph or character at a time,
usually involving machine learning.
3. Intelligent word recognition (IWR) – also targets
handwritten printscript or cursive text, one word at a time. This is
especially useful for languages where glyphs are not separated in cursive
script.
TEXT RECOGNITION-
Two Types-
1. Pattern recognition
2. Adaptive recognition
1. PATTERN RECOGNITION-Pattern recognition is the
process of recognizing patterns by using a machine learning
algorithm. It can be defined as the classification of data based
on knowledge already gained or on statistical information
extracted from patterns and/or their representation.

2. ADAPTIVE RECOGNITION-Software such as Tesseract uses
a two-pass approach to character recognition. The second pass
is known as "adaptive recognition" and uses the letter shapes
recognized with high confidence on the first pass to better
recognize the remaining letters on the second pass. This is
advantageous for unusual fonts or low-quality scans where the
font is distorted (e.g. blurred or faded).
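The two ideas above can be shown with a toy sketch. This is not Tesseract's actual classifier. It is a simplified illustration in which pattern recognition is nearest-template matching on tiny 5x3 bitmaps, and the adaptive second pass reuses the shapes that were matched with high confidence on the first pass. The bitmaps, the Hamming distance measure and the confidence limit of 4 pixels are all assumptions chosen for the example.

```python
from typing import Dict, List, Tuple

Glyph = str  # a 5x3 bitmap written row by row: '#' = ink, '.' = background


def distance(a: Glyph, b: Glyph) -> int:
    """Hamming distance: the number of pixels on which two bitmaps disagree."""
    return sum(p != q for p, q in zip(a, b))


def classify(glyph: Glyph, templates: Dict[str, List[Glyph]]) -> Tuple[str, int]:
    """Pattern recognition step: return (label, distance) of the closest example."""
    return min(
        ((label, distance(glyph, example))
         for label, examples in templates.items()
         for example in examples),
        key=lambda pair: pair[1],
    )


def two_pass(glyphs: List[Glyph], templates: Dict[str, List[Glyph]],
             confident: int = 4) -> List[str]:
    """Adaptive step: confident first-pass matches become extra templates."""
    first = [classify(g, templates) for g in glyphs]
    adapted = {label: list(examples) for label, examples in templates.items()}
    for glyph, (label, dist) in zip(glyphs, first):
        if dist <= confident:  # only trust close matches
            adapted[label].append(glyph)
    return [classify(g, adapted)[0] for g in glyphs]


TEMPLATES = {
    "I": [".#." ".#." ".#." ".#." ".#."],
    "L": ["#.." "#.." "#.." "#.." "###"],
}
serif_i = "###" ".#." ".#." ".#." "###"    # the page's unusual font for "I"
smudged_i = "###" "##." "#.." ".#." "###"  # the same letter, badly printed

print([classify(g, TEMPLATES)[0] for g in (serif_i, smudged_i)])  # ['I', 'L']
print(two_pass([serif_i, smudged_i], TEMPLATES))                  # ['I', 'I']
```

On the first pass the smudged letter is closer to the stored "L" than to the stored "I". After the clean serif "I" has been added as an extra example, the second pass classifies the smudged letter correctly.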
TESSERACT-

• Tesseract is an optical character recognition engine for various operating systems. It is free
software, released under the Apache License.
• The package contains an OCR engine (libtesseract) and a command line program (tesseract).
• Tesseract 4 adds a new neural net (LSTM) based OCR engine which is focused on line recognition,
but it still supports the legacy OCR engine of Tesseract 3, which works by recognizing character
patterns. Compatibility with Tesseract 3 is enabled by using the Legacy OCR Engine mode (--oem 0).
This mode also needs trained data files which support the legacy engine.
Invocation to OCR an image-
• tesseract imagename outputbase
• This uses English as the default language and 3 as the Page Segmentation Mode.
The default output format is text.
• osd.traineddata (for orientation and script detection) and eng.traineddata
(the English language data file) should be in the “tessdata” directory.
• The TESSDATA_PREFIX environment variable should be set to the parent directory
of the “tessdata” directory.
• The following command would give the same result as above, if
eng.traineddata and osd.traineddata files are in /usr/share/tessdata directory.

• tesseract --tessdata-dir /usr/share imagename outputbase -l eng --psm 3
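The command line above can also be assembled from Python, which is convenient when the language or page segmentation mode changes between runs. The following is a minimal sketch: the function names build_tesseract_command and run_tesseract are made up for this example, and only the flags shown on the slide (--tessdata-dir, -l, --psm) are used.

```python
import subprocess
from typing import List, Optional, Sequence


def build_tesseract_command(
    image: str,
    outputbase: str,
    languages: Sequence[str] = ("eng",),
    psm: int = 3,
    tessdata_dir: Optional[str] = None,
) -> List[str]:
    """Return the argument list for one tesseract run (nothing is executed)."""
    command = ["tesseract"]
    if tessdata_dir is not None:
        command += ["--tessdata-dir", tessdata_dir]
    # Languages are joined with '+', so their order is preserved (eng+hin vs hin+eng).
    command += [image, outputbase, "-l", "+".join(languages), "--psm", str(psm)]
    return command


def run_tesseract(*args, **kwargs) -> None:
    """Run tesseract; requires the tesseract program to be installed."""
    subprocess.run(build_tesseract_command(*args, **kwargs), check=True)


print(" ".join(build_tesseract_command("imagename", "outputbase",
                                       tessdata_dir="/usr/share")))
# tesseract --tessdata-dir /usr/share imagename outputbase -l eng --psm 3
```

Passing the arguments as a list rather than one string avoids quoting problems when an image path contains spaces.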


The below Image will be converted into Text-

Code:-
tesseract testing/eurotext.png testing/eurotext-eng -l eng
OUTPUT-
RECOGNITION OF CHARACTERS IN AN IMAGE-

• Tesseract OCR can support 149 languages.
• The output can differ depending on
the order of the languages, so
-l eng+hin can give a different
result than -l hin+eng.
Algorithm to convert a colour image to a grayscale image-

• A digital image of width M (columns) and height N (rows) is represented as a discrete function
f(x, y) over the pixel coordinates (xi, yj), where 0 ≤ i < M and 0 ≤ j < N.
• Each pair (xi, yj) is known as a pixel. The pair (0, 0) is the first pixel and the pair (M-1, N-1) is
the last pixel in the image. Every pixel has its own RGB colour value. If the R, G and B values of a
pixel are equal, the pixel belongs to the gray colour family. Based on this, the algorithm to convert
a colour image to gray averages the three colour channels of each pixel:
• µ(x, y) = (R(x, y) + G(x, y) + B(x, y)) / 3
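The averaging rule above can be sketched in a few lines of Python. The image layout (a list of rows, each a list of (R, G, B) tuples) and the use of integer division are assumptions made for this illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in the range 0-255


def to_grayscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Replace every pixel by the mean of its R, G and B values."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in image]


# One row with a pure red pixel and a pixel that is already gray.
print(to_grayscale([[(255, 0, 0), (90, 90, 90)]]))  # [[85, 90]]
```

A pixel whose three channels are already equal keeps its value, which matches the remark that such pixels belong to the gray colour family.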
RECOGNITION OF ENGLISH AND HINDI CHARACTERS IN AN IMAGE-

CODE:-
• tesseract testing/bilingual.jpg testing/bilingual-enghin -l eng+hin

• This will use English as the primary language and Hindi as the
secondary language.
Advantages and Disadvantages Of TESSERACT-

• Tesseract can be trained to work in other languages. It can process right-to-left
text such as Arabic or Hebrew, many Indic scripts, and CJK quite well.
• Tesseract is suitable for use as a backend and can be used for more complicated
OCR tasks, including layout analysis, by using a frontend such as OCRopus.
• Tesseract's output will have very poor quality if the input images are not
preprocessed.
• Images (especially screenshots) must be scaled up so that the text x-height is at
least 20 pixels; any rotation or skew must be corrected, or no text will be
recognized; low-frequency changes in brightness must be high-pass filtered, or
Tesseract's binarization stage will destroy much of the page; and dark borders
must be removed manually, or they will be misinterpreted as characters.
GTTS-GOOGLE TEXT-TO-SPEECH
• Stands For GOOGLE TEXT TO SPEECH
• Google Text-to-Speech is a screen reader application developed
by Google for its Android operating system. It powers applications to
read aloud (speak) the text on the screen with support for many
languages. Text-to-Speech may be used by apps such as Google Play
Books for reading books aloud, by Google Translate for reading aloud
translations providing useful insight to the pronunciation of words,
by Google Talkback and other spoken feedback accessibility-based
applications, as well as by third-party apps. Users must install voice
data for each language.
• gTTS (Google Text-to-Speech) is a Python library and CLI tool that
interfaces with Google Translate's text-to-speech API.
It writes spoken mp3 data to a file, a file-like object (bytestring) for
further audio manipulation, or stdout. It can also pre-generate Google
Translate TTS request URLs to feed to an external program.
CONVERT TEXT TO SPEECH IN PYTHON-
• There are several APIs available to convert
text to speech in Python. One such API
is the Google Text to Speech API, commonly
known as the gTTS API. gTTS is a very easy
to use tool which converts the text
entered into audio, which can be saved as
an mp3 file.
• The gTTS API supports several languages
including English, Hindi, Tamil, French,
German and many more. The speech can
be delivered at either of the two
available audio speeds, fast or slow.
However, as of the latest update, it is not
possible to change the voice of the
generated audio.
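The language and speed options described above map onto the lang and slow arguments of the gTTS class. The following is a minimal sketch, assuming the gTTS package is installed and an internet connection is available. The function name synthesize and the empty-text check are additions made for this illustration.

```python
import io


def synthesize(text: str, lang: str = "en", slow: bool = False) -> bytes:
    """Return the spoken text as mp3 bytes (requires gTTS and internet access)."""
    if not text.strip():
        # Checked up front so the error is clear before any network call is made.
        raise ValueError("nothing to speak: the text is empty")
    from gtts import gTTS  # third-party package: pip install gTTS

    buffer = io.BytesIO()  # a file-like object instead of a file on disk
    gTTS(text=text, lang=lang, slow=slow).write_to_fp(buffer)
    return buffer.getvalue()


# Example (not run here because it needs a network connection):
# audio = synthesize("Optical character recognition", lang="en", slow=True)
# with open("sample.mp3", "wb") as handle:
#     handle.write(audio)
```

Writing into a file-like object keeps the audio in memory, which is useful when it should be processed further instead of being saved directly.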
Insight on PyCharm-
• PyCharm is an integrated development
environment (IDE) used in computer
programming, specifically for
the Python language. It is developed by
the Czech company JetBrains. It provides code
analysis, a graphical debugger, an integrated
unit tester and integration with version control
systems (VCSes), and it supports web
development with Django as well as data
science with Anaconda.
• PyCharm is cross-platform,
with Windows, macOS and Linux versions.
The Community Edition is released under
the Apache License, and there is also a
Professional Edition with extra features,
released under a proprietary license.
CONVERT TEXT TO SPEECH IN PYTHON-

INSTALLATION OF gTTS-$ pip install gTTS


PYCHARM-Coding will be done in this platform.
Whole process to convert IMAGE-TO-TEXT-TO-SPEECH:-
1. Open PyCharm and write the code.
2. Select the image you want to convert.
3. Paste the image's directory address into the code in PyCharm, then print(text) to display the extracted text.
4. For the mp3: t2s = gTTS(text, lang='en'), then t2s.save('some name.mp3').
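The four steps above can be sketched as one short script. The code screenshots from the original slides are not reproduced here, so this is only an assumed reconstruction: it calls the tesseract command line program through subprocess (with "stdout" as the output base so the text is printed instead of written to a file) and then uses gTTS as in step 4. The function names and the whitespace clean-up are assumptions.

```python
import subprocess


def clean_ocr_text(raw: str) -> str:
    """Collapse the line breaks and repeated spaces that OCR output contains."""
    return " ".join(raw.split())


def image_to_text(image_path: str, lang: str = "eng") -> str:
    """Run the tesseract program on one image and return the recognized text."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        capture_output=True, text=True, check=True,
    )
    return clean_ocr_text(result.stdout)


def image_to_speech(image_path: str, mp3_path: str) -> str:
    """Image -> text -> mp3; needs tesseract, gTTS and an internet connection."""
    from gtts import gTTS  # third-party package: pip install gTTS

    text = image_to_text(image_path)
    print(text)                        # step 3: show the extracted text
    t2s = gTTS(text, lang="en")        # step 4: text to speech
    t2s.save(mp3_path)
    return text


# Example call, using the file names that appear on the slides:
# image_to_speech("utt.png", "sample.mp3")
```

Collapsing the whitespace first avoids unnatural pauses at the places where the OCR output merely wrapped a line.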

AFTER COMPLETION OF INSTALLATION WE WILL WRITE A SAMPLE PROGRAM THAT
WILL CONVERT TEXT TO AUDIO FILES.
CODE FOR IMAGE 1-
CODE FOR IMAGE 2-
CODE FOR IMAGE 2-
IMAGE 1 SELECTED-utt.png
IMAGE 2 SELECTED-utt.png
Image Directory Address-

-C:\user\KIIT\PycharmProjects\ocr
Output of IMAGE into TEXT-
For gTTS-

- t2s = gTTS(text, lang='en')
- t2s.save('sample.mp3')
Output of MP3 File running in VLC-
ALL MP3 Audio Attachment
of converted Image to Text
and then to Audio
Conclusion
In conclusion, OCR is a very remarkable technology that holds a lot of potential. In this
day and age, such tools are already quite advanced. However, Optical Character
Recognition is going to look even better in the future. AI is on the way to becoming one
of the most influential trends in the coming years, revolutionizing information as we
know it.
OCR (Optical Character Recognition) is a technology that helps digitize physical
documents. It turns images, handwritten content, and printed documents into fully
searchable digital files. OCR is a great example of how AI solutions are driving database
modernization, as these tools are becoming increasingly affordable and accessible.
THANK YOU...
