Ocr Gtts
Submitted By- Utkarsh-1704077, Kshitij Nishant-1704113
Guided By- Prof. Vinod Jha
OPTICAL CHARACTER RECOGNITION AND IMPLEMENTATION OF gTTS
CONTENTS-
• INTRODUCTION
• SOFTWARE USED
• BLOCK DIAGRAM
• OCR
• TYPES
• TESSERACT
• GTTS
• LIMITATION
INTRODUCTION-
• Image acquisition, recognition and speech conversion using Optical Character Recognition (OCR) and a Text-to-Speech (TTS) synthesizer.
• An image processing technology is used to convert images containing horizontal text into text documents, and the extracted text is then converted into speech.
• The text-to-speech device consists of two main modules: the image processing module and the voice processing module.
• The image processing module captures an image using a camera and converts the image into text.
• The voice processing module converts the text into sound and processes it with specific physical characteristics so that the speech can be understood.
• OCR, or Optical Character Recognition, is a technology that automatically recognizes characters through an optical mechanism. It imitates the human sense of sight: the camera replaces the eye, and image processing in the computer takes the place of the human brain.
• In this project we identify English characters. Before the image is fed to the OCR engine, it is converted to a binary image to increase recognition accuracy. The binary conversion is done with ImageMagick, another open-source tool, which is used for image manipulation.
• The output of OCR is text, which is stored in a file (speech.txt). Machines still introduce defects such as distortion at the edges and dim-light effects, so it is still difficult for most OCR engines to produce highly accurate text.
• gTTS is used to convert the text to speech. gTTS (Google Text-to-Speech) is an open-source Python library and command-line tool that interfaces with Google Translate's text-to-speech API and supports many languages. In this project, English is used for reading the text.
• The following sections cover the software used, OCR and its types, Tesseract, and Google Text-to-Speech (gTTS).
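The binarization step mentioned above is done with ImageMagick in this project. As a rough illustration of the idea only, the pure-Python sketch below applies a single global threshold to a grayscale image stored as a list of rows (0 = black, 255 = white), which is roughly what ImageMagick's `-threshold` option does. The function name `binarize` and the default threshold of 128 are assumptions for illustration.

```python
# Sketch only: the project itself uses ImageMagick for this step.
# The default threshold of 128 is an assumption, not a value from the project.
def binarize(gray_rows, threshold=128):
    """Map each grayscale value (0-255) to pure black (0) or pure white (255).

    Pixels brighter than `threshold` become white; all others become black.
    """
    return [[255 if value > threshold else 0 for value in row] for row in gray_rows]


if __name__ == "__main__":
    # A tiny 2x4 "image": dark text strokes (low values) on a light background.
    image = [
        [250, 30, 240, 20],
        [200, 128, 90, 255],
    ]
    for row in binarize(image):
        print(row)
    # [255, 0, 255, 0]
    # [255, 0, 0, 255]
```

A single global threshold works well for evenly lit scans; for images with dim or uneven lighting, an adaptive method would usually be needed.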
Software Used-
• PYCHARM
• TESSERACT
• GTTS
BLOCK DIAGRAM-
The figure shows the block diagram of the text-to-speech device. The first block is the image processing module, where OCR converts the .jpg image into .txt form. The second block is the voice processing module, which converts the .txt file into speech.
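The two blocks of the diagram can be sketched as two Python functions. This is a minimal sketch, assuming the `pytesseract` wrapper for Tesseract, Pillow for loading the image, and the `gtts` package; the function names, the `clean_ocr_text` helper and the input file name are hypothetical. The third-party imports sit inside the functions so the text-cleaning helper can be used without the OCR and TTS packages installed.

```python
import re
from pathlib import Path


def clean_ocr_text(raw: str) -> str:
    """Collapse the line breaks and repeated spaces that OCR output usually contains."""
    return re.sub(r"\s+", " ", raw).strip()


def image_processing_module(jpg_path: str, txt_path: str = "speech.txt") -> str:
    """Block 1: OCR converts the .jpg image into a .txt file and returns the text."""
    # Imported here (assumption) so this file loads even without the OCR stack installed.
    from PIL import Image
    import pytesseract

    raw = pytesseract.image_to_string(Image.open(jpg_path), lang="eng")
    text = clean_ocr_text(raw)
    Path(txt_path).write_text(text, encoding="utf-8")
    return text


def voice_processing_module(txt_path: str = "speech.txt", mp3_path: str = "sample.mp3") -> bool:
    """Block 2: convert the .txt file into speech saved as an MP3 file."""
    text = Path(txt_path).read_text(encoding="utf-8").strip()
    if not text:
        return False  # nothing was recognized, so there is nothing to speak
    from gtts import gTTS  # gTTS needs network access to Google Translate

    gTTS(text, lang="en").save(mp3_path)
    return True


if __name__ == "__main__":
    image_processing_module("input.jpg")  # hypothetical input image
    voice_processing_module()
```

Keeping the modules separate mirrors the diagram: speech.txt is the only hand-off between them, so either block can be rerun or replaced on its own.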
OCR-
1. Optical word recognition – targets typewritten text, one word at a time (for
languages that use a space as a word divider). (Usually just called "OCR".)
2. Intelligent character recognition (ICR) – targets handwritten printscript or cursive text, one glyph or character at a time, usually involving machine learning.
3. Intelligent word recognition (IWR) – also targets
handwritten printscript or cursive text, one word at a time. This is
especially useful for languages where glyphs are not separated in cursive
script.
TEXT RECOGNITION-
Two Types-
1. Pattern recognition
2. Adaptive recognition
1. PATTERN RECOGNITION- Pattern recognition is the process of recognizing patterns by using a machine learning algorithm. It can be defined as the classification of data based on knowledge already gained or on statistical information extracted from patterns and/or their representation.
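To make the definition concrete, here is a toy example of classification based on knowledge already gained: each known character is stored as a small binary template, and an unknown glyph gets the label of the template it differs from in the fewest pixels (a nearest-neighbour classifier using Hamming distance). The 3x3 templates and function names are made up for illustration; this is not how Tesseract recognizes characters internally.

```python
# Hypothetical 3x3 binary templates (1 = ink, 0 = background), flattened row by row.
TEMPLATES = {
    "I": (0, 1, 0,
          0, 1, 0,
          0, 1, 0),
    "L": (1, 0, 0,
          1, 0, 0,
          1, 1, 1),
    "T": (1, 1, 1,
          0, 1, 0,
          0, 1, 0),
}


def hamming(a, b):
    """Number of pixel positions where two equally sized bitmaps differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(glyph, templates=TEMPLATES):
    """Return the label of the stored template closest to `glyph`."""
    return min(templates, key=lambda label: hamming(glyph, templates[label]))


if __name__ == "__main__":
    # A noisy "T" whose top-middle pixel was lost during scanning.
    noisy_t = (1, 0, 1,
               0, 1, 0,
               0, 1, 0)
    print(classify(noisy_t))  # T
```

The noisy glyph is 1 pixel away from "T", 3 from "I" and 5 from "L", so the stored knowledge still identifies it despite the damage.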
Code:-
tesseract testing/eurotext.png testing/eurotext-eng -l eng
OUTPUT-
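The same Tesseract command can be run from a Python script (for example inside the PyCharm project) through the standard `subprocess` module. This sketch assumes the `tesseract` executable is on the PATH; the helper names are hypothetical. Tesseract writes its result to `<outputbase>.txt`, so the command above produces testing/eurotext-eng.txt.

```python
import subprocess
from pathlib import Path


def tesseract_command(image, output_base, langs=("eng",)):
    """Build the argument list for: tesseract <image> <outputbase> -l <lang[+lang...]>."""
    return ["tesseract", image, output_base, "-l", "+".join(langs)]


def run_tesseract(image, output_base, langs=("eng",)):
    """Run Tesseract (assumed to be on PATH) and return the text from <outputbase>.txt."""
    subprocess.run(tesseract_command(image, output_base, langs), check=True)
    return Path(output_base + ".txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    print(run_tesseract("testing/eurotext.png", "testing/eurotext-eng"))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces, and `check=True` raises an error if Tesseract fails instead of silently reading a stale output file.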
RECOGNITION OF CHARACTERS IN AN IMAGE-
• A digital image of width M and height N is represented as a discrete function f(x, y), sampled at the points:
• $f(x_i, y_j)$, where $0 \le i < M$ and $0 \le j < N$
• Each pair $(x_i, y_j)$ is known as a pixel. The pair (0, 0) is the first pixel and the pair (M-1, N-1) is the last pixel in the image. Every pixel has its own RGB colour value, and a pixel whose R, G and B values are equal belongs to the gray colour family. Based on this, the algorithm to convert a colour image to gray averages the three channels:
• $\mu(x, y) = \frac{f_R(x, y) + f_G(x, y) + f_B(x, y)}{3}$
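The averaging formula can be applied pixel by pixel. In this sketch the image is assumed to be a list of N rows, each holding M (R, G, B) tuples, so a pixel is accessed as image[y][x]; integer division is used so the result stays a valid gray level from 0 to 255 (an assumption, since the formula does not specify rounding). The function names are hypothetical.

```python
def to_gray(image):
    """Convert an RGB image (rows of (R, G, B) tuples) to gray levels via mu = (R + G + B) / 3.

    Integer division (an assumption) keeps each result a whole gray level in 0..255.
    """
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in image]


def is_gray(pixel):
    """A pixel belongs to the gray colour family when its R, G and B values are equal."""
    r, g, b = pixel
    return r == g == b


if __name__ == "__main__":
    # A 1 x 3 image: pure red, mid gray, white.
    image = [[(255, 0, 0), (128, 128, 128), (255, 255, 255)]]
    print(to_gray(image))                  # [[85, 128, 255]]
    print([is_gray(p) for p in image[0]])  # [False, True, True]
```

For a pixel that is already gray, the average returns the same value, which is why gray pixels pass through the conversion unchanged.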
RECOGNITION OF ENGLISH AND HINDI CHARACTERS IN AN IMAGE-
CODE:-
• tesseract testing/bilingual.jpg testing/bilingual-enghin -l eng+hin
• (run from C:\user\KIIT\PycharmProjects\ocr)
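Recognizing English and Hindi together requires both the `eng` and `hin` trained-data files to be installed, and `tesseract --list-langs` prints the installed languages. The sketch below assumes the usual output format of one header line followed by one language code per line, checks for the requested packs, and builds the combined `-l` value. The function names are hypothetical.

```python
import subprocess


def parse_list_langs(output):
    """Parse `tesseract --list-langs` output (assumed format): skip the header, one code per line."""
    lines = [line.strip() for line in output.splitlines()]
    return {line for line in lines[1:] if line}


def lang_argument(requested, installed):
    """Join the requested codes with '+' (e.g. 'eng+hin'), failing early if any is missing."""
    missing = [code for code in requested if code not in installed]
    if missing:
        raise ValueError(f"Tesseract language data not installed: {', '.join(missing)}")
    return "+".join(requested)


if __name__ == "__main__":
    result = subprocess.run(["tesseract", "--list-langs"], capture_output=True, text=True, check=True)
    # Some Tesseract builds may print this list to stderr, so both streams are tried (assumption).
    installed = parse_list_langs(result.stdout or result.stderr)
    print(lang_argument(["eng", "hin"], installed))  # eng+hin, if both packs are installed
```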
Output of Image-to-Text Conversion-
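For gTTS-
from pathlib import Path
from gtts import gTTS
t2s = gTTS(Path('speech.txt').read_text(encoding='utf-8'), lang='en')  # read the text produced by OCR
t2s.save('sample.mp3')  # write the speech to an MP3 file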
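Tesseract names languages with three-letter codes such as `eng` and `hin`, while gTTS expects codes such as `en` and `hi`. The sketch below maps one to the other so that text recognized with a given Tesseract language can be spoken in the matching voice. The mapping table and the `speak_file` function are hypothetical, and gTTS needs an internet connection because it calls Google Translate's speech service.

```python
from pathlib import Path

# Hypothetical mapping from Tesseract language codes to gTTS language codes.
TESSERACT_TO_GTTS = {"eng": "en", "hin": "hi"}


def speak_file(txt_path, tesseract_lang="eng", mp3_path="sample.mp3"):
    """Read OCR output from txt_path and save it as speech in the matching gTTS language."""
    text = Path(txt_path).read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"{txt_path} contains no text to speak")
    from gtts import gTTS  # imported lazily; requires the gtts package and network access

    gTTS(text, lang=TESSERACT_TO_GTTS[tesseract_lang]).save(mp3_path)
    return mp3_path


if __name__ == "__main__":
    speak_file("speech.txt")  # writes sample.mp3
```

Checking for empty text first gives a clear error when OCR recognized nothing, before any network request is made.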
Output of MP3 File running in VLC-
All MP3 audio attachments of the images converted to text and then to audio
Conclusion
In conclusion, OCR is a remarkable technology with a lot of potential. Today's OCR tools are already quite advanced, and they are likely to improve further as AI becomes one of the most influential trends of the coming years, transforming how we work with information.
OCR (Optical Character Recognition) is a technology that helps digitize physical
documents. It turns images, handwritten content, and printed documents into fully
searchable digital files. OCR is a great example of how AI solutions are driving database
modernization, as these tools are becoming increasingly affordable and accessible.
THANK YOU...