Tesseract Training - For Khmer Language - For Posting

Uploaded by

Kruy Vanna

This is a step by step tutorial on how to train Tesseract OCR. Here I train Khmer Language as an example.

Copyright:

Attribution Non-Commercial (BY-NC)



Original Title

Tesseract Training_for Khmer Language_For Posting


Download Tesseract from http://code.google.com/p/tesseract-ocr/downloads/list
Here I chose the precompiled package, Tesseract-2.01.ext.tar.gz. (A newer version would be better, but I do not have a compiler at hand right now, so I will just use the compiled one.)
Extract it to any location.

Download the English language data.

Extract it and put the files in the tessdata folder of your Tesseract folder.


Download the Tesseract source package. What I need are the files in the configs and tessconfigs folders under tessdata. (The precompiled tesseract.exe download does not include these.)


Extract it somewhere and copy its tessdata folder into the Tesseract folder from the previous step.

Now we can start training:

I trained with this image. They say you should train with enough data, so every character should appear many times. (I don't know if I'm right.)

Maybe each character should appear many times, but in different fonts?

Make the box file. Open a command line and set the current directory to your Tesseract folder:

tesseract fontfile.tif fontfile batch.nochop makebox

Got the file: fontfile.txt

I renamed it to fontfile.box so that I could open it in Tessboxer (http://sites.google.com/site/spilkaondrej/).

Here I type the character in the “Letter” textbox and the UTF-8 code is filled in automatically.
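Since every character should have enough samples, it can help to count how many boxes each character has once the box file is corrected. This is only a minimal sketch: it assumes a Unix-like shell with awk (for example Cygwin on Windows) and the Tesseract 2.x box format of one box per line, with the character followed by the left, bottom, right and top coordinates. The function name count_box_samples is hypothetical.

```shell
# Count the boxes per character in a Tesseract 2.x box file.
# Assumed line format: <char> <left> <bottom> <right> <top>
count_box_samples() {
  awk 'NF >= 5 { n[$1]++ } END { for (c in n) print c, n[c] }' "$1" | LC_ALL=C sort
}

# Usage: count_box_samples fontfile.box
```

Characters with very few boxes are the ones to add more samples for in the training image.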
Make the feature file:

tesseract fontfile.tif junk nobatch box.train

Got this log:

read_variables_file:variable not found: textord_no_rejects
Tesseract Open Source OCR Engine
Image has 24 bits per pixel and size (746,387)
Resolution=96
APPLY_BOXES:
Boxes read from boxfile: 19
Initially labelled blobs: 17 in 3 rows
Box failures detected: 2
Duped blobs for rebalance: 2
"ច" has fewest samples: 5
Total unlabelled words: 1
Final labelled words: 19
Generating training data
TRAINING ... Font name = UnknownFont.
Generated training data for 19 blobs
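The log reports how many boxes failed, so it can be worth pulling that number out after each run. A small sketch, assuming the log text above was saved to a file; the file name train.log and the function name box_failures are hypothetical.

```shell
# Print the number after "Box failures detected:" in a saved training log.
box_failures() {
  sed -n 's/^Box failures detected: *\([0-9][0-9]*\).*/\1/p' "$1"
}

# Usage (train.log is a hypothetical file holding the log text):
#   box_failures train.log
```

For the log in this tutorial the function would print 2.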

Clustering
(Change the current directory to the “training” folder before running these commands.)

mftraining fontfile.tr

Now I have the files I should have:

• inttemp
This is a binary file, so it is not human-readable.

• pffmtable
ខ 104
ង 93
ច 85
I don't know exactly what the numbers mean; the Tesseract training notes describe pffmtable as the number of expected features for each character.

• Microfeat
I got this file too, but they say it is not used.

Another command:

cntraining fontfile.tr

Got this file: normproto

Compute the Character Set

unicharset_extractor fontfile.box
Got this file: unicharset
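The training commands so far can be collected into one small script. This is only a sketch under assumptions: it expects tesseract, mftraining, cntraining and unicharset_extractor to be on the PATH (instead of changing into the “training” folder), it expects the image and a corrected box file to exist already, and it passes the .tr feature file written by box.train to both clustering tools. The function name train_font is hypothetical.

```shell
# Run the training steps from this tutorial for one font image.
# Assumes <base>.tif and a corrected <base>.box are in the current directory
# and that all four tools can be found on the PATH.
train_font() {
  base=$1
  tesseract "$base.tif" junk nobatch box.train || return 1   # writes <base>.tr
  mftraining "$base.tr" || return 1              # inttemp, pffmtable, Microfeat
  cntraining "$base.tr" || return 1              # normproto
  unicharset_extractor "$base.box" || return 1   # unicharset
}

# Usage: train_font fontfile
```

Each step stops the function if it fails, so a bad box file does not silently produce incomplete training data.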

Dictionary Data

I created the “frequent_words_list” file. They say it must contain at least one word, so I just put “ខងច” in it using Notepad.

Generate the frequent-word dictionary file with the command:

wordlist2dawg frequent_words_list freq-dawg

Got the file: freq-dawg

I created the “words_list” file with the content “ងចខ”.

Generate the word-list dictionary file with the command:

wordlist2dawg words_list word-dawg

Got the file: word-dawg

I created the “user-words” file. They say it is usually empty, so I left it empty.
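The same dictionary files can also be produced from a shell instead of Notepad. A sketch, assuming a UTF-8 terminal and that wordlist2dawg is on the PATH; the helper name make_dawgs is hypothetical, and the words are the same placeholder words used above.

```shell
# Write the two word lists (one word per line), build both DAWG files,
# and create the empty user-words file, all in the current directory.
make_dawgs() {
  printf '%s\n' 'ខងច' > frequent_words_list
  printf '%s\n' 'ងចខ' > words_list
  wordlist2dawg frequent_words_list freq-dawg || return 1
  wordlist2dawg words_list word-dawg || return 1
  : > user-words    # usually left empty
}

# Usage: make_dawgs
```

For real use, the two word lists would hold many words, one per line, rather than a single placeholder.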
The last file
The file “DangAmbigs” is created by hand. Its purpose is to reduce ambiguity between similar-looking characters. For example, “m” can easily be confused with “rn” (r + n).

Khmer characters may not have this kind of ambiguity (this needs to be confirmed), so I made it an empty file.

Putting it all together

Now I have renamed all the files to have the prefix “khm.” (khm is the ISO 639-2 code for Khmer, the language of Cambodia).

All of these files should be put in the “tessdata” folder:

khm.DangAmbigs
khm.freq-dawg
khm.inttemp
khm.normproto
khm.pffmtable
khm.unicharset
khm.user-words
khm.word-dawg
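The renaming and copying can be done in one step. A minimal sketch, assuming a Unix-like shell and that the eight unprefixed files sit together in one folder; the function name install_lang and both folder arguments are hypothetical.

```shell
# Copy the eight training outputs into tessdata with a language prefix.
# Usage: install_lang <lang> <folder with training outputs> <tessdata folder>
install_lang() {
  lang=$1
  src=$2
  dest=$3
  for f in DangAmbigs freq-dawg inttemp normproto pffmtable unicharset user-words word-dawg; do
    cp "$src/$f" "$dest/$lang.$f" || return 1
  done
}

# Example: install_lang khm training tessdata
```

The copy stops at the first missing file, which makes it easy to notice a training step that was skipped.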

Now it is time to run the test!

I have this image, khmer.tif.

I run it with the command:

tesseract khmer.tif output -l khm

I got the output.txt with the content:

ខងច

Cheers!!!
