Professional Documents
Culture Documents
ABBYY Fine Reader Engine10 Data Sheet
ABBYY Fine Reader Engine10 Data Sheet
ABBYY Fine Reader Engine10 Data Sheet
Overview
What's New
Trial Version
Technical Specifications
Overview
Accuracy and speed, power and simplicity – are you expecting all these qualities from OCR engine, but they seem to be
contradictive?
With new ABBYY FineReader Engine 10 you achieve the outstanding level of OCR quality and usability:
• Choose ABBYY FineReader Engine 10 and receive award-winning OCR SDK providing unrivaled accuracy, high
recognition speed, outstanding functionality and 198 supported languages.
• Enjoy working with comprehensive, easily-integrated API supplied with clear documentation and various code
samples.
• Appreciate unique set of breakthrough technologies including improved ADRTTM, Camera OCRTM, new binarization,
multicore CPU support, PDF MRC.
• Expand your markets with ABBYY OCR engine’s multiple OS support: Windows, Linux, Mac OS and variety of
embedded platforms.
• Trust in ABBYY’s proven partnerships with industry leaders worldwide who have been choosing ABBYY’s technologies
for decades.
ABBYY FineReader Engine 10 – powerful and convenient OCR technology. Just try and appreciate!
The version 10 of ABBYY FineReader Engine for Windows delivers high-quality recognition technologies with revolutionary
enhancements:
• Up to 92% processing speed growth for key European languages.
• Up to 40% accuracy increase for Asian languages – Chinese, Japanese and Korean.
• Optimally balanced profiles with fine-tuned parameters for your particular tasks.
• Absolute world record – 198 recognition languages, including Chinese, Japanese, Korean, Vietnamese, Thai and
Hebrew.
• SDK Developer's Guide (Help), currently recognized for its unbeatable comprehensibility and usefulness, now
becomes even better with its improved appearance and revised content.
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
-1-
What's New in ABBYY FineReader Engine 10
Up to 92% faster processing with Enhanced Fast Mode
New enhanced Fast Mode designed to optimize processing speed/accuracy balance for images of good quality. Up to 92%
faster page throughput while maintaining the high level of recognition accuracy*
See more in Extreme Processing Speed
Notice: ABBYY unrivaled multicore support architecture ensures close to linear performance growth with increasing number
of cores for multipage documents. For 2 CPU cores it works almost 2 times faster, for 4 cores – almost 3.8 times!
With new features of ABBYY FineReader Engine 10 you can carefully reconstruct the original layout of document and its
structure for easy content reuse:
• Document structure detection. ABBYY FineReader Engine 10 automatically detects headings in recognized document,
determines their level in document structure, defines their text styles and reconstructs the whole structure as Document Map
of resulting document.
• Table of Contents (TOC) reconstruction. In final document the Table of Contents appears as a set of links to the
headings. After document editing TOC could be updated automatically as a single object to add new headings and revise page
numbers.
• Charts and diagrams detection. Automatic charts and diagrams detection feature was improved in 10th version of ABBYY
OCR SDK. Now it is possible to choose if recognize text on chart or stay it in origin image form.
• Picture and table captions processing. ABBYY FineReader Engine 10 automatically detects picture’s and table’s captions
and exports them to final document as a single frame including the picture and its title.
• Document styles defining. ABBYY FineReader Engine 10 analyzes text font type, size, and its placement and defects the
corresponding font style for every type of text. So for the headings of each level there are special styles, for ordinary text, for
TOC and for picture captions there are also special styles.
• “Glossy magazine” processing model. New ABBYY OCR SDK can reconstruct complicated layouts consisted of many
pictures and text blocks on a page or including very large pictures for the whole page.
For more details please see Unique Features for Document Structure and Layout Reconstruction (ADRT)
• Superior quality-size ratio for PDF files. New PDF export together with improved MRC (Mixed Raster Content)
compression technology allows achieving higher quality and less size of PDF documents. You get superior quality-size ratio for
PDF documents both – with and without MRC.
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
-2-
• PDF export profiles. There are more than 40 parameters for PDF export tuning. ABBYY FineReader Engine 10 provides
predefined profiles with optimal values for popular export variants:
• MaxQuality
• Balanced
• MinSize
• MaxSpeed
With predefined PDF export profiles you automatically set optimal values for particular task.
• New features of Camera OCRTM. Camera OCR technology – the set of document photo adjustment features for better
recognition results was improved with new unique features:
• Automatic correction of 3D perspective distortions
• Blurred image correction
• ISO noise reduction
The most of document photos, especially taken by embedded cameras have some usual defects. With new Camera OCR
features of ABBYY FineReader Engine 10 you can overcome them and achieve better recognition results
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
-3-
Example #2: Textured background
• Color marks and stamps filtering. If there are some stamps or marks, made by pen, marker on document image they
usually interfere text and decrease OCR results. That is why ABBYY FineReader Engine 10 includes special feature for color
marks and stamps filtering and improving recognition accuracy. An excellent feature for data capture systems allows
preventing data losses from fields covered by stamps and color marks.
With unique abilities of ABBYY FineReader Engine 10 protection system you can choose an optimal licensing scheme to pay
for used features only.
You can easily construct your own protection system based on integrated SDK’s protection and gain from total control of your
application usage. You can measure integrated OCR performance by any convenient units: pages per month or per year,
characters per second, CPUs and workstation numbers and so on. This guarantees the maximal profit from your application.
Improved Developers Guide (Help). Improved Help distinguishes by updated structure and appearance together with new
content including general product description, API specification, usage samples and best practices. You will fast and easily
find all necessary information and will enjoy working with ABBYY FineReader Engine 10.
* Based on internal testing of FineReader Engine 9 (release 1) vs. FineReader Engine 10 (release 1). Tests included standard
business document scans in English, German, French, Spanish and Italian. Your results may vary based on scan quality,
document complexity, system and application type.
** Based on internal testing. Your results may vary based on scan quality, document complexity, system and application
type.
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
-4-
Key Features
The ABBYY FineReader Engine’s API delivers a variety of powerful document processing functions and technologies, including:
Awarded ABBYY OCR technologies are worldwide famed for outstanding recognition accuracy in European and Asian
languages.
Recognition accuracy is a key parameter of recognition technology and ABBYY team worked hardly on its improvement since
the first version of FineReader OCR software.
ABBYY FineReader Engine 10 provides unrivaled OCR accuracy for European and Asian (Chinese, Japanese, Korean,
Vietnamese, Thai, Hebrew) languages.
The following chart shows the impressive decrease of recognition errors numbers* for some languages comparing to previous
FineReader Engine 9.0 OCR software. If for European languages there are about 10% less errors, for Asian – up to 40%:
*Number of recognition errors is a basic parameter for OCR software accuracy measurement.
Notice: if accuracy increased from 98% to 99% it seems as a slight and unimportant 1% growth. But in fact this means that
number of errors is halved! You need twice less resources for document verification and the total cost of document conversion
process decreases dramatically
Processing speed as the important recognition quality parameter and ABBYY FineReader Engine 10 provides several ways of
its enhancement:
• Enhanced Fast mode – 92% faster with high accuracy – New!
High recognition speed obviously is a significant OCR quality parameter. ABBYY developer team is steady working on
speed optimization. In 10th version of FineReader Engine we have achieved absolutely amazing results – about twice
speed growth for the most of European and Asian languages*:
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
-5-
For faster recognition switch on there is a special parameter Fast Recognition mode. It includes adjusted parameters for
2-3 times speed increasing comparing to default – Normal Recognition mode. In this case the accuracy decreases for
about 0.5-1% only.
Learn more about the ways of performance increasing on Optimized performance on multicore processors and Easy start
and optimal performance with new recognition profiles pages.
* comparing to ABBYY FineReader Engine 9.0 released in October 2008. Based on internal ABBYY testing.
Notice: This graphic does not take into account document export step because it could vary from scenario to scenario
and can’t be paralleled. Speed increase rate can also be different depending on a document complexity. For documents
with complex layout it is higher, for simple – lower. The more time spent for analysis and recognition – the higher benefit
from multi-processing. The graphic also shows that the more pages are in a document the more effective load balancing
and performance rate.
Numbers quoted are based on internal ABBYY testing.
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
-6-
Recognition Profiles for Easy Start and Optimal Performance
ABBYY FineReader Engine 10 new API offers a set of pre-defined profiles for the most popular recognition tasks such as
“document conversion for archiving”, “document conversion for content reuse”, “barcode recognition” and more. The profiles
include optimal sets of processing parameters to provide the easy start and the best OCR quality without the need for manual
system tuning or long learning curve. The manual tuning is also available for specific custom solutions.
Every task is unique and requires some specific adjustments of technology. For example:
o Book scanning and converting into searchable PDF. It requires high processing speed, good visual quality and
small size of PDF. Recognition accuracy is not a critical parameter, some satisfactory level is enough.
o Document conversion into editable format (.doc) for content reuse and new documents creation. In this case
recognition and document reconstruction accuracy are the most required issues. Each error in characters or
structure and layout – is an additional job for verification specialists because errors are not allowed in final
document.
As you see from examples – each particular task has its specific requirements and should be processed by especially adjusted
SDK. ABBYY FineReader Engine 10 recognition API provides fine-tuned profiles containing optimal values for popular
recognition scenarios:
DocumentArchiving_Accuracy
DocumentArchiving_Speed
Document conversion for archiving
BookArchiving_Accuracy
BookArchiving_Speed
DocumentConversion_Accuracy
Document conversion for content reuse
DocumentConversion_Speed
Ground text extraction for fields detection TextExtraction_Accuracy
and documents classification TextExtraction_Speed
Fields level recognition FieldLevelRecognition
Barcodes recognition BarcodeRecognition
Thus during OCR SDK integration that is enough to choose a proper profile for particular task. Optimal parameter values will
be set automatically. If it is necessary you can set any parameter though recognition API for more delicate tuning.
ABBYY recognition engine is a stable basis for truly international, localizable and full-featured applications.
• OCR technology — printed text recognition is available for 198 languages, including:
o European languages (Latin, Cyrillic, Armenian, Greek alphabets)
o Chinese (Simplified and Traditional), Japanese, and Korean (CJK)
o Thai, Vietnamese and Hebrew
o Arabic — technical preview version
o FineReader XIX — an OCR module designed specifically for digitizing and archiving old documents, books
and newspapers published in the XVII-XX centuries, many of which are rare and unique. Stored in the
historical archives of libraries and government organizations, they are national heritage that must be
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
-7-
preserved. FineReader XIX provides a unique capability to recognize texts published in the period from 1600
till 1937 in English, French, German, Italian and Spanish. It supports recognition of old fonts such as
Fraktur, Schwabacher and the majority of Gothic fonts.
• 47 languages have dictionary/morphology support that is significantly improves OCR accuracy.
• Multilingual documents recognition feature provides recognition of several languages e.g. German and Chinese;
English, Russian and Korean at the same document.
• Dot-matrix documents recognition — ABBYY FineReader Engine recognizes printed dot matrix texts of many types. It
has been trained using several thousand samples produced by a variety of printers including dot matrix, daisy wheel,
chain and band printers, as well as using draft and Near Letter Quality (NLQ) printing modes.
• Typewritten documents recognition.
• Recognition of OCR-A, OCR-B, MICR (E13B) and CMC7 fonts.
The ABBYY’s OMR technology recognizes simple checkmarks, grouped checkmarks, model checkmarks and checkmarks with
“corrections” made by hand in different variations:
• Char Box Series
• Comb In Frame
• Grey Boxes
• Partitioned Frame
• Simple Comb
• Text In Frame
• Underlined Text
OMR delivers accuracy rate of 99.995 %
• 1D and 2D barcode types. ABBYY OCR SDK supports recognition of popular types of 1D and 2D barcodes.
1D Barcodes
- IATA 2 of 5
- Codabar - Code 93 - Matrix 2 of 5 - UCC-128
- Industrial 2 of 5
- Code 128 - EAN 8 - Patch - UPC-A
- Interleaved 2 of
- Code 39 - EAN 13 - PostNet - UPC-E
5
With checksum With supplemental
- Code 39 - EAN 8
- Interleaved 2 of 5 - EAN 13
- Codabar - UPC-E
2D Barcodes
- PDF417 - DataMatrix
- Aztec - QR Code
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
-8-
• Fast barcode extraction. This feature enables automated detection and recognition of barcodes at any angle on a
document. It works both for 1D and 2D barcodes
Recognition modes
With the Engine's pre-defined processing modes, developers have the ability to quickly set up and tune the processing speed
and accuracy in a way which is the most appropriate for their needs. In addition to the default processing mode, both OCR
and ICR recognition can be performed in normal, fast and balanced recognition modes:
• Normal recognition mode
It is the most accurate mode for achieving the highest quality of recognition. This mode is highly recommended if
you are planning to reuse recognized content and in other tasks when the accuracy is the critically important issue.
• Fast recognition mode
It is designed for high-volume document processing and for the cases when speed is of primary importance. This
mode increases processing speed by 200-250% making the technology ideal for using in content management
(CMS), document management (DMS) and archiving systems.
• Balanced recognition mode
It sets the intermediate values of recognition accuracy and speed between Normal and Fast modes. Generally it
provides higher speed for almost the same accuracy level as Normal mode.
There are two types of recognition which could be separated: full text and field-level recognition. The main difference is that
full text recognition usually includes OCR technology and used for document conversion. Field-level recognition includes OCR,
ICR and other technologies that are used in local area for recognizing and extraction particular data.
The following table shows specifications of these recognition types:
Full text recognition is a basic recognition type for different tasks, like:
• Documents and books conversion for archiving
• Document conversion for content reuse
• Ground text extraction for fields detection and documents classification
See more in Recognition Profiles for Easy Start and Optimal Performance
All of them require the recognition (OCR) of whole text on document (page). Before recognition the document analysis
usually processes for splitting and correct orientation of pages, detection of text blocks, pictures and other objects.
Then after OCR document synthesis rebuilds the structure and layout of document (for content reuse task) or just retrieves
the correct text order for complex documents with several text columns and pictures (for archive scenario). Resulted text is
exported depending on task as pure text or as a document of supported format.
The text could be manually verified for increasing its accuracy, especially for future reuse.
Field-level recognition
ABBYY FineReader Engine 10 delivers complete field-level recognition capabilities to support key business processes such as
forms processing, keyword classification, and keyword indexing. Powerful image processing functions increase its ability to
intelligently detect small zone areas of any quality, with any type of graphic specifics which may affect the recognition
accuracy (i.e. underlined text, after-scanning garbage, spaces in the text, etc.)
Key functionality for field-level or zonal recognition includes multilingual OCR and ICR, OMR, barcode recognition and a range
of specific functions, such as:
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
-9-
• Data extraction from fields with various borders and frames, including combo-box, underlined fields, boxes, and
even fields where the data does not fit within the field border
• Definition of field content by setting alphabets, dictionaries, regular expressions, types of segmentations,
handwriting styles, etc.
• Detection of in-field spacing, accurately recognizing fields where the spaces are allowed. ABBYY FineReader Engine
10 also allows use of dictionaries which contain word combinations with spaces
• Intelligent processing of blocks with intersecting parts and lines, provides recognition of text (words and symbols)
located entirely within the block borders, saving time spent on non-relevant text block recognition
• Text block despeckle, with the ability to specify the size of white or black "garbage"
Field-level recognition is supported by the Engine’s special tools for developers such as Voting API and "On-the-Fly"
Recognition Tuning. For details, please see Advanced Development Tools.
User languages
ABBYY FineReader Engine provides an API for creating and editing recognition languages, creating copies of predefined
recognition languages and adjusting them, and adding new words to user languages.
Below are two examples illustrating how user languages can help you to improve recognition quality:
• In documents filled out by hand, the values in the form fields usually belong to a specific set such as city names,
countries, zip codes, product codes, sums, etc. To improve the quality of ICR recognition, you can use user
languages to describe the information which may be entered in each field.
• If a document contains "structures" such as product codes, telephone numbers, passport numbers etc., recognition
errors may occur. This happens because the program reads such structures letter by letter. To improve the
recognition of product codes and the like, you can create a new recognition language which will help the program to
read specific types of data correctly.
Pattern Training
In the vast majority of cases FineReader Engine can successfully read texts without prior training. However, in such cases as
recognition of decorative or outlined fonts or bulk input of low print quality documents, preliminary pattern training will prove
useful.
The OCR SDK allows you to create and use user patterns or import them from the ABBYY FineReader desktop
application (Professional or Corporate Edition). FineReader Engine is flexible and applicable to build up an application of any
architecture, either it is a client workstation or a server-based solution.
Advanced document recognition technology (ADRTTM) recognizes and retains lots of document elements for correct
document structure and layout reconstruction:
• Reconstruction of document logical structure and formatting
• Table structure and content recognition
• Vertical text recognition
• Recognition of magazine-style pages – New!
ABBYY Advanced Document Recognition Technology (ADRTTM) is an unrivaled set of document analyzing and reconstruction
features.
Document conversion to editable formats (.doc, .rtf) assumes not only the whole text recognition but also document structure
and layout retention. So OCR system has to analyze document content, extract and save into final document such elements
as: headers, footers, page numbers, footnotes, table of contents, and others. Document formatting reconstruction is also
necessary: font styles, text flows, tables and pictures formatting.
ADRT includes features for correct document conversion:
Reconstruction of document logical structure elements and formatting
• Heading hierarchic structure — New!
• Table of contents — New!
• Fonts and font styles — Enhanced
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 10 -
• Captions to images/tables/diagrams — Enhanced
These features are supported by unique Document Structure API that provides access to all listed document elements.
Developers are able to implement highly accurate and comprehensive document conversion application using ABBYY ADRT
functionality.
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 11 -
PDF Conversion
ABBYY FineReader Engine provides developers with special tools for PDF input and document conversion to multiple types of
searchable PDF, including PDF/A and compressed MRC PDF.
PDF Input
• Output in Tagged PDF format – that can be "reflowed" to fit different page or screen widths. Ideal for use with
handheld devices (PDAs) or screen readers typically used by visually impaired users.
• Page size - Ability to set the size for all pages of an output file during PDF conversion.
• Links in PDF files – Re-creates hyperlinks within a PDF file.
• PDF/A export – Conversion to PDF/A format which is recommended as a standard for long-term preservation of
page-oriented documents.
• CJK to PDF export – Enables conversion of documents in Chinese (both simplified and traditional), Japanese and
Korean into PDF format.
• PDF(/A) MRC compression – Compressed PDF (PDF/A) files have a significantly reduced size with the original
quality preserved. MRC PDF compression technology is ideal when color documents are scanned and processed.
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 12 -
PDF Conversion Quality and Speed Tuning
FineReader Engine is a flexible OCR SDK and provides developers with special tools to achieve the optimal PDF
conversion mode appropriate for their particular needs.
With wide spread of embedded photo-cameras taking photos becomes a popular way of digitizing documents. Photo-cameras
of mobile phones and other devices provide suitable resolution and image quality but have some specific distortions, usual for
photos. So they require special image pre-processing for better recognition. Camera OCRTM- Unrivaled Quality of
Document Photoa Processing
ABBYY FineReader Engine 10 contains the following Camera OCRTM features:
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 13 -
• Blurred images correction – New!
If camera is not fixed during the photo taking, image could be slightly blurred. It is not noticed on camera screen,
but causes OCR problems. After some transformation of the source image, the binary image becomes more
readable:
- 14 -
• VB Script, and other scripting languages;
• Borland Delphi 2.0 and above;
• Java
Any other development environment that supports COM and ActiveX objects correctly.
Samples
The ABBYY FineReader Engine includes OCR samples for Visual Basic, Visual Basic .Net, Delphi, raw C++, C++ with the
Native COM Support, C#, and scripting languages.
Visual Components
The ActiveX-based visual components provide easy integration of user interface elements to existing applications. Developers
can give users direct but controlled access to recognition results and functions of OCR SDK library for validation or checking
of documents. The visual components set consists of five components designed according to company’s experience in creating
end-user applications.
• Scan Interface
ABBYY recognition SDK library provides an interface for accessing TWAIN-based scanners and allows users to select
key scanning settings such as resolution, paper settings, etc.
• Document Viewer
A visual overview of documents images and status of recognition processing. Available in a thumbnail view of each
page, or a detailed view in table format.
• Image Viewer
A full image view of a document page – it can be processed with specialized OCR library tools for creating, editing
and changing the settings of recognition blocks. Tools include for example:
o Rotating, cropping and splitting images.
o Drawing recognition areas or selecting block types – text, image, table, barcode.
o Buttons of the toolbar can be displayed or hidden via code, customized buttons are possible.
• Text Editor
Provides option to highlight uncertain characters and basic text formatting tools. Developers can control the text
area display, available buttons and the user actions.
• Text Validator
An easy-to-use wizard-like tool to check recognized characters marked as “uncertain”. Includes an integrated
function to check spelling and provides a “zoom” view of key portions of text to be checked. Developers can control
the behavior of this recognition library component.
Notice: Visual components are available in FineReader Engine 10 Maintenance release 2
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 15 -
hypotheses: "0" (zero), with 60% confidence; capital "O", with 80% confidence; and capital "C", with 10% confidence.
Another example for intercharacter separation: the possible hypotheses for an "m" would be "m", "rn", and "in".
• Raw
• Bitmap (HBITMAP)
• DIB
Additional features for PDF files
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 16 -
skew. It does not require leading edge borders or lines. New FineReader Engine 9.0 provides several methods for de-skewing
images: with pairs of black squares, lines or lines of text.
Splitting dual pages
It is used for scanning books as broadsides – for both left and right pages. The recognition quality is higher if the page is split
into two, with each page corresponding to a single book page. Recognition and layout analysis are then performed separately
for each page, along with the de-skewing if required.
Image despeckling (or image clean-up)
The function is designed to eliminate random noise (speckles). A document image may have a large amount of "dust" on it,
i.e. a large number of excess dots. The dots arise in the case of medium-to-low print quality of documents, and dots located
close to character outlines may have an adverse effect on recognition quality. In these cases, the despeckle technique helps
to improve the recognition quality.
Texture filtering and Adaptive Binarization
Texture filtering technology helps to filter out background "noise" such as color and texture, increasing accuracy for difficult-
to-read documents such as newsprint, color documents, faxes, and copies.
Innovative Adaptive Binarization technology dynamically adjusts threshold of brightness for each image fragment during the
recognition. By applying individual recognition parameters, it produces accurate recognition results for documents with gray
or color-variable contrast background and textures.
Autodetection of page orientation (90, 180, and 270 degrees)
This document imaging feature is very important for bulk input of images, when the direction in which document pages are
scanned is unknown and can be different. The system automatically detects the orientation of each page and corrects it if
needed.
Adjusting text and background color
This function is designed for users working with archiving and document management systems (DMS). When sending a
recognized document for archiving, it is saved as both an image and plain text, with the text archive index containing
coordinates of each character on the image. When searching through the electronic archive, a user receives a document
image as a result. To make the text found on the image clearly visible, the program highlights it and, if necessary, changes
the initial color of the text and background by performing this document imaging function.
ABBYY Camera OCR technology
The technology intelligently identifies images captured with digital cameras and applies special image processing algorithms
to correct distortions typically associated with digital camera images such as out-of-focus text, curved text lines, missing
information about resolution or distortions caused by poor lighting,.
Despeckling images in individual blocks (or zones), with the ability to specify the size of black dots.
Data extraction from fields with various borders and frames, such as combo box, underlined fields, boxes, etc.
Document analysis
The OCR SDK document analysis function set includes features for automatic document conversion with full-page layout
retention, zoning OCR with manually located blocks, special analysis for Invoices and other features
Basic document analysis features
Document Analysis is a set of functions for automatic detection of the following objects on a page:
• Text blocks
• Pictures
• Tables and table cells
• Barcodes
• Separators
Additionally document analysis provides some special features to prepare image for OCR:
• process detection of page orientation – 90, 180, and 270 degrees (see Image Processing)
• split double pages
• process vertical text detection in table cells
• detect and mark the blocks of garbage on page
This preparation is significantly important to specify which fields on page should be recognized and what should be kept in
initial form.
And also there is an ability to designate the field for recognition manually. In this case you have to set field’s coordinates and
type of data inside. It is used in Field-Level Recognition scenario mostly for data capture.
ABBYY FineReader Engine 10 provides 3 automatic and 1 manual type of document analysis:
• General document analysis
• Document analysis for invoices
• Document analysis for full-text indexing
• Manual blocks specification for field-level recognition
General document analysis
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 17 -
This is default document analysis type which searches all objects: text blocks, pictures, tables, barcodes and separators. The
results of this analysis are used for document structure and layout retrieval in content reuse scenario. All pictures and
diagrams are preserved in original form without recognizing text on them.
Developer Licenses
A Developer License grants developers the right to install FineReader Engine 10 development tools for the purpose of creating
their own software with integrated FineReader Engine parts. FineReader Engine 10 Developer License allows to use most of all
functions of FineReader Engine.
Runtime Licenses
A Runtime License (RTL) allows the end user to use the ABBYY FineReader Engine 10 components as part of the application
created by the developer. The price for a RTL greatly depends on performance and functionality of the resulting OCR solution.
There are many scenarios where ABBY FineReader Engine could be used and not all of its functionality is needed in every
case.
RTLs differ in types and cost. For example, there are RTLs, which allow the end user to use ABBYY FineReader Engine 10
jointly on several computers within one LAN. There are two types of RTLs:
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 18 -
• standalone
Standalone license allows installation on the only one station – Server or PC:
In current version of protection subsystem there is a new parameter Lease time (NEW)– the minimal time of using license
for 1 station (in seconds). With this parameter we can simulate both variants of FRE9 network licensing – per seat and
concurrent.
For example we can use these values of Lease time (T):
• T = 1 sec. =>true concurrent license
• T>= 28800 sec.(8 hours)=>per seat license, every license is tied to one PC for 8 or more hours
• 1<T<28800 =>different middle variant
RTLs are protected either by a USB protection key (preferably) or by a software protection key.
Upgrade policy
Developers can upgrade to FineReader Engine 10 if they currently own older version of ABBYY FineReader Engine. Pricing
varies depending on the version of ABBYY FineReader Engine they currently have and number of the licenses required.
For further details, please get in touch with ABBYY Ukraine by e-mail engine@abbyy.ua .
Trial Version
ABBYY offers a trial version for developers interested in testing ABBYY FineReader Engine. The trial version is time-limited
and will run for 60 days, allowing you to process only a certain number of pages. To use a trial version, you need to sign a
Trial Software License Agreement. To receive a copy of the Trial Software License Agreement, please contact ABBYY Ukraine
by e-mail engine@abbyy.ua or fill out the electronic form at www.abbyy.com .
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 19 -
Technical Specifications
Extended list of technical specification is given below:
Developer Environment
• Microsoft Visual Studio.NET (VB.NET, C#);
• Microsoft Visual Basic 5.0, 6.0;
• Microsoft Visual C++ 4.x and above;
• VB Script, and other scripting languages;
• Borland Delphi 2.0 and above;
Any other development environment that supports COM and ActiveX objects correctly.
The ABBYY FineReader Engine includes OCR samples for Visual Basic, Visual Basic .Net, Delphi, raw C++, C++ with the
Native COM Support, C#, and scripting languages.
System Requirements
Workstation Requirements
Server Requirements
Input/Output Formats
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 20 -
BMP: bmp +
4- and 8-bit — RLE compressed Palette
DCX: dcx + +
black and white
2-, 4- and 8-bit palette
24-bit color
PCX: pcx + +
black and white
2-, 4- and 8-bit palette
24-bit color
PNG: png + +
black and white, gray, color
JPEG 2000: jp2, jpc + +
gray — Part 1
color — Part 1
JPEG: jpg, jpeg, jfif + +
gray, color
PDF (Version 1.7 or earlier) pdf + +
TIFF: tif, tiff + +
black and white — uncompressed, CCITT3, CCITT3FAX, CCITT4, Packbits, ZIP, LZW
gray — uncompressed, Packbits, JPEG, ZIP, LZW
24-bit color — uncompressed, JPEG, ZIP, LZW
1-, 4-, 8-bit palette — uncompressed, Packbits, ZIP, LZW
(including multipage TIFF)
GIF: gif +
black and white — LZW-compressed
2-, 3-, 4-, 5-, 6-, 7-, 8-bit palette — LZW-compressed
DjVu: djvu, djv +
black and white, gray, color
JBIG2: jb2 + +
black and white
WDP: wdp +
black and white, gray, color
(WIC or Microsoft .NET Framework 3.0 required)
Recognition Languages
ABBYY FineReader Engine 10 recognizes 198 OCR languages with Latin, Cyrillic, Greek, Armenian characters, East Asian
languages and 113 ICR languages.
See the full list of recognition languages at abbyy.com
Barcode Types
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 21 -
Add-on Modules
FineReader Engine includes several add-on modules available for RTLs, extending its functionality: special document analysis
for Invoices, additional PDF recognition features, CJK OCR (Chinese, Japanese, Korean), Thai, Hebrew OCR and others
Message Languages
Dialogue captions, text, error and other program messages are available in English, German, French, Spanish, Italian, Dutch,
Portuguese, Russian, Estonian, Polish, Czech, Slovak, Hungarian, Bulgarian, Ukrainian, Swedish, Greek, Lithuanian, and
Latvian
The Developer’s Guide in CHM and PDF formats is available in English. It contains a detailed description of the API and
general information about licensing and activation.
The code samples which are provided with FineReader Engine development toolkit will help developers understand how to use
the API of FineReader Engine in typical scenarios. A developer may copy, modify or use them to create his own software
program based on the FineReader Engine API.
The Administrator’s Guide contains information on how to install FineReader Engine in the customer’s LAN and how to
manage licenses by using Network License Manager
ABBYY Software House Ukraine, P.O. Box 23, 02002 Kyiv, Ukraine. Tel: + 380 44 4909999, fax: +380 44 4909461, engine@abbyy.ua
ABBYY FineReader Engine 10 Data Sheet
- 22 -