Brought to you by scanner one www.scannerone.com Patented VRS technology from Kofax® ensures that your scanning is as efficient and easy as possible, while also improving both the quality of the scanned images and the automated capture of information from your paper documents and forms. The result is lower scanning costs, lower data entry costs, and faster access to your information.


Semi-Structured Forms Processing - ABBYY FlexiCapture Technology

How FlexiCapture Studio Works

ABBYY FlexiCapture Studio is used to create a FlexiLayout, a formalized description of the logic for extracting data from documents that contain similar data but have different layouts. The steps for creating a FlexiLayout are as follows.

Creating the Project

The first step is to create an empty Project. The Project stores all the relevant information for a single FlexiLayout, including the elements, matching results and data. Sample images can be added to the Project at any time and are used for creating, testing and adjusting the FlexiLayout. The set of images should be varied enough to represent the range of documents to be processed with the FlexiLayout. Near-identical documents do little to improve the quality of the FlexiLayout, whereas each sufficiently different variation of the document makes the FlexiLayout more reliable. The more variations of the document layout the sample images cover, the better.

Pre-recognition

Pre-recognition is performed by the award-winning ABBYY full-text recognition engine and detects all objects that can be found on the image: text, separators, pictures and barcodes. The pre-recognized objects are the building material for the FlexiLayout. If an object cannot be found on an image for some reason (for example, poor image quality), it cannot be included in the FlexiLayout.

All elements that are regularly encountered on the images can serve as starting points for looking for the objects with required data.

Defining Blocks

The next step is to define the objects you are going to capture data from, called "blocks". Blocks will be processed in FormReader in the same way as the exported blocks of ordinary fixed forms. At the beginning it is enough just to name the blocks; the list of blocks can easily be modified later. Each block should be related to one or several elements. For example, you can define the first name and second name elements separately and then combine them into one exported block. Any relation between elements and blocks, however complex, can be described.

Defining Elements

As soon as you have defined the blocks, which are the objects that should be found, you can proceed with creating "elements", which are the objects that help to locate the blocks. This step requires some experience, because there are several ways of selecting the sequence of elements to be detected, defining the corresponding elements, and combining elements into compound elements. The FlexiCapture user interface lets the user try out and investigate different scenarios without much effort.

The elements are created by defining their properties. There are several pre-defined types of elements that simplify the process: text, table, barcode, phone, currency, picture. Each element type has a corresponding set of properties. For each element the user specifies either the coordinates of an absolute search area or a search area relative to other objects. The user also specifies the quantitative and qualitative characteristics of each element and the allowed or forbidden characters for text blocks. Mutual relationships between the objects can be established (e.g. object A is always to be found to the left of object B, object A is the closest object to object B, compound object A consists of objects B and C). Errors in the definitions of the elements are detected automatically and appropriate warning messages are displayed.
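The two example relationships above ("to the left of" and "closest to") can be illustrated with a short Python sketch. This is not FlexiCapture Studio syntax: the `Box` class, the helper names and the coordinates are all hypothetical and only model the geometry being described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Bounding rectangle of a detected object, in pixels (hypothetical)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def is_left_of(a, b):
    """True when box a lies entirely to the left of box b."""
    return a.right <= b.left


def closest_to(anchor, candidates):
    """Return the candidate whose center is nearest to the anchor's center."""
    ax, ay = anchor.center
    return min(
        candidates,
        key=lambda c: (c.center[0] - ax) ** 2 + (c.center[1] - ay) ** 2,
    )


label = Box(100, 100, 200, 120)  # e.g. a static text label
value = Box(220, 100, 300, 120)  # number printed to the right of the label
other = Box(220, 400, 300, 420)  # unrelated number further down the page

print(is_left_of(label, value))
print(closest_to(label, [value, other]) == value)
```

A relative search area works on the same idea: once the label is located, the value is searched for only in the region defined in relation to it.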

Elements can be defined anywhere in the hierarchy and can be grouped into compound elements.

The created FlexiLayout is then tested to check that the required fields are found on all the sample forms.

The program uses the FlexiLayout to detect several possible locations of the objects. The maximum number of possible locations for each object is determined by the user. For each possible location, the program advances a hypothesis, i.e. an assumption that the detected object corresponds to a specific element described in the FlexiLayout. The degree of correspondence between the detected object and its FlexiLayout counterpart determines the quality of the hypothesis. The quality coefficient lies within the range from 0 to 1. If the detected object does not meet certain conditions imposed by the FlexiLayout, the quality of the hypothesis is downgraded by a certain number of penalty points. Then the program chooses the hypothesis with the highest quality coefficient. If this hypothesis selects the right object, the user starts describing the other blocks and elements.
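The selection step can be modelled with a minimal Python sketch. It is illustrative only and not ABBYY's implementation: the names `Hypothesis`, `apply_penalty` and `best_hypothesis`, the sample qualities, and the multiplicative form of the penalty are assumptions, since the text does not say how penalty points are applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A candidate location for one element (hypothetical model)."""
    location: str   # label for the detected object
    quality: float  # degree of correspondence, from 0 to 1


def apply_penalty(hyp, factor):
    """Downgrade a hypothesis; the multiplicative form is an assumption."""
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must lie within 0..1")
    return Hypothesis(hyp.location, hyp.quality * factor)


def best_hypothesis(hyps, max_locations):
    """Keep at most max_locations candidates and return the best one."""
    candidates = list(hyps)[:max_locations]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.quality)


candidates = [
    Hypothesis("top-right text line", 0.9),
    Hypothesis("footer text line", 0.8),
]
# The first candidate violates a layout condition, so it is penalised.
candidates[0] = apply_penalty(candidates[0], 0.5)
chosen = best_hypothesis(candidates, max_locations=5)
print(chosen.location)
```

In this made-up example the penalised candidate drops below the second one, so the second is chosen even though it started with the lower quality.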

As a result, the user will have a FlexiLayout for the first selected form image, which tells the program what kinds of blocks can be found on the form and how to look for them by relying on their neighboring objects. If the created FlexiLayout matches the fields on the first image perfectly, it can be tested on all the other sample images.

Matching, Testing and Tuning the FlexiLayout

At any time during its creation, the FlexiLayout can be matched against the sample images to check the results. It is important to test the FlexiLayout from the very beginning, as soon as the first element is created. This prevents mistakes from accumulating from one element to the next.

FlexiCapture Studio provides a powerful visualization interface for testing a FlexiLayout, the so-called "hypotheses tree". The hypotheses tree visualizes the decision-making process during FlexiLayout matching and makes it possible to analyze hypotheses to determine the causes of possible mistakes.

A hypothesis is an assumption that the detected object corresponds to a specific element defined in the FlexiLayout. Each element can have several hypotheses of different quality, where quality is the degree of correspondence. When several elements are defined in a FlexiLayout, the hypotheses of the elements are organized in a tree.

 

Each branch in the tree shows a consolidated hypothesis for all elements. The qualities of the element hypotheses in a branch are multiplied. The best branch in the hypotheses tree is the one with the best resulting quality. The smaller the hypotheses tree, the better the FlexiLayout.
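As a rough illustration of how branch qualities combine, the Python sketch below multiplies the qualities along each branch and picks the best one. The element names and quality values are invented, and enumerating every combination is a simplification: in a real tree, the hypotheses available for one element may depend on the hypotheses chosen for earlier elements.

```python
from itertools import product
from math import prod

# Hypothetical element hypotheses: element name -> list of (label, quality).
element_hypotheses = {
    "InvoiceNumber": [("A1", 0.9), ("A2", 0.6)],
    "Total": [("B1", 0.5), ("B2", 0.8)],
}


def enumerate_branches(hyps_by_element):
    """Yield (labels, quality) per branch; quality is the product of parts."""
    names = list(hyps_by_element)
    for combo in product(*(hyps_by_element[n] for n in names)):
        labels = tuple(label for label, _ in combo)
        quality = prod(q for _, q in combo)
        yield labels, quality


branches = list(enumerate_branches(element_hypotheses))
best_labels, best_quality = max(branches, key=lambda b: b[1])
print(len(branches), best_labels, round(best_quality, 2))
```

Two elements with two hypotheses each already give four branches, which shows why tightly constrained elements (fewer hypotheses each) keep the tree small.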

"Reference layout" is another good tool used during fine-tuning process. When making modifications, it is important to make sure that you do not impair your current work by degrading the existing results. To preserve the integrity of existing results, you must save a reference layout. For each sample image page you may save the layout of the objects with which you are satisfied, and, after making some modifications to the elements, you may trace the differences from the reference layout that appeared as a result of your modifications.

FlexiLayout testing and tuning should be applied to all sample images.
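The reference-layout check, applied across all sample images, can be pictured as a simple regression comparison. The Python sketch below is only a conceptual model: the function names, the image file names and the block regions are hypothetical, and FlexiCapture Studio performs this comparison through its own interface.

```python
def diff_layout(reference, current):
    """Return blocks whose regions differ from the saved reference layout."""
    changed = {}
    for block in sorted(set(reference) | set(current)):
        before = reference.get(block)
        after = current.get(block)
        if before != after:
            changed[block] = (before, after)
    return changed


def check_all(reference_by_image, current_by_image):
    """Map each sample image to its changed blocks, skipping unchanged ones."""
    report = {}
    for image, reference in reference_by_image.items():
        changed = diff_layout(reference, current_by_image.get(image, {}))
        if changed:
            report[image] = changed
    return report


# Regions are (left, top, right, bottom) in pixels; all values are made up.
reference_by_image = {
    "page1.tif": {"Total": (420, 700, 560, 730)},
    "page2.tif": {"Total": (400, 650, 540, 680)},
}
current_by_image = {
    "page1.tif": {"Total": (420, 700, 560, 730)},  # unchanged
    "page2.tif": {"Total": (100, 650, 240, 680)},  # moved after an edit
}

report = check_all(reference_by_image, current_by_image)
print(report)
```

Any image that appears in the report is one where a modification changed a previously accepted result and should be reviewed.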

 

Exporting the FlexiLayout

Once the FlexiLayout has been tested and tuned, it can be exported as a file that can be imported into ABBYY FormReader or an application based on FineReader Engine 7.1 SDK. These programs use the FlexiLayout to identify the corresponding semi-structured documents and capture data from them.

If you would like more detailed information about FlexiCapture Technology, please fill out the request form to receive the White Paper on ABBYY FlexiCapture Technology (in PDF format).

 
©2003-2007 Scanner One, LLC