Forms processing is the process of capturing information entered into the data fields of a form and converting it into an electronic format. It can be done manually or automatically. In general, people fill out hard-copy forms, and the data is then "captured" from the respective fields and entered into a database or other electronic format.
In the broadest sense, forms processing systems range from processing small application forms to large-scale, multi-page survey forms. Manual forms processing raises several common problems: it takes a great deal of tedious human effort, data keyed in by operators may contain typos, and the lengthy process consumes many hours of labor. Processing forms with software applications can resolve these problems or reduce them to a great extent. Most methods for forms processing address the following areas.
Manual data entry involves human operators keying in the data found on the form. This method has many disadvantages in speed, accuracy and cost. At average professional typing speeds of 50 to 80 words per minute (wpm), one could generously estimate about two hundred pages per hour for forms with fifteen one-word fields, not counting the time spent reading and sorting pages. In contrast, modern commercial scanners can scan and digitize up to 200 pages per minute.[1] The second major disadvantage of manual data entry is the likelihood of typographical errors. Once the cost of labor and working space is factored in, manual data entry is a very inefficient process.
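The throughput estimate above is simple arithmetic: pages per hour = words per minute × 60 ÷ words per page. The minimal Python sketch below uses only the figures quoted in the paragraph; the constant and function names are illustrative. Note that the scanner figure covers digitization only, not the recognition step that turns page images into data.

```python
# Rough throughput comparison using the figures quoted above.
TYPIST_WPM = (50, 80)   # professional typing speeds, words per minute
WORDS_PER_PAGE = 15     # fifteen one-word fields per form
SCANNER_PPM = 200       # upper bound for a commercial scanner, pages per minute


def typist_pages_per_hour(wpm: float, words_per_page: int) -> float:
    """Pages per hour for one typist, ignoring time spent reading and sorting pages."""
    return wpm * 60 / words_per_page


def scanner_pages_per_hour(ppm: float) -> float:
    """Pages per hour for a scanner running continuously (digitization only)."""
    return ppm * 60


if __name__ == "__main__":
    for wpm in TYPIST_WPM:
        print(f"{wpm} wpm -> {typist_pages_per_hour(wpm, WORDS_PER_PAGE):.0f} pages/hour")
    print(f"scanner -> {scanner_pages_per_hour(SCANNER_PPM):.0f} pages/hour")
```

At 50 wpm this gives the two hundred pages per hour quoted above, and at 80 wpm it gives 320. A single scanner at 200 pages per minute digitizes 12,000 pages per hour.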
Automated forms processing uses pre-defined templates and configurations to automate data capture. A template in this case is a map of the document that details where the data fields are located on the form. Compared with manual data entry, automatic form input systems are preferable because they reduce the problems of manual data processing described above.
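A template can be pictured as a mapping from field names to rectangular regions on the page. The Python sketch below is a minimal illustration under stated assumptions, not any product's template format. It assumes the scanned page is a grayscale image stored as rows of pixel values and is already aligned with the template, and that `Region`, `crop`, `extract_fields` and the field names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical representation: a grayscale page image as rows of pixel values (0-255).
Image = list[list[int]]


@dataclass(frozen=True)
class Region:
    """A rectangular field location on the page, in pixels."""
    left: int
    top: int
    width: int
    height: int


def crop(image: Image, region: Region) -> Image:
    """Return the pixels inside the region."""
    rows = image[region.top:region.top + region.height]
    return [row[region.left:region.left + region.width] for row in rows]


def extract_fields(image: Image, template: dict[str, Region]) -> dict[str, Image]:
    """Cut every field listed in the template out of an (already aligned) page image."""
    return {name: crop(image, region) for name, region in template.items()}


if __name__ == "__main__":
    # A tiny 3 x 4 "page" and a two-field template (field names are made up).
    page = [[0, 1, 2, 3],
            [4, 5, 6, 7],
            [8, 9, 10, 11]]
    template = {
        "surname": Region(left=1, top=0, width=2, height=1),
        "date": Region(left=0, top=1, width=3, height=2),
    }
    print(extract_fields(page, template))
    # {'surname': [[1, 2]], 'date': [[4, 5, 6], [8, 9, 10]]}
```

Each cropped field would then be passed to whichever recognition method suits its content.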
Automatic form input systems use different recognition methods for different kinds of content: optical character recognition (OCR) for machine print, optical mark recognition (OMR) for check boxes and mark-sense boxes, barcode recognition (BCR) for barcodes, and intelligent character recognition (ICR) for hand print.
With automated forms processing technology, users can convert scanned document images into a computer-readable format such as ANSI text, XML, CSV or PDF, or input the data directly into a database.
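As a minimal sketch of the output step, the Python code below serializes already-recognized field values to CSV and XML and inserts them into a database, using only the standard library (`csv`, `xml.etree.ElementTree`, `sqlite3`). The record layout, the `forms` table and element names, and the helper functions are assumptions for illustration. Field names are assumed to come from a trusted template and to be valid XML element names.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Each extracted form is a dict of field name -> recognized value (names are hypothetical).
Record = dict[str, str]


def to_csv(records: list[Record], fieldnames: list[str]) -> str:
    """Serialize records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records: list[Record]) -> str:
    """Serialize records as <forms><form><field>value</field>...</form></forms>."""
    root = ET.Element("forms")
    for record in records:
        form = ET.SubElement(root, "form")
        for name, value in record.items():
            ET.SubElement(form, name).text = value
    return ET.tostring(root, encoding="unicode")


def to_sqlite(records: list[Record], fieldnames: list[str], conn: sqlite3.Connection) -> None:
    """Insert records into a 'forms' table, creating it if needed.

    Column names come from the trusted template, so they are interpolated;
    values are always passed as bound parameters.
    """
    cols = ", ".join(fieldnames)
    marks = ", ".join("?" for _ in fieldnames)
    conn.execute(f"CREATE TABLE IF NOT EXISTS forms ({cols})")
    conn.executemany(
        f"INSERT INTO forms ({cols}) VALUES ({marks})",
        [tuple(r[f] for f in fieldnames) for r in records],
    )
    conn.commit()


if __name__ == "__main__":
    records = [{"name": "Jane Doe", "age": "36"}]
    print(to_csv(records, ["name", "age"]), end="")
    print(to_xml(records))
    conn = sqlite3.connect(":memory:")
    to_sqlite(records, ["name", "age"], conn)
    print(conn.execute("SELECT name, age FROM forms").fetchall())
```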
Forms processing has developed beyond basic data capture. It encompasses not only recognition but also management of the complete document life cycle, which starts with scanning the document, continues through extracting the data, and often ends with delivery into a back-end system. In some cases it may also include processing the data or generating well-formatted results through calculations and analysis. An automated forms processing system can be valuable when hundreds or thousands of images must be processed every day.
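One way to picture the document life cycle is as a small state machine in which each document moves from scanning to extraction, optionally through processing, and finally to delivery. The Python sketch below is a hypothetical model of that idea, not any product's workflow engine. The stage names, the allowed transitions and the `Document` class are assumptions drawn from the stages named in the paragraph.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    SCANNED = "scanned"
    EXTRACTED = "extracted"
    PROCESSED = "processed"   # optional: calculations or analysis on the captured data
    DELIVERED = "delivered"   # handed to a back-end system


# Allowed transitions; PROCESSED may be skipped.
TRANSITIONS: dict[Stage, set[Stage]] = {
    Stage.SCANNED: {Stage.EXTRACTED},
    Stage.EXTRACTED: {Stage.PROCESSED, Stage.DELIVERED},
    Stage.PROCESSED: {Stage.DELIVERED},
    Stage.DELIVERED: set(),
}


@dataclass
class Document:
    doc_id: str
    stage: Stage = Stage.SCANNED
    history: list[Stage] = field(default_factory=list)

    def advance(self, new_stage: Stage) -> None:
        """Move to new_stage, refusing transitions the life cycle does not allow."""
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(
                f"{self.doc_id}: cannot go from {self.stage.value} to {new_stage.value}"
            )
        self.history.append(self.stage)
        self.stage = new_stage


if __name__ == "__main__":
    doc = Document("form-0001")
    doc.advance(Stage.EXTRACTED)
    doc.advance(Stage.DELIVERED)
    print(doc.stage, [s.value for s in doc.history])
```

Recording each document's stage makes it possible to see, for a daily batch of thousands of images, which documents are still waiting for extraction or delivery.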
The first step in understanding automated forms processing is to analyze the type of form from which data is to be extracted. For the purpose of data extraction, forms can be classified into one of two high-level categories. Although four categories have been proposed,[2] the document capture industry has settled on these two:
Although the components (described below) used to extract data from either type of form are the same, the way they are applied varies considerably with the type of document.
The components used in data processing with an automatic form input system include the following:
Optical character recognition (OCR) recognizes machine-printed characters, including uppercase and lowercase letters, accented characters, digits, many currency symbols, arithmetic symbols, expanded punctuation characters and more.
Intelligent character recognition (ICR) recognizes hand-printed American and European English characters using pre-defined character sets: uppercase, lowercase and mixed-case alphabetic characters; digits; currency symbols, including $ (dollar), ¢ (cent), € (euro), £ (pound) and ¥ (yen); and arithmetic and punctuation characters, including the period, comma, single quote, double quote and ! & ? @ \ # % * + – / : ; < = >.
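Pre-defined character sets let a system restrict what a field may contain. An ICR engine typically applies the set during recognition. The simpler Python sketch below only checks a recognized string afterwards and reports characters that fall outside the field's set, so they can be sent for review. The set names, the `allowed_chars` and `invalid_positions` helpers, and the example amount field are assumptions. The ASCII hyphen-minus stands in for the minus sign in the list above.

```python
import string

# Character sets modelled on the list above; the grouping and names are assumptions.
CHARSETS: dict[str, set[str]] = {
    "uppercase": set(string.ascii_uppercase),
    "lowercase": set(string.ascii_lowercase),
    "mixed": set(string.ascii_letters),
    "digits": set(string.digits),
    "currency": set("$¢€£¥"),
    # period, comma, single quote, double quote, then ! & ? @ \ # % * + - / : ; < = >
    "punctuation": set(".,'\"!&?@\\#%*+-/:;<=>"),
}


def allowed_chars(*set_names: str) -> set[str]:
    """Union of the named character sets, e.g. allowed_chars("digits", "currency")."""
    allowed: set[str] = set()
    for name in set_names:
        allowed |= CHARSETS[name]
    return allowed


def invalid_positions(text: str, allowed: set[str]) -> list[int]:
    """Indexes of characters outside the field's character set (candidates for review)."""
    return [i for i, ch in enumerate(text) if ch not in allowed]


if __name__ == "__main__":
    # A hypothetical amount field: digits, currency symbols, period and comma.
    amount_field = allowed_chars("digits", "currency") | {".", ","}
    print(invalid_positions("$12.50", amount_field))  # []
    print(invalid_positions("$12.5O", amount_field))  # [5]  (letter O read instead of zero)
```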
Magnetic ink character recognition (MICR) is a technology for reading the characters printed in magnetic ink, in special MICR fonts, on cheques. It reduces the chance of errors in cheque clearing and makes funds transfer easier and faster. MICR provides a secure, high-speed method of scanning and processing information.
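One way software can catch a misread MICR digit is by validating a check digit. The Python sketch below assumes a US cheque whose MICR line carries a nine-digit ABA routing number. That number must satisfy 3·(d1+d4+d7) + 7·(d2+d5+d8) + (d3+d6+d9) ≡ 0 (mod 10). Cheques in other countries use different layouts, and the function name is hypothetical.

```python
def aba_routing_is_valid(routing: str) -> bool:
    """Check a nine-digit ABA routing number read from a cheque's MICR line.

    Rule: 3*(d1+d4+d7) + 7*(d2+d5+d8) + (d3+d6+d9) must be a multiple of 10.
    """
    if len(routing) != 9 or not (routing.isascii() and routing.isdigit()):
        return False
    d = [int(c) for c in routing]
    total = 3 * (d[0] + d[3] + d[6]) + 7 * (d[1] + d[4] + d[7]) + (d[2] + d[5] + d[8])
    return total % 10 == 0


if __name__ == "__main__":
    print(aba_routing_is_valid("021000021"))  # True
    print(aba_routing_is_valid("021000022"))  # False: last digit misread
```

Because the weights 3, 7 and 1 are all coprime with 10, any single misread digit changes the weighted sum modulo 10, so this check catches every single-digit substitution.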
Optical mark recognition (OMR) identifies hand-filled bubbles and check boxes on printed forms. OMR usually supports both single-mark and multiple-mark recognition. The fields to be recognized can be specified as grids (rows by columns) or as single bubbles.
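A common way to decide whether a bubble is filled is to measure the fraction of dark pixels inside it and compare that fraction with a threshold. The Python sketch below applies this idea to a grid of bubbles in both single-mark and multiple-mark modes. The dark-pixel level of 128, the fill threshold of 0.5 and the function names are assumptions. Real systems tune these values per form and scanner.

```python
from typing import Optional

FILL_THRESHOLD = 0.5  # assumed fraction of dark pixels that counts as a mark
DARK_LEVEL = 128      # assumed: grayscale values below this are "dark"


def fill_ratio(bubble: list[list[int]], dark_level: int = DARK_LEVEL) -> float:
    """Fraction of dark pixels inside one bubble's cropped grayscale image."""
    pixels = [p for row in bubble for p in row]
    return sum(p < dark_level for p in pixels) / len(pixels)


def marked_columns(fill: list[list[float]], threshold: float = FILL_THRESHOLD) -> list[list[int]]:
    """Multiple-mark mode: every column in each grid row whose fill ratio reaches the threshold."""
    return [[col for col, ratio in enumerate(row) if ratio >= threshold] for row in fill]


def single_marks(fill: list[list[float]], threshold: float = FILL_THRESHOLD) -> list[Optional[int]]:
    """Single-mark mode: the one marked column per row, or None if blank or multiply marked."""
    return [marks[0] if len(marks) == 1 else None for marks in marked_columns(fill, threshold)]


if __name__ == "__main__":
    # A 3 x 4 grid (three questions, options A-D) of measured fill ratios.
    grid = [[0.05, 0.82, 0.10, 0.04],   # one clear mark in column 1
            [0.03, 0.07, 0.06, 0.09],   # left blank
            [0.71, 0.04, 0.66, 0.02]]   # two marks
    print(single_marks(grid))    # [1, None, None]
    print(marked_columns(grid))  # [[1], [], [0, 2]]
```

In single-mark mode, a row that returns `None` is a natural candidate for manual review, because it is either blank or ambiguous.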
Barcode recognition (BCR) can read more than 20 industry-standard 1D and 2D barcode types, including Code 39, Codabar, Interleaved 2 of 5 and Code 93. It can automatically detect all barcodes in an image or in a specified area of the image.
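After a barcode is decoded, an optional check character can confirm the read. The Python sketch below shows the modulo-43 check character that Code 39 can carry. Each character has a value from 0 to 42, the values are summed, and the remainder modulo 43 selects the check character. The sketch assumes the application prints this optional check character and that the start/stop `*` characters have already been stripped. The function names are hypothetical.

```python
# Code 39 character values 0-42, in the order used by the modulo-43 check.
CODE39_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%"


def code39_check_char(data: str) -> str:
    """Modulo-43 check character for Code 39 data (raises ValueError on invalid characters)."""
    total = sum(CODE39_CHARS.index(ch) for ch in data)
    return CODE39_CHARS[total % 43]


def code39_is_valid(decoded: str) -> bool:
    """Verify a decoded value whose last character is the check character."""
    if len(decoded) < 2 or any(ch not in CODE39_CHARS for ch in decoded):
        return False
    return code39_check_char(decoded[:-1]) == decoded[-1]


if __name__ == "__main__":
    print(code39_check_char("CODE39"))   # W
    print(code39_is_valid("CODE39W"))    # True
    print(code39_is_valid("CODE38W"))    # False
```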
Automated forms processing typically includes the following steps:
Although automated forms processing has many advantages over manual data entry, it still has some limitations. To achieve the best accuracy, certain prerequisites should be met.
One important consideration is indexing: determining the metadata that will describe the data contained in the documents. This decision perhaps drives the design of a forms processing solution more than any other.
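The effect of indexing choices can be illustrated with a small inverted index that maps each chosen metadata key and value to the documents that carry it. The Python sketch below is a hypothetical illustration. The metadata keys (`form_type`, `customer_id`), document IDs and helper names are assumptions, not part of any particular product.

```python
from collections import defaultdict

# Hypothetical metadata captured for each processed document.
Metadata = dict[str, str]


def build_index(documents: dict[str, Metadata], keys: list[str]) -> dict[tuple[str, str], set[str]]:
    """Map each (key, value) pair of the chosen index keys to the IDs of matching documents."""
    index: dict[tuple[str, str], set[str]] = defaultdict(set)
    for doc_id, metadata in documents.items():
        for key in keys:
            if key in metadata:
                index[(key, metadata[key])].add(doc_id)
    return dict(index)


def lookup(index: dict[tuple[str, str], set[str]], key: str, value: str) -> set[str]:
    """Document IDs whose metadata has key == value (empty set if none)."""
    return index.get((key, value), set())


if __name__ == "__main__":
    docs = {
        "doc-1": {"form_type": "claim", "customer_id": "C42"},
        "doc-2": {"form_type": "claim", "customer_id": "C7"},
        "doc-3": {"form_type": "application", "customer_id": "C42"},
    }
    index = build_index(docs, ["form_type", "customer_id"])
    print(sorted(lookup(index, "customer_id", "C42")))  # ['doc-1', 'doc-3']
```

Only keys chosen at indexing time can be looked up directly this way. A question about any other field means scanning every document, which is one reason the choice of index metadata shapes the rest of the solution.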