Best practices and recommendations
This chapter provides best practices and recommendations for building or administering an IBM Production Imaging Edition solution or a pure IBM Datacap Taskmaster Capture (Taskmaster) system.
This chapter includes the following sections:
In addition, read Chapter 5, “Designing a production imaging system” on page 139, which offers guidance for designing successful production imaging systems.
9.1 Basic form design and capture
The topic of form design is addressed in various online articles and is beyond the scope of this IBM Redbooks publication. However, this chapter highlights some of the functionality that Taskmaster has available to assist in ensuring good form design. This chapter also highlights general guidelines to follow.
PaperGray font
Datacap ships with its own font called PaperGray. This font is in the C:\Datacap\support\fonts directory. Figure 9-1 shows a sample that uses speckled dots to form constrained text boxes.
Figure 9-1 PaperGray font
Forms that are designed with this font allow segregation of handwritten characters without the use of lines, which can interfere with captured text. Because the boxes are made of speckled dots, the image enhancement despeckle function can remove them easily and effectively with minimal impact on the written characters. Although lines enforce character separation, the line removal process can, in some situations, affect the written characters that we are attempting to capture.
Figure 9-2 shows an example of a form that uses the PaperGray font for constrained text boxes. Notice how the boxes have been removed with minimal impact to the text.
Figure 9-2 PaperGray constrained text boxes
Barcodes
Use one-dimensional (1D) and two-dimensional (2D) barcodes where possible. These barcodes aid in identifying a document. They can also carry a large amount of data and, in some situations, all the data you need from the form.
Ideally barcodes must be printed or attached so that they are square with the page. Barcodes that are attached at extreme angles can be difficult to capture.
Colors
Use of color helps to create appealing forms. However, the colors must be within a range that scanners can drop out (that is, remove at scan time, as is needed for the lines of the constrained text boxes).
Most color scanners let you specify a drop-out color, typically red or green. With the color dropped out, these scanners can produce almost the same output as the PaperGray font after despeckling.
Some scanners can produce two images simultaneously. One image is a color image that you can use for export. The other image is a bi-tonal (black and white) TIFF image with the color removed, which we can use for processing.
Always test the color that you want to use before you print large quantities of forms to ensure successful drop out.
Use of colored paper for forms can also affect scanning quality.
Fonts
Ideally, use a font size of 10–14 points for the data that you want to capture. Smaller or larger fonts can start to cause issues with the Optical Character Recognition (OCR) engines.
Resolution
The resolution at which a form is scanned can determine the quality of the OCR, Intelligent Character Recognition (ICR), or Optical Mark Recognition (OMR) results. A low-resolution image can make some characters illegible to the OCR, ICR, and OMR engines and cause low confidence or incorrect reads. A higher resolution, although it gives better quality, can pick up additional marks on the form, increasing the number of incorrect reads. A higher resolution also increases the size of the image that is being stored.
Determine the resolution on a case-by-case basis. However, a general rule is to use 200–300 dpi. Always use at least 300 dpi for OCR/A.
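The effect of resolution on image size can be estimated with simple arithmetic. The following minimal sketch assumes a US Letter page (8.5 x 11 inches) scanned as a bi-tonal image at 1 bit per pixel, and it reports the uncompressed size only. The function name scan_size is hypothetical, and compressed TIFF files are normally much smaller than these raw figures.

```python
def scan_size(width_in, height_in, dpi, bits_per_pixel=1):
    """Return (width_px, height_px, uncompressed_bytes) for one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    # Round up to whole bytes.
    return width_px, height_px, (total_bits + 7) // 8


for dpi in (200, 300):
    w, h, size = scan_size(8.5, 11, dpi)
    print(f"{dpi} dpi: {w} x {h} pixels, about {size / 1024:.0f} KiB uncompressed")
```

Moving from 200 dpi to 300 dpi multiplies the pixel count by 2.25, which is why the choice of resolution matters for storage as well as for recognition.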
Layout
Use of constrained text boxes for handwritten recognition is important in establishing good results. The text boxes help to define the area where text will reside, the number of characters expected, and potentially the type of character, that is, numeric or alphabetic. They also define the size of the characters that is required. These text boxes must be of adequate size so that the person who is completing the form can write legibly. Use the PaperGray font to create constrained boxes (see “PaperGray font” on page 290).
Try to get the person who is completing the form to use black ink and to write in clear, well-formed uppercase characters where possible. Therefore, include instructions in a noticeable area of the form that advise the person completing the form to follow these guidelines. These guidelines can assist in improving the accuracy of the ICR engine.
Ensure that OMR check boxes are of adequate size. Check boxes must not be so small or so close together that the person who is completing the form selects multiple check boxes. They also must not be so large that the person marks only a small portion of the box.
Where possible, do not place constrained text boxes or OMR fields close to the edge of the form. When a form is scanned, a slight misalignment can lead to parts of the image not scanning correctly, resulting in a loss of data.
Constrained text boxes that contain hint characters can also cause issues when scanning if the hint character is not properly removed. Figure 9-3 shows an example along with an alternative method using the PaperGray font.
Figure 9-3 Solid and PaperGray constrained boxes
Scanning
To obtain good capture results, use a suitable scanner. When scanning from multiple scanners, the results can differ. The quality from one scanner can be worse than the quality of other scanners. Poor scanning quality can lead to poorly recognized documents. Therefore, make sure that scanners undergo routine maintenance to ensure that they are all working optimally and are not outputting poor quality images.
As indicated earlier, try to not use color on the forms. Although this practice is preferred, certain colors that can be dropped out are permissible. Some scanners can drop out colors at scan time, meaning that they never make it to the original image.
Separator sheets
When using barcode separator sheets, print the separator sheets on lightly colored paper. By using lightly colored paper, they can be easily removed from the scanned batches and reused.
Test any colored paper that you use for separator sheets to ensure that the background appears white. A light blue or yellow color works well with most scanners.
If you are using both document and attachment separator sheets, use of a different color for each sheet allows for easier sorting upon separator removal.
Always use the first generation of barcode separator sheets. Repeated photocopying of the original causes eventual degradation of the barcode to the point where it might become unreadable.
9.1.1 Scanned document verification
You must understand the nature of verification. First, use of Taskmaster does not remove the need for manual verification. You might have to verify a high percentage of the documents that you process. The improvement that you see, however, is increased throughput of documents that might otherwise require manual entry.
For example, typing data from every form requires a user to manually enter all of the information. Instead, you can use OCR or ICR to capture this data automatically and flag only the fields that the engine believes it did not recognize correctly. The advantage is that the verifier, although possibly still seeing a high percentage of the pages that are scanned and recognized, only has to verify or correct a few of the fields on each form. This process drives efficiency in the following ways:
You need fewer people to verify the same amount of data as you need for manual entry, reducing head count.
You can verify more data with the same number of people, increasing your throughput without increasing head count.
Typically, you get manually completed forms that are, for the most part, completed well (text in uppercase letters, in black ink, and inside the constrained boxes). Alternatively, you get manually completed forms that are, for the most part, completed badly (text in non-black ink, in mixed case, and merged between constrained boxes). As a result, a high percentage of the forms that are completed well go through with only one or two corrections or verifications necessary. The poorly completed forms have a larger number of fields to correct or verify. Therefore, good form design is key to reducing the number of poorly completed forms and thus the amount of verification time.
The Taskmaster thick client verification panel supports keyboard shortcuts in addition to mouse operation. By learning the specific hot keys and shortcuts in this environment, you can obtain greater efficiency in using the system.
Use capture snippets to reduce the need for the operator to search the form for the data to collect. Set these snippets to a size that is suitable for the operator to read easily.
9.1.2 Measuring scan and capture process improvement
The improvement obtained by using Taskmaster can be calculated as the time to process a set of documents with Taskmaster versus the time to process them manually. If you are more productive when using Taskmaster, with less manual intervention, you have succeeded.
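As a rough illustration of this comparison, the following sketch computes the time saved and the speed-up factor from two measured processing times. The function name improvement and the sample timings are hypothetical and are for illustration only.

```python
def improvement(manual_seconds, taskmaster_seconds):
    """Return (seconds_saved, speedup) for the same set of documents."""
    if manual_seconds <= 0 or taskmaster_seconds <= 0:
        raise ValueError("processing times must be positive")
    seconds_saved = manual_seconds - taskmaster_seconds
    speedup = manual_seconds / taskmaster_seconds
    return seconds_saved, speedup


# Hypothetical timings for one set of documents: 60 minutes manually,
# 20 minutes with capture plus verification.
saved, speedup = improvement(3600, 1200)
print(f"Saved {saved} seconds, {speedup:.1f} times faster")
```

Measure both times on the same set of documents, and include verification time in the Taskmaster figure so that the comparison is fair.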
You must understand that you are not removing manual intervention. Some form of verification or manual check might be required. Therefore, you are making users more productive, increasing throughput, or achieving both goals.
9.2 Best practices for application development
This section includes some of the best practices and recommendations for application development that we gathered from practitioners in the field. This information is not complete, but provides pointers and guidance based on the real-life experiences of consultants in the field.
9.2.1 Testing an application
When developing and tweaking a system, use VScan to start your batch for the following reasons:
When images are scanned into a workflow, they can behave differently based on how they were scanned. If you are adjusting a system to improve recognition, place some problem images in your VScan input directory, and then run the same images over again. If you instead rescan the paper for each run, differences in the scanning, rather than your adjustments, might be responsible for any change in recognition.
By using VScan, you can quickly create a batch, which negates the time-consuming process of running a set of documents through a scanner.
In many environments, such as test and production environments, you might need to start logging.
To turn on logging for a specific task, complete these steps:
1. Open the Taskmaster Client.
2. Log in to the application for which you want to turn on logging.
3. Select Settings → Workflow to display the Taskmaster Administrator.
4. Select the task on which you want to invoke logging, for example, VScan. Click the Setup button to the right.
5. In the Setup window, select File → Task Settings.
6. In the Task Settings window, click the Log tab. On this page, you can configure the logging levels and file location.
Logs can usually provide a good indication of any issues that are occurring in the system. However, ensure that you set the logs according to the system that is being used. If logging is set too high, it can affect performance.
Keep the flush buffer turned off, unless the batch stops and the log is terminated prematurely.
9.2.2 Capturing data
Define your output requirements from the start. Do not look at the form and then start to decide what data to extract halfway through building your system.
Look at the structure of the system to which you are exporting. What does it require? If this step is not done correctly at the beginning of the process, you can end up doing a lot of work in collecting data with no place to store it after you collect it. You might also omit pieces of data and then have to add them to the project later, increasing the time to deliver. Although this task can be done, knowing what you need up front is much more efficient.
You can capture OCR data from a document by using one of the following methods:
Zone: Defining an area of the form where the data to be captured is located.
Keyword: Using a keyword and then capturing the adjacent text, or using the presence of this keyword to help define the page type.
Regular expression: Using a regular expression to find data in a particular format, such as a date, ZIP code, or postal code.
Operator input: Having an operator highlight the text to be entered or enter it manually.
For each piece of data that you are asked to collect, decide which method or methods you can use. For data that is not on the form, you can also use database lookups, web services, or other methods to find the data.
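To illustrate the regular expression method outside of Taskmaster, the following sketch searches recognized text for a US ZIP code and a date in month/day/year form. The patterns, the function name find_fields, and the sample text are assumptions for illustration only. They are not Taskmaster actions.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
# One or two digits, slash, one or two digits, slash, four digits.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def find_fields(text):
    """Return the first ZIP code and date found in the text, or None for each."""
    zip_match = ZIP_RE.search(text)
    date_match = DATE_RE.search(text)
    return {
        "zip": zip_match.group(0) if zip_match else None,
        "date": date_match.group(0) if date_match else None,
    }


sample = "Invoice date: 03/15/2011  Ship to: Armonk, NY 10504-1722"
print(find_fields(sample))
```

A pattern this loose also matches any other five-digit number on the page. In practice, a regular expression is often combined with a keyword or zone to narrow the search.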
9.2.3 Smart parameters
Avoid coding paths as parameters in your actions and scripts. Always read paths from the Application Service or the settings.ini file. This way, your application can be easily ported from one machine to another.
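The principle of reading paths from configuration instead of coding them can be sketched in a general way as follows. This sketch is not Taskmaster code. The section name Paths, the key names, and the directory values are hypothetical, and the actual layout of a Taskmaster settings.ini file might differ.

```python
import configparser

# Hypothetical INI content; a real application would read its own settings file.
SAMPLE_INI = r"""
[Paths]
ExportDir = D:\Exports\Invoices
LookupDb = D:\Data\vendors.mdb
"""


def load_paths(ini_text):
    """Return the entries of the (hypothetical) Paths section as a dictionary."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # Keep the key names case-sensitive.
    parser.read_string(ini_text)
    return dict(parser["Paths"])


paths = load_paths(SAMPLE_INI)
print(paths["ExportDir"])
```

When the application moves to another machine, only the configuration values change. The code that uses paths["ExportDir"] stays the same.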
Learn to use smart parameters, and use the rr_ actions from the Rulerunner library to perform various tasks. You can greatly reduce the number of actions that you have to master by using the rr_ alternatives. For example, rr_Compare(0,1) is a good substitute for the ReturnFalse action. rr_Set(Value,@F) is a good substitute for DefaultValue(Value). The rr_ actions can perform the same tasks as dozens of other actions and you do not have to learn as many.
9.2.4 Projects
Always copy the foundation and demo applications (Accounts Payable Capture, Flex, MClaims, and so on) before you change them for your own use. If you change the applications in the build, reinstalling them can cause you to lose your changes.
Start new projects with a project that you are familiar with. Altering an application is easier than starting one from scratch.
9.2.5 Actions
Never change an RRX file in the RRX directory. The RRX file can be overwritten with reinstallation. If you want to change one of the RRX files, copy the file to the rules folder in your application. If an RRX library with the same name is in both your rules folder and the RRX folder, the one from the rules folder is loaded.
When writing actions, do not use the same name as another action in another library. Because libraries are loaded at different times during a Rulerunner session, you cannot be sure which action is being called if two libraries are loaded with actions with the same names. The .NET libraries use a namespace at run time. Therefore, the same name can safely exist in the .NET action libraries.
9.2.6 Scripting
You must learn how to write scripts in Taskmaster. The Taskmaster product is flexible and complete, but the included actions cannot do everything. Therefore, by using scripting, you can add functionality to the product that was not originally included.
A typical script might take an existing action and add your own code to it to invoke additional logging, routines, or external APIs. The Taskmaster Capture scripting engine can interact with Component Object Model (COM) and .NET objects to take advantage of additional functionality and APIs that are not available with the product. A good example is writing a custom export script and .NET objects to export to an external application.
Because the scripting engine uses VBScript, you can code some portions of the code and then run it from your desktop, which is a good way to test snippets of code quickly outside of the Taskmaster environment.
9.2.7 OMR field configuration
Configuration for use of OMR fields requires use of log files as shown in the following example.
If set appropriately, the RecogOMRThreshold action writes the recognized values to the log file. By default, the log files are written to the batch directory. To find the start of the execution within the log file, search for the following line:
action RecogOMRThreshold
Under this line, you see output similar to the following line:
Box1 935,1616,991,1661 2520 411 16.3095238095238
This output indicates that the upper left corner of Box 1 is 935 pixels from the left border and 1616 pixels from the top. Also, the lower right corner is 991 pixels from the left border and 1661 pixels from the top. The box covers 2520 pixels (56 × 45). Of these, 411 pixels, or 16.3095238095238% of the pixels in the box, are black. Keep in mind that the scan was set to scan black and white. Therefore, 2109 (2520 - 411) pixels are white.
If you are unsure about how to set the parameters of the RecogOMRThreshold action, run the action with a sample image and set the parameters to any value. Set the log level appropriately, and then read the log file to obtain information similar to the preceding example.
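The numbers in the sample log line can be checked with a short script. The following sketch assumes that the line consists of five whitespace-separated values in the order shown in the sample (field name, box coordinates, total pixels, black pixels, percentage). The function name parse_omr_line is hypothetical.

```python
def parse_omr_line(line):
    """Split one RecogOMRThreshold log line and recompute the derived values."""
    name, coords, total, black, percent = line.split()
    left, top, right, bottom = (int(v) for v in coords.split(","))
    total, black = int(total), int(black)
    return {
        "name": name,
        "box": (left, top, right, bottom),
        "area": (right - left) * (bottom - top),  # width times height in pixels
        "black": black,
        "white": total - black,
        "percent_black": 100.0 * black / total,
        "logged_percent": float(percent),
    }


info = parse_omr_line("Box1 935,1616,991,1661 2520 411 16.3095238095238")
print(info["area"], info["white"], round(info["percent_black"], 4))
```

For the sample line, the box measures 56 by 45 pixels, which gives the logged total of 2520 pixels. Subtracting the 411 black pixels leaves 2109 white pixels.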
If you run this test across multiple OMR fields, multiple pages, or both, you can determine the average OMR threshold for each OMR field.
By using these values, you can determine the thresholds that are needed for the RecogOMRThreshold action.
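One possible way to summarize such a test run is to average the logged density for each field. In the following sketch, the readings are hypothetical (field name, percentage of black pixels) pairs that were collected from several log files. How the averages translate into parameters for the RecogOMRThreshold action is not covered here and must be decided for each application.

```python
from collections import defaultdict
from statistics import mean


def average_density(readings):
    """Return the mean black-pixel percentage for each field name."""
    by_field = defaultdict(list)
    for field, percent in readings:
        by_field[field].append(percent)
    return {field: mean(values) for field, values in by_field.items()}


# Hypothetical readings from two sample pages.
readings = [("Box1", 16.3), ("Box2", 2.1), ("Box1", 17.9), ("Box2", 1.5)]
averages = average_density(readings)
for field, value in sorted(averages.items()):
    print(f"{field}: {value:.1f}% black on average")
```

Running the test separately on pages with selected and with empty check boxes makes it easier to see the gap between the two cases.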
Each task creates an XML file and writes the results of its actions in that file. Furthermore, the CreateDoc action creates a data file for each page. Its file name usually starts with the letters “tm,” followed by a series of numbers, and the .xml extension. In this file, you find information about all fields on the page. For an OMR field, a status of 0 indicates that the check box is selected, and a status of 1 indicates that the check box is not selected. See Figure 9-4 for a sample of this XML file.
Figure 9-4 OMR field used to detect if a signature box has been completed
The DensityString field shows the density percentage of the box. If four OMR boxes are available, the density string contains four characters (the same number of characters as the boxes you define). You can compute the density by subtracting 48 from the ASCII code of each character.
For example, suppose that the density string for four boxes is 0B00. The first, third, and fourth boxes have a density of 0% because the character 0 is ASCII 48, and 48 minus 48 equals 0 (48 – 48 = 0). The density of the second box, as indicated by the B in the density string, is 18% because the character B is ASCII 66, and 66 minus 48 equals 18 (66 – 48 = 18).
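The decoding rule can be expressed in a few lines. The following sketch applies the subtraction to every character of a density string. The function name decode_density is hypothetical.

```python
def decode_density(density_string):
    """Return one density percentage per OMR box (ASCII code minus 48)."""
    return [ord(ch) - 48 for ch in density_string]


print(decode_density("0B00"))  # [0, 18, 0, 0]
```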
9.3 Production Imaging Edition implementation principles
A full IBM Production Imaging Edition implementation cannot be delivered without a fair amount of thought, planning, and experience. The Production Imaging Edition bundle is new, but its components have been in existence for a fair amount of time, and methodologies for their deployment have been developed. Also, best practices are described in the product documentation and can be delivered by IBM Lab Services.
Given what was said about use cases, and the efficiency gains to expect, you can use the following guiding principles to help you approach a Production Imaging Edition implementation:
Examine the end-to-end business process that is associated with the Production Imaging Edition documents. Try to see what data you need at each step and for whom or for what system or process. The goal is to understand what data is going to be needed, how to get it, and how it is going to be used. You must know what to look for in your documents and external systems.
For example, if we want to process insurance claims, we need to ensure that we can extract insurance policy numbers, customer names, and so on, and check them in the claim administration system. We also need additional documents and pieces of information down the line, such as police reports, damage quotations, and so on.
We must understand how we intend to reconcile the supporting documents with the claim document that drives the process. For example, do we need to send back correspondence with a unique barcode tied to the claim number, so that the claimant can use it when sending the requested documents back?
Capture as much data as possible from the business artifacts, that is, from the documents that you have. You want the data to help you make sound decisions, and you need to extract it from the images to make it usable. You do not want to have to search for the data in images and re-enter it manually to do routing, calculations, and so on.
Transfer as much relevant information as possible to the Production Imaging Edition workflow. Similarly you want the postcommittal workflow to drive the business process. You also want it to use and to serve as much relevant information as possible to the users or other systems. Therefore, you must pay a lot of attention to designing the data model that is used in the workflow and knowing what data from Taskmaster batches and documents can be fed into it.
Normalize the data as early as possible in the process. For example, standardize date formats to help with date calculations, and verify the correct spelling of customer names, vendor names, and so on. Taskmaster and its lookup and verification capabilities are designed for that purpose.
As early as possible in the process, integrate with your business systems. The customers, vendors, or product nomenclatures that you use in the business process are likely referenced in an SAP system or another type of business database. The sooner you can check and validate the data that you have and make it conform to the business systems, the better. It helps everyone down the line and reduces errors. You can use Taskmaster and Business Process Manager, possibly with Component Integrator, to help in this task.
Get documents into the FileNet P8 repository as fast as possible, so that you can realize the benefits associated with the processing of the business documents sooner. Therefore, you must automate the capture process as much as possible. You must look at the types of documents and the forms you are using and then try to standardize them as much as possible, possibly by the use of barcodes.
You must also see what the fastest entry point in the system is, whether it is at a local level or at the central level and who is best placed to supply the data that is needed. This task might mean distributing the capture process to where the documents originate and might be consumed.
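The principle of normalizing data early, such as standardizing date formats, can be illustrated with a small sketch. It tries a list of expected input formats and returns the date in ISO 8601 form (YYYY-MM-DD). The list of formats and the function name normalize_date are assumptions for illustration. In a Production Imaging Edition solution, this kind of normalization is done with the Taskmaster validation capabilities.

```python
from datetime import datetime

# Input formats that the (hypothetical) documents are expected to use.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


for value in ("03/15/2011", "15.03.2011", "2011-03-15", "unknown"):
    print(value, "->", normalize_date(value))
```

Values that match no known format are returned as None so that they can be routed to an operator instead of being guessed. Formats such as day/month/year and month/day/year are ambiguous, so the list must reflect what the documents really use.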
When you understand what you want to achieve and have described it on paper, you can start implementing a sandbox system. The idea is to get a test system running quickly with all the major components. Then start setting up the system for one selected business process and its documents and try to implement it end to end.
Typically you want to start setting up the Content repository first because it is the heart of the system. You start with a set of document classes that will be used in a workflow or a folder structure, by the users in security, storage areas, and so on. All of this work is done on the strength of the analysis of the business needs that has been done beforehand.
Then, after you define the document classes and the metadata that you need to process the documents, you can configure Taskmaster to capture the documents and metadata that you expect. Most likely, the analysis of the type of data that can be produced from the documents in Taskmaster and the definition of the document classes in Content Manager will be an iterative process.
Configuring Taskmaster involves the following tasks:
Creating a Taskmaster application that is dedicated to the business process in question and creating the document hierarchy, the data fields, and zones
Configuring Taskmaster tasks and workflow, rule sets and actions, lookups, database feeds, and so on
Designing Taskmaster windows and dialog boxes for scan, verify, and data entry tasks
Creating users and groups and configuring the functional security that applies to them
Configuring the release stage in Taskmaster to commit the documents and the metadata that has been captured to Content Manager and to the workflow
Configuring the reporting, activity monitoring, and notification
After a fair amount of testing, iterations, and adjustments on both the Taskmaster and Content Manager, you can start configuring the Production Imaging Edition workflow.
Typically, designing a workflow requires you to look at current business practices. You want to build it as being a better, more efficient way of doing things. Therefore, you must consult with business people over multiple iterations to get the process right.
To help in that process, you use the Business Process Manager Tools, such as the Process Designer and Visio, which can quickly and effectively produce the workflow maps needed. By using the Process Designer, you can create and start testing the workflows. You are likely to need a fair amount of testing and simulations to reach a point where you feel confident that you can start loading the system and do some real-life testing and analysis with Process Analyzer.
Obviously you need to consider many more aspects in an implementation, including integrations with business systems that will likely require the Component Integrator.
Before you can place the system into production, you must perform a fair amount of testing and make the necessary adjustments under various load conditions to achieve the expected results. Bottlenecks might appear between the various components of the system and require remedy. To address bottlenecks, both Taskmaster and Content Manager offer reporting tools that can help to identify them.
 