Technical walkthrough: Accounts Payable Capture
This chapter guides you through the dynamic IBM Taskmaster Accounts Payable Capture application. This application is called a foundation application because it is used as a starting point for capturing complex machine-printed forms with line items in a dynamic fashion. This chapter describes Accounts Payable Capture as an example of using the dynamic technologies introduced in Chapter 10, “Dynamic technologies” on page 305. Before you read this chapter, you must read Chapter 10 to become familiar with the technologies as they are used in Accounts Payable Capture.
This chapter includes the following sections:
11.1, “Introduction to Accounts Payable Capture”
11.2, “How IBM Taskmaster Accounts Payable Capture works”
11.3, “Jobs available in the workflow”
11.4, “When each task profile gets executed”
11.5, “A walkthrough of the task profiles”
 
Accounts Payable Capture versus APT: This book uses the name Accounts Payable Capture to refer to what was previously known as APT. Keep in mind that some windows in the application and window captures in the book might still use the name APT.
11.1 Introduction to Accounts Payable Capture
The IBM Taskmaster Accounts Payable Capture application is a learning workflow. It uses dynamic technologies to learn new instances of the documents that are introduced in the system. The Accounts Payable Capture workflow has been used for many types of documents and applications, not just invoices.
In this chapter, you look at the role that IBM Taskmaster Accounts Payable Capture plays in the Accounts Payable (AP) industry. You are introduced to the different jobs in the workflow and examine the task profiles, rule sets, rules, functions, and actions that make it work.
Because Accounts Payable Capture is constantly evolving with new technologies and techniques, your version might differ from what is shown in this chapter. However, most of the techniques are applicable to any version. After you have a thorough understanding of how the technologies interact, you will be in a better position to apply them to any product where you need similar capability.
11.2 How IBM Taskmaster Accounts Payable Capture works
IBM Taskmaster Accounts Payable Capture captures the data from invoices and validates that data with business rules and with the business applications of the customer. Then it exports the image to an image repository and the data to the business applications.
In a typical application, a purchase order (PO) is entered into a PO system. After approval, the PO is sent to a vendor for fulfillment. The vendor ships the goods and a shipping document.
Then the customer receives the goods and enters the shipping document into the business application as a Proof of Delivery (PoD). Next, the vendor sends an invoice, which is received by the Accounts Payable department. The AP department uses IBM Taskmaster Accounts Payable Capture to capture the invoice. Accounts Payable Capture interacts with the business application system to obtain the most accurate data possible.
Next, the data is inserted into the business application where a three-way match occurs. In this process, the PO line items, the PoD line items, and the invoice line items are compared. When they match and the customer is confident that they ordered and received the goods, the business application system issues a check to the vendor.
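The three-way match described above can be sketched in a few lines of Python. This is a minimal illustration of the matching logic, not the business application's implementation; the line-item shape (sku, qty, price) is an assumption of the sketch.

```python
# Minimal sketch of a three-way match: PO, proof-of-delivery (PoD), and
# invoice line items are compared before payment is released.

def three_way_match(po_lines, pod_lines, invoice_lines):
    """Return True only if every invoice line is backed by matching
    PO and PoD lines (same SKU, same quantity, PO unit price honored)."""
    def index(lines):
        return {line["sku"]: line for line in lines}

    po, pod = index(po_lines), index(pod_lines)
    for line in invoice_lines:
        ordered = po.get(line["sku"])
        received = pod.get(line["sku"])
        if ordered is None or received is None:
            return False                      # item never ordered or never received
        if line["qty"] != ordered["qty"] or line["qty"] != received["qty"]:
            return False                      # quantity mismatch
        if line["price"] != ordered["price"]:
            return False                      # invoiced price differs from PO price
    return True
```

When all three sources agree, the business application can issue the check; any mismatch routes the invoice to an exception queue instead.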
Figure 11-1 illustrates this process in a simple diagram.
Figure 11-1 A diagram of the AP process
11.3 Jobs available in the workflow
You might notice many different jobs in the Accounts Payable Capture workflow. Although all of these jobs share common elements, they are different in some way.
Figure 11-2 shows a list of the jobs that are available in a current version of IBM Taskmaster Accounts Payable Capture.
Figure 11-2 Many different jobs in Accounts Payable Capture workflow
Jobs that start with the term Web use web scanning and upload, but the tasks that follow are identical to their thick-client workflow counterparts.
Jobs listed as Demo use a virtual scanning technique, while jobs listed as Main are used to drive a physical scanner. You cannot set up the thick-client scan tasks without a scanner and drivers attached to the machine. Therefore, by default, the IScan task and module are not present, and the Main jobs begin with Batch Profiler. They are intended for you to add a physical Scan task as the batch creation task and place it as the first task in the job.
The job or jobs that end with FlexID have an optional FlexID task before Batch Profiler to manually identify key pages of the batch that are necessary for page identification. For more information, see 10.3, “FlexID” on page 313.
The jobs that end with -Dot Matrix are identical to their counterparts in all respects except for the name of the job. At run time, during the recognition phase, the current job name is queried. If the job ends with -Dot Matrix, the recognition engine is programmatically configured for dot-matrix recognition through the use of rules.
The job or jobs that end with -Multipage TIFF are also identical to their counterpart-based jobs, except for the name. The current job name is queried during run time to programmatically enable rules that will do page identification based on the structure of the multipage TIFF files that were used for input.
This technique of using identical jobs with different names is useful in applications you develop where minor changes are in the rules execution that you want to enable or disable by placing the images into different jobs.
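This job-name technique can be sketched as follows. The job suffixes come from the text above; the settings dictionary and function name are illustrative, not part of the product's API.

```python
# Sketch of the job-name technique: identical jobs differ only in name,
# and rules query the current job name at run time to enable variants.

def recognition_settings(job_name):
    settings = {"engine": "normal"}
    if job_name.endswith("-Dot Matrix"):
        settings["engine"] = "dot-matrix"      # enable dot-matrix recognition
    if job_name.endswith("-Multipage TIFF"):
        settings["page_id"] = "by-source-file" # identify pages by input TIFF structure
    return settings
```

Because only the name differs, moving images into a different job is all that is needed to switch rule behavior; no rule sets are duplicated.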
11.4 When each task profile gets executed
Tasks generally run task profiles, but APT (now Accounts Payable Capture) pioneered the use of running task profiles on demand from a verify panel. This process allows far more power in the Verify tasks without scripting them in a significant way. By using rules, which you run by clicking a button in a Verify panel, you can reuse routines in multiple places in the execution of your application. You can also have a single-source code base for much of your application. In general, calling rules from a Verify panel is superior to coding and maintaining the same functionality in the panels themselves.
Figure 11-3 shows the task profiles used within a typical Accounts Payable Capture workflow.
Figure 11-3 Task profiles in Accounts Payable Capture
The first four task profiles are run by tasks. Except for the Validate task profile, the task profiles are named the same as the tasks that call them. The Validate task profile is called to validate your document from a data entry panel during the Verify task.
FingerprintAdd is called when you add a fingerprint manually in the DStudio Zones tab. With an Accounts Payable Capture workflow, you do not do this task often, but it is possible. Accounts Payable Capture workflows can add fingerprints to the system automatically and with greater accuracy to the line item zones than is possible with the manual interface.
StickyFingerprints (see 10.5, “Sticky fingerprints” on page 317) is called from the Verify panels when the Verify panel detects a <New> fingerprint that it has learned zones for, from a previous image in the batch.
CalculateBlank and FindDetails run from buttons on the various Verify panels.
ImageFix is called when setting up the image settings on the Zones tab of Datacap Studio.
11.5 A walkthrough of the task profiles
The remainder of this chapter takes you through the task profiles of the Accounts Payable Capture workflow, as used in IBM Taskmaster Accounts Payable Capture at the time of this writing. This walkthrough is provided as an example of how to use the dynamic technologies in an application as explained in Chapter 10, “Dynamic technologies” on page 305.
Before you attempt the walkthrough, you must understand the concepts of rule sets, rules, functions, actions; how they are attached to the Document Hierarchy (DCO); and how they are run. For information about these concepts, read Chapter 10, “Dynamic technologies” on page 305.
This section highlights the following task profiles:
11.5.1 The VScan Task Profile
The VScan Task Profile brings documents into the system programmatically and without user intervention. Other methods are possible for bringing documents into the system in an Accounts Payable Capture workflow. For example, the jobs listed as Main are generally devoted to driving a scanner. Also, with the Web demo VScan task, users can choose the images that are to be used as input from a directory listing.
Other Accounts Payable Capture jobs that are common, but not in the foundation application at the time of this writing, are tasks that pull documents from an email system or a fax server.
 
Tweaking your application by using the VScan rule set: Set up, develop, and tweak your applications by using the VScan rule set. When tweaking the system for optimal performance, you will often find it advantageous to run the same images over again so that you can tell if your modifications have improved the process. This approach is preferred to running a subsequent batch that is scanned a little straighter.
The VScan Task Profile runs in the demo jobs that are not designated as Web in the Accounts Payable Capture workflow (Figure 11-4).
Figure 11-4 The VScan Task Profile
By default, the VScan Task Profile contains only one rule set, which is also called VScan. It is common to add another rule set to this task profile if your input images are PDF, JPEG, grayscale, color, or another image type that must be converted. It is advantageous to have all images finish this task profile as bi-tonal TIFF images. If you place the convert actions at the start of the Batch Profiler Task Profile instead, rolling back batches to the start of Batch Profiler becomes an issue because the images try to go through the conversion process again.
The VScan rule set
The VScan rule set is typical for a system that has single-page or multipage TIFF files in the input directory (Figure 11-5).
Figure 11-5 The VScan rule set
As shown in Figure 11-5, SetSourceDirectory indicates that VScan must look for images in an Input subfolder off the folder that was specified in the vscanimagedir tag of Application Service. It is preset to accept a maximum of 50 images and to burst multipage TIFF files into single-page TIFF files if they are pulled in.
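The intake behavior that this rule set configures, a 50-image cap and the bursting of multipage TIFF files into single-page images, can be sketched as follows. File handling is simulated; the real VScan actions operate on actual TIFF files, and the names here are illustrative.

```python
# Sketch of the VScan intake behavior: scan the input file list in order,
# cap the batch at 50 images, and burst multipage TIFFs into single pages.

MAX_IMAGES = 50

def build_batch(source_files, page_counts):
    """source_files: ordered file names; page_counts: pages per file.
    Returns (page_name, source_file) pairs, bursting multipage inputs."""
    batch = []
    for name in source_files:
        for page in range(page_counts.get(name, 1)):
            if len(batch) >= MAX_IMAGES:
                return batch                  # accept at most 50 images per batch
            # ScanSrcPath records which source TIFF each page came from;
            # PageID later uses that variable to find document boundaries.
            batch.append((f"{name}_p{page + 1}.tif", name))
    return batch
```

Recording the source file with each burst page is what later makes page identification by variable change possible.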
11.5.2 The Batch Profiler Task Profile
The Batch Profiler Task Profile is the main background processing profile. This task profile is used in every job in the Accounts Payable Capture workflow and performs all processing between image acquisition into the system and the verify process.
A few of the rule sets, such as Purchase Order Line Reconciliation (POLR) and VendorNumberLookup, are specific to invoice processing. They might not be needed in other learning applications.
Figure 11-6 shows the rule set composition of the Batch Profiler Task Profile.
Figure 11-6 The Batch Profiler Task Profile
The PageID rule set
Accounts Payable Capture is the first workflow to use the Advanced PageID actions that are described in 10.2, “PageID actions and techniques in dynamic applications” on page 307. These actions are included in the PageID.rrx file that is in the rules directory of Accounts Payable Capture. These actions can be copied to any application that needs this capability.
Figure 11-7 shows that the implementation of these actions is in the PageID rule set.
Figure 11-7 The PageID rule set
Only one rule is in this rule set, and it is attached at the batch level. When the DCO enters this rule set, every image in the batch has only a batch-level object with a child page of type Other. When VScan runs, it places the ScanSrcPath variable on each page that lists the source TIFF that the image came from. In a case where VScan breaks out images from a multipage TIFF or PDF file, several images in a row can have the same ScanSrcPath variable.
In the first Multipage Origin() function, the first action, the Is_JobName() action, checks which job name you chose at scan time. If the action checking the job name returns true, the PageIDByVariableChange() action runs, querying the ScanSrcPath on each image and naming the pages.
When the ScanSrcPath changes from one image to the next, the image with the new ScanSrcPath value is named according to the second parameter of the PageIDByVariableChange() action. In this case, it sets such page type as Main_Page. For every subsequent page in the batch that contains the same value in ScanSrcPath, the images get the type specified in the third parameter, which is Trailing_Page in this case.
If you are not running the demo Multipage TIFF job, the Multipage Origin() function returns false, and the PageIDByBCSep() action runs. This action reads the settings.ini file and uses that information to set the types on all the pages. As mentioned in 10.2.1, “Page identification by barcode separator” on page 310, you only need barcode separators in multipage documents in the batch. Single-page documents are placed at the start of each batch and get the type specified in the second parameter, which is Main_Page in this case.
After the PageID rule set runs, all pages must be named. If a page is not typed as Other going into PageID, it is not renamed. The reason is that FlexID might have named the images before PageID runs, and PageID must not overwrite what an operator has specified as the type. However, if FlexID has run, the actions set the type on any Other page that was not renamed in FlexID, according to the LastPage_ThisPage section in the settings.ini file.
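The PageIDByVariableChange() behavior can be sketched as follows. The page-dictionary shape is an assumption of the sketch; in the product, the type and ScanSrcPath live on DCO page objects.

```python
# Sketch of PageIDByVariableChange: walk the pages in batch order and,
# whenever the ScanSrcPath variable changes, the page becomes a Main_Page;
# subsequent pages from the same source file become Trailing_Page. Pages
# already typed by an operator (for example, in FlexID) are left alone.

def page_id_by_variable_change(pages, first="Main_Page", rest="Trailing_Page"):
    """pages: list of dicts with 'type' and 'ScanSrcPath'. Mutates types."""
    previous = object()                       # sentinel: never equals a real path
    for page in pages:
        if page["type"] != "Other":
            previous = page["ScanSrcPath"]    # respect pre-assigned types
            continue
        page["type"] = first if page["ScanSrcPath"] != previous else rest
        previous = page["ScanSrcPath"]
    return pages
```

The first and rest parameters mirror the second and third parameters of the real action described in the text.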
The ImageFix rule set
When ImageFix is used in the Accounts Payable Capture workflow, it follows PageID. With form-type applications, where fingerprinting is often used for the page identification, ImageFix runs on every page and before the fingerprint match is attempted.
With this type of workflow, with PageID before ImageFix, you can be more aggressive with the line removal process because the barcodes will have been read and processed. However, particularly for invoices, you need to be less aggressive on the despeckling process, because you need to keep the decimal points in the currency fields if possible.
ImageFix in the Accounts Payable Capture workflow also automatically rotates the images that it processes before the settings in imagefix.ini file are applied to the image (Figure 11-8).
Figure 11-8 The ImageFix rule set
RotateImage() creates a CCO file to do its work, unless one was already created. It uses the AnalyzeImage method of CCO creation, which uses the location of graphics, words, and lines. The CCO file is created before the image is deskewed. Therefore, you do not want to use this CCO file for fingerprint matching. You can create another one during the Recognize rule set.
The CreateDocs rule set
The CreateDocs rule set in an Accounts Payable Capture workflow is the same as it is in all workflows. At the batch level, you run the CreateDocs rule set, and at the page level on pages that contain fields, you run CreateFields.
Figure 11-9 on page 349 shows the variables that the CreateDocs rule set uses to create a document structure in the Accounts Payable Capture workflow. When using the CreateDocs rule set on the Main_Page object, it always sets it as the first page of a new document called the Invoice document. The Invoice document can contain all of the other page types that are set in PageID, except for a Document_Separator page. Those pages are placed in a Separator document when found. No additional processing is done on Separator documents in Accounts Payable Capture. Therefore, the process effectively culls out the Document_Separator images from the batch, without removing them, leaving your audit trail intact.
Figure 11-9 Accounts Payable Capture setup DCO variables for Min, Max, and Order
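The document-assembly behavior can be sketched as follows. The structure names follow the text; the logic is a simplified illustration of what the CreateDocs rule set produces, not its implementation.

```python
# Sketch of document assembly: a Main_Page always starts a new Invoice
# document; each Document_Separator page goes into its own Separator
# document (culled from processing but kept for the audit trail); all
# other page types attach to the current invoice.

def create_docs(page_types):
    docs, current_pages = [], None
    for ptype in page_types:
        if ptype == "Document_Separator":
            docs.append(("Separator", [ptype]))
            current_pages = None
        elif ptype == "Main_Page":
            current_pages = [ptype]              # start a new Invoice document
            docs.append(("Invoice", current_pages))
        elif current_pages is not None:
            current_pages.append(ptype)          # Trailing_Page, Attachment, ...
    return docs
```

Because Separator documents receive no further processing, the separator images are effectively removed from the workflow without being deleted.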
The AdjustFields rule set
The AdjustFields rule set creates a document structure called Browse so that users have a structure that spans each page of the invoice and can browse between pages when using Taskmaster Web. The AdjustFields rule set creates a line-item child called PageNo. PageNo has a subfield named TIFF that is positioned in the upper-left corner of each page, with its value set to the name of the TIFF file that the field is placed on.
When you are in the Verify panel and you click a field, it automatically shows the page on which the field is found. If you need to move to a page to click data, the thin client does not have buttons to make such a move. Therefore, this browse structure places a field on every page that you can use to effectively move back and forth between pages of the invoice.
Figure 11-10 shows the implementation of the AdjustFields rule set. The Add Page_NoReferences runs on the Browse field on Main_Page.
Figure 11-10 The AdjustFields implementation
The Recognize rule set
In the Recognize rule set, some job-specific actions occur. This rule set also provides an interesting technique for capturing statistics in the runtime DCO. For more information about managed recognition, see 10.6, “Managed recognition” on page 321.
Figure 11-11 contains the rules, functions, and actions of the Recognize rule set used in Accounts Payable Capture. The single page-level rule is attached to the Main_Page and Trailing_Page objects. The Attachment pages are not recognized.
Figure 11-11 The Recognize rule set
The first function, the Recognize Dot Matrix function, always returns False. The first action, the rrSet() action, creates a batch-level variable called RecogType and sets its value to Normal. The second action, the Is_JobName() action, then checks the job name, which is Demo Dot Matrix in this case. If this name is not the current job name, the action fails, and the next function starts immediately. If it is the current job name, the action adds variables to the page and sets their values to inform the ScanSoft recognition engine later to use the dot-matrix settings. It then sets the batch-level RecogType variable to a value of Dot Matrix, and the last action, the rrCompare() action, returns a value of False.
You might need to copy this function if you have more than one job that you want to use for Dot Matrix. You need a function before the Recognize Normal Pages (not dot matrix) function for every job that you add for Dot Matrix recognition. For example, if you want to add a Main Dot Matrix job, you copy the top function and place the copy above the Recognize Normal Pages function. Then you change the Is_JobName parameter to the name of your new job.
The second function, which is the Recognize Normal Pages function in Figure 11-11 on page 351, shows the first attempt at managed recognition. It is set with the shorter timeouts.
It begins with the SetFingerprintRecogPriority() action set to TRUE. By setting this action, RecognizePageOCR_S replaces any existing CCO with a new CCO that is created by the recognition process. This action is necessary because RotateImage was used in the ImageFix rule set. It creates a CCO with the AnalyzeImage method and was created before the image was deskewed. This action essentially throws away the CCO and makes a new CCO based on where words and lines were recognized by the recognition engine. This action provides a more usable CCO for locating data than a CCO that was created with AnalyzeImage.
The RecogContinueOnFailure() action is then set to TRUE. This way, if a problem image is in the batch, the batch can continue after you detect this problem and reset the engine.
The next two actions, which are the SetOutOfProcessRecogTimeout() and SetEngineTimeout() actions, specify the timeouts to use. Figure 10-11 on page 322 shows an example of managed recognition. A monitored thread is created that has been set to automatically shut down in 25 seconds if recognition is unsuccessful. The recognition engine itself is created in that second thread and has its internal time-out set to 20 seconds. In most cases, a page is recognized quickly, usually 2 seconds or less. However, if a problem image is encountered, the recognition engine can detect this image by the time it takes to recognize it. Failing that test, the thread expires if the recognition engine hangs and is unable to monitor itself.
The RecognizePageOCR_S() action does the recognition. It uses variables that are created with the engine setup on the Zones tab and perhaps the Dot Matrix variables. It tries to recognize the page, write the CCO, and return a status.
If the status that it returns is 0, everything is successful. A text file is then created in that batch directory with the createTextFile() action. This text file is for observation and troubleshooting only. It is not used elsewhere in the process.
Finally, a batch-level variable is created or incremented with a special action called the IncrementBatchVariable() action. This action helps to capture statistics and place them at the top of the runtime page file so that you can quickly look at the page file. This technique is useful if you have several branches that your images can be processed with because it counts the number of times that a path is taken. If the variable does not exist, it is created with a value of 1. If the variable does exist, the value is incremented by 1.
The Managed OCR Failure function is a copy of the preceding function; it tries recognition a second time with longer timeouts, after resetting the engine. The IncrementBatchVar() action logs in the DCO when the managed failure occurs.
If the recognition engine returns a value of 0 for RecogStatus, the Managed OCR Failure function completes. If the recognition engine returns a value of 1 for RecogStatus, the recognition engine detected a blank page.
If the page is not blank, recognition has been attempted twice, once with a 20-second timeout and once with a 3-minute timeout. In this case, the Second Recognition Failure function marks the page for deletion, which also marks the document that it is in. This action does not actually delete the document. Instead, it marks the document so that you do not attempt to process it further. A well-designed system notifies someone about these deleted documents. Accounts Payable Capture does this notification in the Export task.
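The managed-recognition pattern, a monitored worker thread with a watchdog timeout, one retry with a longer timeout after an engine reset, and a marked-for-deletion outcome on a second failure, can be sketched as follows. The recognize callable stands in for the real engine call and is an assumption of this sketch.

```python
# Hedged sketch of managed recognition: run the engine call in a worker
# thread with an outer watchdog timeout, retry once with a longer timeout,
# and mark the document for deletion if recognition still fails.

import queue
import threading

def run_with_timeout(func, timeout_s):
    """Run func in a worker thread; return its result, or None on timeout."""
    results = queue.Queue()
    worker = threading.Thread(target=lambda: results.put(func()), daemon=True)
    worker.start()
    worker.join(timeout_s)
    return None if results.empty() else results.get()

def managed_recognize(recognize, stats):
    # First attempt: short watchdog (the engine's own timeout is shorter still).
    status = run_with_timeout(recognize, timeout_s=25)
    if status == 0:
        return "recognized"
    # Log the managed failure as a batch-level counter, as the text describes.
    stats["ManagedOCRFailure"] = stats.get("ManagedOCRFailure", 0) + 1
    # Second attempt after an engine reset, with a longer watchdog.
    status = run_with_timeout(recognize, timeout_s=180)
    if status == 0:
        return "recognized"
    if status == 1:
        return "blank_page"                   # engine reports a blank page
    return "marked_for_deletion"              # do not process this document further
```

The daemon worker mirrors the monitored thread in the text: if the engine hangs past the watchdog, the batch moves on instead of stalling.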
The FindFingerprint rule set
Fingerprinting in the Accounts Payable Capture workflow is done only on the Main_Page. The purpose of the FindFingerprint rule set is the automatic identification of the specific vendor that sent the invoice. It also provides the offset that must be applied to zones to compensate for differences in the scanning positions.
If the Accounts Payable Capture software is being used to capture documents from tens of thousands of vendors, use the Fingerprint Service. The Fingerprint Service is a web server that stores the CCOs for all active fingerprints all the time. Without it, a background computer must load the CCOs from the network every time a batch is processed, which can add significant time to the background processing.
The role of the FindFingerprint rule set is to ensure that every Main_Page has a fingerprint TemplateID that you are working with. If no sufficient match is found with an existing fingerprint, a new fingerprint is created automatically. Any new fingerprints created this way do not have zones defined on them. Also, all data must be extracted by keyword or regular expression before a data entry operator views and processes the invoice. Later, in “The PreExport rule set” on page 379, we run Intellocate rules that save any zones that are identified for these new fingerprints. This way, the next time that this invoice is encountered, the data can be found by using zones.
Figure 11-12 shows the FindFingerprint rule set that is used in an Accounts Payable Capture workflow.
Figure 11-12 The FindFingerprint rule set
The batch-level rule sets the fingerprint directory from the App Service setting. The new fingerprints are stored in this directory if they are created. The paths to the existing fingerprints are in the Fingerprint database.
 
Moving the Fingerprint database from one system to another: With a learning system, such as Accounts Payable Capture, use care when moving the Fingerprint database from one system to another. You must ensure that the paths are correct and that you do not lose entries when moving.
Consider what might happen if you copy your entire system from production to development, work on the system in development for a few days, and then try to copy the entire system back. You might lose any new fingerprints that were added to the production system while the copy was in development. In general, do not move the fingerprint database from one system to another.
From the batch-level rule in Figure 11-12, the first function, which is called For commas as a decimal separator, checks the locale of the machine that is processing the fingerprints. It ensures that the floating point values from SetProblemValue and SetFingerprintSearchArea use the correct decimal separation character in their parameters.
Normally, we want to look at the top 30% or so of the current image to compare it against our fingerprint library for matching. The ProblemValue, by default, is set to 0.8. Decreasing this value increases the chances that the image is matched to a fingerprint from another vendor, creating fewer fingerprints in the system overall. Increasing this value gives a more precise match, but creates more fingerprints in the system and additional work for data entry operators in zoning these additional new fingerprints.
Adjust this setting only after a discussion with the users about the effects that they can expect by changing this value. Often, we find that a value of 0.8 provides the right balance.
If a mismatch occurs, the data entry operators can create a fingerprint with the click of a button, or in the case of the web, by choosing YES from the Add New Fingerprint list. If more than one fingerprint exceeds the ProblemValue setting, the fingerprint with the best match is returned by the FindFingerprint action.
The SetDCOType action ensures that the FindFingerprint action does not change the page type (which is set in the PageID rule set) to another value.
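The matching decision can be sketched as follows. The similarity scores are supplied for illustration; in the product they come from the fingerprint comparison itself.

```python
# Sketch of the fingerprint-matching decision: every candidate fingerprint
# gets a similarity score between 0.0 and 1.0. The best score must exceed
# ProblemValue (0.8 by default) or a new, zone-less fingerprint is created.

PROBLEM_VALUE = 0.8

def find_fingerprint(scores):
    """scores: {fingerprint_id: similarity}. Returns the matched id, or
    None, meaning a new fingerprint with no zones defined is created."""
    if not scores:
        return None
    best_id = max(scores, key=scores.get)      # best match wins ties with itself
    return best_id if scores[best_id] > PROBLEM_VALUE else None
```

This sketch shows why raising ProblemValue creates more fingerprints: fewer candidates clear the higher bar, so more pages fall through to the new-fingerprint path.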
The Locate rule set
In the Locate rule set, the data is pulled from the CCO that was created in the Recognize rule set into the fields of the DCO. The rule set is extensive, but overall employs a few different techniques for retrieving the values. We explore each technique.
The Locate rule set also runs as part of two different Task Profiles, Batch Profiler and StickyFingerprints. Figure 11-13 shows how the Locate, Clean, Filter, and Validate rule sets are used during the StickyFingerprint Task Profile, which is called during the Verify process.
Figure 11-13 The StickyFingerprints Task Profile
Reusing rule sets this way reduces the maintenance of the system. However, it makes each rule set slightly more complex. Each rule set must test when it is being run so that it knows precisely how to handle its order of operations when processing or reprocessing a document. If you are unfamiliar with what StickyFingerprints are, see 10.5, “Sticky fingerprints” on page 317.
Figure 11-14 shows the document- and page-level rules of the Locate rule set.
Figure 11-14 Document and page-level rules in the Locate rule set
The document-level rule is in place to make a multi-CCO file (MCCO) that combines all of the CCOs from the Main_Page and any Trailing_Pages into one large CCO. This way, the entire document can be searched for data at one time. For more information about this technology, see 10.7, “CCO Merging” on page 323.
When making an MCCO, only do it once and only do it during the Batch Profiler Task Profile. For this reason, the check is in place on the document-level rule to see if the document is being reprocessed during the Verify Task. In the document-level rule, you can see that, if the batch is being reprocessed by Verify, it does not attempt to remake the CCO file. The MCCO technology adds the Trailing_Page CCO to the Main_Page CCO and replaces the CCO of the Main_Page. Therefore, running this action more than once has the effect of duplicating the data from the Trailing_Pages to the CCO and potentially doubling the data found when we search it.
In the first function on the document level, we check to see if we are running under the Verify Task. If we are, the function returns true and does not run the second function that creates the MCCO.
In the second function, the IsMultipageDocument() action determines if the document contains more than one page. If it does, it makes an MCCO for you.
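The once-only guard can be sketched as follows, with CCOs modeled as word lists. This is an illustration of the control flow described above, not the MCCO file format.

```python
# Sketch of the MCCO once-only guard: merge the Trailing_Page CCOs into the
# Main_Page CCO only during Batch Profiler, never when the document is
# reprocessed under Verify, because merging twice would duplicate the
# trailing-page words in the combined CCO.

def build_mcco(document, task):
    if task == "Verify":
        return document                        # reprocessing: CCO already merged
    pages = document["pages"]
    if len(pages) > 1:                         # IsMultipageDocument()
        for trailing in pages[1:]:
            # Append each trailing CCO into the Main_Page CCO in order.
            pages[0]["cco"].extend(trailing["cco"])
    return document
```

Running the merge a second time would append the trailing words again, which is exactly the duplication the task check prevents.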
For the page-level rule, we are trying to read any zones that are associated with the fingerprint if they exist. Similar to the document-level rule, we must check when the Locate rule set is running so that we do not reprocess documents that do not need it.
The first function checks to see if we are in a situation where StickyFingerprint technology is useful. See the StickyFingerprint Task Profile listed at the top of this section (see Figure 11-13 on page 355). The first rule set in the StickyFingerprint Task Profile sets a variable named Sticky that indicates whether we need to reprocess the runtime document with Locate rules. If the value of the Sticky variable is No, the SkipChildren() action causes the rule set to stop running for all children of the page, meaning that the fields will not be searched for data.
Regardless of whether Sticky is set to Yes or No, we do not want to attempt to read the zonal information when we run from the Verify Task. The reason is that the first rule set in the StickyFingerprint Task Profile also sets the zones based on the previous document in the batch. The net result of the first two rule sets is that, when running under StickyFingerprints, the zonal data is never read by the Locate rule set. Depending on whether the Sticky variable is set to Yes or No, the field-level actions in the Locate rule set will or will not run.
The remaining functions on the page-level rule must only run during the Batch Profiler Task Profile. Again, because there are two different places where zones can be stored (FPXML or the Setup DCO), we must check to see which actions are appropriate for reading the zonal information.
Reading zones with the FPXML method requires us to set the fingerprint directory where the FPXML files are kept. To read FPXML with detail lines defined, we must inform the ReadZonesFPX() action of a detail structure so that it can read and apply the detail line zones correctly. This process is done by using the SetDetailandLineItemPairFPX() action. You do not have to specify the Browse structure here. Such zones are previously set programmatically in the AdjustFields rule set.
The ReadZonesFPX() action returns FALSE if it cannot find or read an FPXML file. If the document is a multipage document, an EOL character must be set for the MCCO to process correctly. If the FPXML is read, a variable is set at the page level so that we know the method by which the zones were read. Because two actions in this function can return FALSE, we must check this variable in the next function to see if the zones were read in correctly. If the ReadZonesFPX() action returns FALSE, the variable is not set, and the next function attempts to read zones from the Setup DCO.
The rest of the Locate rule set pulls data into the fields by using the following methods:
Populating data that is normally best found zonally
Finding data that floats around
Searching for and populating the detail lines
Populating data that is normally best found zonally
With the first two methods, Accounts Payable Capture checks zonally and with keyword searches. The difference is the order of the methods that are used. Figure 11-15 on page 359 shows how we find data when zones are preferable for locating the data. We use this method when data does not tend to float around the form for these field types.
As with the previous rules in this rule set, we must check which task we are running under. Therefore, the first function checks to see if we are running under the Verify task. If we are, it loads the CCO dynamically and attempts to populate the data from a zone with the PopulateZNField() action. If we are not running under the Verify task, the CCO is already loaded, and we can immediately run the PopulateZNField() action. If a zone is defined for the field and the zone contains data, the data in that position is pulled into the field. The rule for that field is finished, and we can move on to the next field.
If no zone is defined (as in a new fingerprint) or if no data is in the zone, we use Locate actions to try to find the data programmatically.
The first step in this process is to find a keyword or regular expression. Although regular expressions are a preferable method of finding data in general, most of what we capture from invoices is not unique enough in structure to justify finding by using a regular expression. The exception is a PO field. If a company uses a number that is fairly unique, such as the letters PO followed by 8 digits or something of that nature, consider replacing the default actions to find the PO by regular expression.
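For illustration only (this exact pattern is a hypothetical example, not one shipped with the product), a PO number of the form "PO" followed by 8 digits could be located with a regular expression such as the following sketch:

```python
import re

# Hypothetical pattern: the letters "PO" followed by exactly 8 digits.
PO_PATTERN = re.compile(r"\bPO\d{8}\b")

def find_po(text):
    """Return the first PO-style number in the recognized text, or None."""
    match = PO_PATTERN.search(text)
    return match.group(0) if match else None
```

For example, `find_po("Please reference PO12345678 on all invoices")` returns `"PO12345678"`, while text with no such token returns `None`.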
Figure 11-15 Finding data with the zonal method preferred
With the keyword search, a key file is placed in the dco_<AppName> directory listing the keywords that might be used as labels around the data that you are searching for. Accounts Payable Capture ships with a limited set of keywords in the .key files. Add additional keywords as you find appropriate.
After a keyword is found, we move one word to the right to see if that word contains data that is appropriate to the type of data we are looking for. If it does, we run an UpdateField action, and then we are done. The rest of the functions in the rule operate identically, but look at different words based on their location relative to the keyword that was found.
This method is used for the bulk of the fields in a document where the data for that instance of a document is located in a static location (because we are checking for the zone first).
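The keyword-then-adjacent-word flow can be sketched as follows. This is a simplified Python illustration, not the actual Locate actions; the word list and the date test are stand-ins for the real word-position data in the CCO.

```python
import re

def find_by_keyword(words, keywords, looks_valid):
    """Scan the recognized words; when a keyword is found, test the word
    to its right against the expected data type (sketch of the Locate idea)."""
    for i, word in enumerate(words):
        if word.lower().rstrip(":") in keywords and i + 1 < len(words):
            candidate = words[i + 1]
            if looks_valid(candidate):
                return candidate  # would be written back with UpdateField
    return None

# Stand-in check for a date-like value, for example 01/31/2011.
is_date = lambda w: re.fullmatch(r"\d{1,2}/\d{1,2}/\d{2,4}", w) is not None

words = "Invoice Date: 01/31/2011 Terms: Net 30".split()
```

With this sketch, `find_by_keyword(words, {"date"}, is_date)` returns `"01/31/2011"`; the remaining functions in the real rule would test other neighboring word positions in the same way.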
Finding data that floats around
Some data, such as the Invoice Total, Tax, and Shipping, are normally found on the last page of the document. Invoices have a variable number of pages. One day you might get an invoice from a vendor that has one or two line items and everything fits on a single page. The next day, you might get an invoice from the same vendor that has dozens or hundreds of line items and is many pages long.
For this reason, we use a different preferred method for data that typically falls below the last line item on an invoice (Figure 11-16).
Figure 11-16 Rule for finding data that tends to float
In this case, we look for the last occurrence of a word in a document and then look around that word for data that fits the data type we are searching for. If the word is not found, then we use the zone as a last resort.
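The last-occurrence search can be sketched like this. Again, this is an illustrative Python stand-in for the real actions, and the amount test is simplified.

```python
def find_after_last_keyword(words, keyword, looks_valid):
    """Search backward for the LAST occurrence of a keyword, then test
    the word that follows it for data of the expected type."""
    for i in range(len(words) - 1, -1, -1):
        if words[i].lower().rstrip(":") == keyword and i + 1 < len(words):
            candidate = words[i + 1]
            if looks_valid(candidate):
                return candidate
    return None

# Stand-in check for a currency-like amount.
is_amount = lambda w: w.replace(",", "").replace(".", "").lstrip("$").isdigit()

# "Total" appears in a column header and again near the bottom of the last page.
words = "Total Qty Price ... Subtotal 95.00 Tax 5.00 Total 100.00".split()
```

Here `find_after_last_keyword(words, "total", is_amount)` returns `"100.00"` from the final occurrence, not the column header near the top; only if no occurrence yields valid data would the zone be used as a last resort.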
Searching for and populating the detail lines
The technology involved in finding detail lines is explored in 10.9, “Line item detection” on page 325. However, more is involved in the actual implementation than just the actions that search the CCO for lines as illustrated in Figure 11-17.
Figure 11-17 Finding detail lines in the Locate rule set
As with some of the other rules in the Locate rule set, we must check if we are running in the Verify task to prevent reprocessing the document unnecessarily. The first function does this check. If you are running under the Verify task, the rule set does not use these actions to find detail lines.
The second function checks whether we are using FPXML and moves the FPXML zones to the detail structure. FPXML stores zones differently than the Setup DCO does. If FPXML is used, we must load the CCO again. The next two actions set up the zone for the multipage document, ensuring that the entire CCO is searched for line items, even though it contains a variable number of pages. Regardless of whether FPXML is used, after the zone is set up, ScanDetails creates the detail structure and puts each line item in its own child Lineitem field.
At the line-item level, we perform a ScanLineItemDynamic action, by using the recently loaded CCO if FPXML is used. Alternatively, we perform the ScanLineItem function if the zones were read from the Setup DCO.
Finally, each field on the line item is populated with PopulateZNLineItemField.
If you are unfamiliar with how this technology works, see 10.9, “Line item detection” on page 325.
The Clean rule set
The Clean rule set is used for a specific purpose on a limited number of fields in an Accounts Payable Capture workflow. Most field cleaning occurs in the Validate rule set. When a data entry operator clicks a field, we want to remove unwanted characters and set the data format in fields, such as dates, to a specified format.
However, some fields are used for data lookup before we validate them. Therefore, those fields must be cleaned and potentially formatted in this rule set. Also, we must check the data in certain line-item fields to ensure that it is the data that we are searching for before filtering the line items. If you are unfamiliar with this method, see 10.9.4, “Filtering line items” on page 330.
Figure 11-18 shows the default rule set for cleaning invoices.
Figure 11-18 The Clean rule set
The page-level rule indicates that the Clean rule set must do nothing if we are running under the StickyFingerprints task profile and the current fingerprint is not eligible for sticky processing. If the rule fails, a first attempt is made to adjust the field positions from the MCCO to coordinates that are associated with each page. For more information, see Chapter 10, “Dynamic technologies” on page 305.
Because the ZIP field is used in a vendor lookup, the data is cleaned of any character that is not a number (or potentially a dash, if postal codes contain dashes in your database). This action ensures that we have removed any spaces or extraneous characters that might cause the lookup to fail.
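A minimal sketch of this cleaning step follows; the function name is hypothetical, and the optional dash handling corresponds to databases whose postal codes contain dashes.

```python
import re

def clean_zip(value, keep_dash=False):
    """Strip everything that is not a digit (optionally keeping dashes)
    so the vendor lookup on the ZIP field does not fail on stray characters."""
    allowed = r"[^0-9-]" if keep_dash else r"[^0-9]"
    return re.sub(allowed, "", value)
```

For example, `clean_zip(" 902 10.")` returns `"90210"`, and `clean_zip("90210-1234", keep_dash=True)` preserves the dash for databases that store ZIP+4 codes.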
In the line-item fields, we delete any unexpected data from those fields. For example, if a Qty field contains the word “Thank,” which it might have captured when reading the bottom of an invoice, this word is deleted. The same is done for Price and Line Total in the detail-level fields. For more descriptions about why this is done, see Chapter 10, “Dynamic technologies” on page 305.
Avoid the urge to clean fields in this rule set that are not used in lookups or for detailed line filtering. Otherwise, you have to do it again in the Validate rule set, which adds to maintenance if the cleaning parameters change.
The Filter rule set
The Filter rule set is used to delete the line items that are captured during the Locate or FindDetails rule sets that do not fit the data types we are expecting. For more information about this technology, see 10.9.4, “Filtering line items” on page 330.
The Filter rule set used for invoices in the standard Accounts Payable Capture workflow is shown in Figure 11-19.
Figure 11-19 The Filter rule set
Again, the page-level rule avoids reprocessing the line items if we are not in a condition where it is beneficial to do so when running under the StickyFingerprints Task Profile.
In the Detail field, the CheckSubFields action erases all line items that do not have valid data (after cleaning) in at least two of the three Qty, Linetotal, and Price fields. After cleaning, MCCOPositionAdjust adjusts the raw CCO coordinates for a multipage CCO into coordinates that are adjusted to the single-page image from which the data originated.
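The two-of-three test can be sketched as follows. The validity check is a simplified stand-in for the real per-field cleaning and type tests.

```python
def is_numeric(value):
    """Simplified stand-in for the per-field validity test."""
    try:
        float(value.replace(",", ""))
        return True
    except ValueError:
        return False

def filter_line_items(line_items):
    """Keep only lines with valid data in at least two of the
    Qty, Price, and Linetotal fields; delete the rest."""
    def score(item):
        return sum(is_numeric(item.get(f, "")) for f in ("Qty", "Price", "Linetotal"))
    return [item for item in line_items if score(item) >= 2]

lines = [
    {"Qty": "2", "Price": "10.00", "Linetotal": "20.00"},   # real line item
    {"Qty": "Thank", "Price": "you", "Linetotal": "!"},     # footer text captured as a line
]
```

In this sketch, `filter_line_items(lines)` keeps the real line item and discards the captured footer text, which is the same effect the Filter rule set has on erroneously detected lines.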
The Lookup rule set
The Lookup rule set populates the Vendor name in the Vendor field. The Vendor name is stored in the fingerprint database. However, the rule set has an option for pulling the information from a locally stored database. An example is a mobile development computer that might be disconnected from a network installation. Therefore, the rule set can still function in a disconnected state (Figure 11-20 on page 365).
Figure 11-20 Lookup rule set populating the Vendor field from the fingerprint database
The IsStationIDSuffix action in the Invoice library examines the station ID of the users in Taskmaster and can conditionally run one of two OpenConnection actions that open a connection to a database. By default, these actions are set to the same value. However, the first function can be altered if you are working on a machine that is sometimes disconnected from the fingerprint database.
The Execute SQL action in the Vendor field checks the fingerprint database and retrieves the fingerprint classification. In Accounts Payable Capture, this classification provides the vendor name, but in Flex, this classification provides the document type.
The VendorNumLookup rule set
The VendorNumLookup rule set must be customized when Accounts Payable Capture is deployed at a customer site. It ships with an Access database so that the demos can work.
In an installed system, IBM Taskmaster Accounts Payable Capture gets the vendor numbers directly from the business application system or from a recent copy of the vendor database of the business application system. The vendor number is not assigned to just a vendor name, but also to a vendor location, because many vendors have different addresses for the business application system to send checks to.
You might want the vendor number to be looked up based on the PO number. This strategy is also successful if the business application system can supply such data through a lookup.
Because this rule set must be customized, and a readily available version is identical in structure to the Lookup rule set, we do not address this rule set further in this chapter.
The Vendor NumberLookupClose rule set
The Vendor NumberLookupClose rule set falls into the same category as the VendorNumLookup rule set. The Vendor NumberLookupClose rule set ensures that you have closed off the connection to the vendor database in the production system.
The POLR rule set
The purpose of the POLR rule set is to try to furnish line item numbers to each of the recognized line items in the invoice. Similar to the previous two rule sets, this rule set must be customized to pull the line item data from the business application system of the customer.
POLR uses settings from the settings.ini file to do the work. In most cases, just altering these settings is sufficient to allow POLR to work at a customer site. Figure 11-21 lists the pertinent settings from the settings.ini file.
Figure 11-21 POLR settings stored in the settings.ini file
The first section in Figure 11-21 is the [Database] section. Similar to the Lookup rule set, POLR can also use a test database when it is being used in a disconnected state. The settings must point to the vendor business application table that contains the PO information or the stored procedure that you use to retrieve the line items based on the current PO.
After the line items are successfully retrieved, POLR uses the ItemID, Qty, and Price keywords to determine which fields to use for automatch. In the previous example, lines automatch only if they have identical values for all three fields between the recognized line items and those retrieved from the business application system.
With manufacturing concerns, item prices are sometimes represented by a small fraction of the currency used. Therefore, a PriceTolerance is specified when automatching line items.
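The automatch idea can be sketched as follows. The dictionary shapes and the LineNumber key are illustrative assumptions; the real POLR actions work against the DCO and the business application data, with the match fields chosen by the ItemID, Qty, and Price settings.

```python
def automatch(recognized, po_lines, price_tolerance=0.05):
    """Match each recognized line item to a PO line when ItemID and Qty are
    identical and the prices agree within the tolerance (sketch of POLR)."""
    matches = {}
    for r in recognized:
        for p in po_lines:
            if (r["ItemID"] == p["ItemID"]
                    and r["Qty"] == p["Qty"]
                    and abs(r["Price"] - p["Price"]) <= price_tolerance):
                matches[r["ItemID"]] = p["LineNumber"]
                break
    return matches

po_lines = [{"LineNumber": 1, "ItemID": "A100", "Qty": 2, "Price": 9.99},
            {"LineNumber": 2, "ItemID": "B200", "Qty": 1, "Price": 45.00}]
recognized = [{"ItemID": "A100", "Qty": 2, "Price": 10.00}]  # price off by $0.01
```

With the default tolerance, the recognized line matches PO line 1 despite the one-cent difference; with a tolerance of zero, it would remain unmatched for the operator to reconcile.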
You might also want to output a list of any unused line items on a PO so that users can know that the invoice that they are paying does not close out a PO. For this reason, POLR also contains the capability to write the unused DCO lines to the DCO itself. Although the Accounts Payable Capture workflow does not immediately do anything with these line items, they are stored as variables on the Detail field of each Main_Page and can be exported if you want. Look at the detailed section in Figure 11-22 to see how unused lines from the PO are stored.
Figure 11-22 How POLR stores data in the runtime DCO
The actual PO values are stored for each line at the line-item level in variables. The PO Linenumber is essential to export so that a three-way match can be easily and programmatically accomplished. However, some systems use other criteria, such as the description, when matching the lines for a three-way match. If so, consider exporting the line criteria that allows the three-way matching system to function the best. POLR allows the system and data entry operators to reconcile the lines to ensure that errors are at a minimum. The goal is also to supply additional data to the three-way matching system. This way the knowledge workers assigned to that process can concentrate on actual issues (items not received, incorrect counts, different prices) and do not have to match the lines against the PO.
Although POLR runs in the Batch Profiler phase, it cannot do its job if the PO is initially misread. This task profile prematches as many line items as possible before verification. The data entry operator has an advanced POLR user interface to quickly match any line items that could not be matched in the background process.
The Validate rule set
The Validate rule set is used to clean and format data. It is also used to apply business rules to ensure that data is the correct length and data type and is mathematically correct. Validation also checks to ensure that the data meets other business criteria before it can be successfully exported.
The Validate rule set runs in several task profiles. When running in the Batch Profiler Task Profile on a background machine before verification, this rule set formats and checks each field before a data entry operator views the document. The rule set then marks the problem fields so that the data entry operator knows which fields need attention to pass the business rules.
This same rule set also runs when the data entry operator has viewed a document, potentially made corrections, and has submitted it as complete. Having the business rules in this single code base saves a lot of development and maintenance time with Taskmaster systems.
The Validation rule set in an Accounts Payable Capture workflow also employs enhanced error messaging (see 10.10, “Enhanced error messaging” on page 332). With this feature, you can write your own complete error messages that appear during the verify process if an operator submits a form that fails validation.
The length of this rule set makes it difficult to explain it rule by rule. Therefore, this section shows how different technologies are implemented within it.
Figure 11-23 shows a snippet of the Validate rule set that shows the page-level rule and a simple field showing Enhanced Error Messaging.
Figure 11-23 Snippet of the Validate rule set
In the Validate Page rule, first the Status_Preserve_Off() places the rules engine in a state where it sets the Page and every field on the page to a status of 0. At this point in the process, a status of 0 means that the fields have passed the validation rules and are not a problem.
The way Status_Preserve_Off() works is that the page and fields are set to a status of 0 and then the validate rules on each field run. If a field fails validation, it gets the problem status of 1. If it gets this status, the page also gets a status of 1.
Pages with a status of 1 are displayed to a data entry operator, while pages with a status of 0 are not displayed. Fields with a status of 1 are displayed in a pink color during verification. This way the data entry operator can quickly see by the color of the field that it failed the business rule associated with it. More information about status is provided later in this section.
The ClearErrorMsg() action is part of the Enhanced Error Messaging employed by Accounts Payable Capture. A page-level error message variable that contains text from every field that failed validation is displayed when a data entry operator submits a form that fails any business rules. This action clears that variable before running the field validations.
The CaptureOpInfo() action writes the current operator and station information to the DCO for export at a later time, if desired.
The Routing_Instructions field is in the Accounts Payable Capture workflow. Normally this field is set to None, but an operator can identify documents that they cannot process for some reason by choosing a predefined routing instruction. The rr_compare() action at the end of the first page-level function (Function1) detects whether this field contains a value such as Delete, Rescan, or Review. If it contains anything other than None, the function returns False, and the second function (Function2) indicates to skip the validation rules on the fields.
For a sample field validation, see the Invoice Number Rule in Figure 11-23. Each rule can be set to overridable or non-overridable. Non-overridable fields must be corrected by the data entry operator; otherwise, the operator cannot proceed in verifying the batch.
For this field, a minimum length of 2, with data that is at least 60% numeric, is required according to the first function, the Field Filled function. If those conditions are met, the rule is finished processing, and the status on the field remains 0.
If this field fails the conditions of the first function, the Invoice Number Rule checks whether the field has a maximum length of 0, meaning that it is blank, according to the second function, the Field empty function. If this field passes that function, the rule is finished, and the status on the field returns 0.
If the first two functions fail, the Invoice Number Rule calls the Error Msg function, which calls the AddToErrorMsg() action. This action sets the enhanced error message by writing to the page-level variable. Therefore, because every function failed on the validation, the status of the field is set to 1, meaning it is displayed in pink to the data entry operator. Also the page status is set to 1, meaning that the page is displayed to the data entry operator.
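The three functions of the Invoice Number Rule and the status propagation they drive can be sketched as follows. The status values (0 = passed, 1 = problem) come from the description above; the page dictionary is an illustrative stand-in for the page object and its error-message variable.

```python
def validate_invoice_number(value, page):
    """Sketch of the Invoice Number Rule: the field passes if it is at least
    2 characters long and at least 60% numeric, or if it is completely blank.
    Otherwise the field and page statuses are set to 1 and an enhanced
    error message is appended to the page-level variable."""
    value = value.strip()
    if len(value) >= 2 and sum(c.isdigit() for c in value) / len(value) >= 0.6:
        return 0                      # "Field Filled" function passed
    if len(value) == 0:
        return 0                      # "Field empty" function passed
    page["status"] = 1                # page is shown to the operator
    page["error_msg"] += "Invoice Number is invalid. "
    return 1                          # field is displayed in pink

page = {"status": 0, "error_msg": ""}
```

For example, `validate_invoice_number("A12345", page)` returns 0 and leaves the page status untouched, while a mostly alphabetic, non-blank value returns 1 and flags both the field and the page.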
Almost every field in the Accounts Payable Capture workflow contains similar validations that are altered for the business rules we want to apply to each individual field. Some of the fields have additional technology applied to them, in particular localization actions and mathematical validations. Figure 11-24 on page 371 shows the application of localization technology and calculations.
Figure 11-24 LineTotal Rule for the Validate rule set
The first action examines data and normalizes it to the decimal separator used on the local machine. At the time of writing this Redbooks publication, Accounts Payable Capture supports both commas and periods as decimal separators. This action examines the data in the field and changes the decimal separator to the same decimal separator set on the machine processing the rules.
Next, the LineTotal Rule checks this separator so that it can properly clean the data of any currency marks and any thousands separators that are present. Depending on the local decimal separator, different characters are allowed.
The LineTotal Rule then checks to ensure that the field is a currency field. Then it mathematically calculates the line item by multiplying the Qty by the Price and ensuring that it equals the line total, within the tolerance specified as a parameter. Having a tolerance in many applications is important because the document might limit the number of decimal places that are displayed to fewer than what is required for an exact equivalence match.
An interesting technique that you can sometimes employ follows the calculation. If the calculation is correct, the field values must also be correct, even if they contain low-confidence characters. In this application, the fields involved in the calculation are marked as high confidence because we know that they are correct based on the calculation.
The Routing rule set
In general, the Routing rule set is used to check the batch for low confidence characters and problem fields to determine whether the batch must go to a data entry operator. However, with the volume of data in an invoice batch, it is improbable that the data entry step can be skipped completely. Therefore, the Routing rule set is used to do final preparation of the batch before a data entry operator sees it. Figure 11-25 shows the Accounts Payable Capture implementation of the Routing rule set.
Figure 11-25 Routing rule set
The Batch rule uses a ProcessChildren action to capture the preverify value and position of each field and to store it as a field-level variable. The preverify position variable is used by Intellocate to determine if a field was zoned by the data entry operator. You can use the preverify value to capture statistics on the number of fields that a data entry operator changed, but it is not available immediately.
The page-level rule called Mark Pages determines which pages are shown to an operator. The first function checks whether the document has been deleted, which can happen only if a page in the document is not recognized. If the page is not recognized, the page is marked with a status of deleted.
All other Main_Pages are marked with a status of 1, meaning that the data entry operator sees every Main_Page in the batch. Most people prefer to see every page as a visual check that everything is working before exporting the data and the image.
If a document is new to the system, the Vendor field is populated with the value <New> during the Lookup rule set. The Clear field rule erases this text from the field and leaves it blank. It also increments a batch variable so that you can have a record of how many new fingerprints were in this batch.
The Invoice Type rule defaults the Invoice_Type field to a value of PO. The POLineItemRule copies the POLR variable for the PO Number to the text property of the appropriate field.
11.5.3 The Verification process
Several rule sets run under task profiles during the verification process. In verification, task profiles can be set up to run automatically or when a data entry operator clicks a button. This section highlights the following rule sets that can run under task profiles called by the verification process:
The DynamicDetails rule set
The DynamicDetails rule set is a way to find, at verify time, all of the line item fields in a document. You click the line-item subfields of the first detail line on the invoice and click the Find Details button. This rule set sets up the Lineitem and Detail zones automatically.
The DynamicDetails rule set runs under the Find Details Task Profile and is called by the FindDetails button on the Verify tab. Figure 11-26 shows the FindDetails Task Profile.
Figure 11-26 Find Details Task Profile
The DynamicDetails rule set (Figure 11-27) is identical to the rules that are associated with the Detail level and Lineitem rules in the Locate rule set. However, special actions are needed to read from a CCO that was loaded dynamically at verify time. The actions function identically to their non-dynamic counterparts.
Figure 11-27 DynamicDetails rule set
The Clean and Filter rule sets called in the Task Profile are the same rule sets that are called in the Batch Profiler Task Profile.
The CheckForSticky rule set
The CheckForSticky rule set runs in the StickyFingerprints Task profile as shown in Figure 11-28. As explained previously in this chapter, you must be familiar with the other rule sets that are called in this task profile.
Figure 11-28 The StickyFingerprint Task Profile
The StickyFingerprint Task Profile runs automatically from a thick client when it detects that a new fingerprint is being processed and another instance of the same new fingerprint was processed previously in the same batch. For example, a vendor sends in two invoices with a new format. The first one creates a fingerprint, and the second one matches that same new fingerprint. At verify time, the operator zones the first invoice. When the second invoice is displayed, the StickyFingerprint rule set copies and adjusts the zones from the first invoice to the second invoice. Then the fields are automatically populated with data.
If you are reading this chapter from the beginning, you are familiar with all of the rule sets called by StickyFingerprint, except for the CheckForSticky rule set (Figure 11-29).
Figure 11-29 CheckForSticky rule set
When you first enter this rule set, the Sticky variable must be blank. The first two functions fail, and the CheckForSticky action runs. This action checks whether previous documents were in the batch that can be used to zone the current document. If there are such documents, this action adjusts and copies the zones to the new document. It also sets the Sticky variable to Yes or No, depending on what it detected when it analyzed the batch. If the variable is Yes, the other rule sets in the Task Profile run, and the data is populated. If the variable is No, the other rule sets are coded to do nothing.
The first two functions of this rule are in place in case the rule set runs a second time for some reason. If the Sticky variable coming into this rule set is already set to Yes or No instead of blank, the CheckForSticky action does not run.
The AutoCalc rule set
The AutoCalc rule set (Figure 11-30) runs in the CalculateBlank Task Profile. It is called when the verify operator clicks a button. Because the Qty, Price, and LineTotal fields are mathematically related, this rule set calculates a single missing value if one is detected on a line item.
Figure 11-30 The CalculateBlank Task Profile showing the AutoCalc rule set
The CalculateBlank Task Profile calls a single rule set to do the analysis and automatic correction of a blank Qty, Price, or LineTotal field on a Lineitem as shown in Figure 11-31.
Figure 11-31 The AutoCalc rule set
The Character allowed Rule is applied to each field to ensure that the values in them are normalized to the proper localization. Then, the DetailFix Rule is applied to calculate a single missing value on each line item.
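The missing-value calculation behind the DetailFix Rule can be sketched as follows. The dictionary layout is an illustrative assumption; the real rule works against the Lineitem fields in the DCO.

```python
def autocalc(item):
    """Fill in a single blank among Qty, Price, and LineTotal using the
    relationship Qty * Price = LineTotal (sketch of the DetailFix idea)."""
    blanks = [k for k in ("Qty", "Price", "LineTotal") if item[k] in ("", None)]
    if len(blanks) != 1:
        return item  # nothing to fill, or not enough data to solve
    qty, price, total = item["Qty"], item["Price"], item["LineTotal"]
    if blanks[0] == "LineTotal":
        item["LineTotal"] = round(float(qty) * float(price), 2)
    elif blanks[0] == "Price":
        item["Price"] = round(float(total) / float(qty), 2)
    else:
        item["Qty"] = round(float(total) / float(price), 2)
    return item
```

For example, `autocalc({"Qty": "4", "Price": "", "LineTotal": "10.00"})` fills the Price field with 2.5; a line with two blanks is left untouched because the missing values cannot be solved.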
11.5.4 The Export Task Profile
The Export Task Profile consists of five rule sets that are ready for immediate use. However, it does not contain rule sets to export the data to your imaging system or to your business application system. Because these rule sets are highly variable, the demo writes the data to an XML file. You must add additional rule sets to this task profile for a production installation.
Not all of the rule sets in this Task Profile are used to export data. Many of the rules are for preparing the data for an export. Other rules are for handling the disposition of problem documents, such as those documents that must be reviewed or rescanned. The Task Profile also includes the Intellocate rule and a rule to capture fingerprint statistics.
The Export Task Profile includes the following rule sets as shown in Figure 11-32:
The SetStatuses rule set
The PreExport rule set
The Export rule set
The ExportClose rule set
The RoutingNotification rule set
Figure 11-32 Export Task Profile
The SetStatuses rule set
Two systems are available for marking documents for deletion, rescan, and review. The thick client Verify task does this task with user keystrokes and sets statuses on the documents and the pages for you. The thin client verify panels rely on a drop-down list in the Routing_Instructions field to mark documents for the same issues. The SetStatuses rule set (Figure 11-33 on page 378) consolidates the two methods. This way, at export time, you only have to check the statuses or the Routing_Instructions field to determine whether to export the documents or send a notification to someone.
Figure 11-33 The SetStatuses rule set
The first rule, Check Routing_Instructions, runs on the Routing_Instructions field. It looks for any value other than the default value of None and sets the statuses on the page and document accordingly.
The second rule runs on the page level and checks the current page status that a thick verify client might have set. If the current page status is set, the rule sets the Routing_Instructions field to the appropriate value.
Because of this rule set, you only need to check one of the methods for marking documents for special handling with the rest of the Export rule set.
The PreExport rule set
The PreExport rule set is a catch-all for everything that needs to be done before exporting the documents and images. It is also where you can find Intellocate, one of the most important features of DNA applications.
Figure 11-34 shows the non-Intellocate rules that are associated with the PreExport rule set.
Figure 11-34 Non-Intellocate rules in the PreExport rule set
The document-level rule uses the Scansoft recognition engine to make a text-searchable PDF at the document level. This PDF is placed in the batch directory and is named with the DocumentID property and a .pdf extension.
The empty currency rule attaches to each field that contains currency and defaults the value to 0.00 if the field is blank. This rule might need to be changed to 0,00 if a comma is used as the decimal separator in the system to which you are exporting.
The Prep Vendor Field for XML Syntax checks the vendor field for an apostrophe (‘) or an ampersand (&) and replaces those characters with the XML equivalents to those characters. The PO Lineitem rule ensures that the PO LineNumber field is populated with the data from a POLR lookup.
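The substitution that the Prep Vendor Field rule performs can be sketched as follows. The source describes handling the apostrophe and ampersand; this sketch also escapes angle brackets for completeness, which is an addition beyond what the rule is described as doing.

```python
def escape_for_xml(value):
    """Replace characters that would break the exported XML with their
    entity equivalents (ampersand first, so entities are not double-escaped)."""
    return (value.replace("&", "&amp;")
                 .replace("'", "&apos;")
                 .replace("<", "&lt;")
                 .replace(">", "&gt;"))
```

For example, a vendor name such as `Smith & Sons` becomes `Smith &amp; Sons` in the exported XML, and `O'Brien` becomes `O&apos;Brien`.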
The Intellocate rule set makes up the bulk of the processing in the PreExport Task profile. Figure 11-35 shows the first part of the rule set, Page Rule - Intellocate rule. The rule runs on the Main_Page object.
Figure 11-35 First part of the Intellocate rule set
The first two functions control what happens if a data entry operator indicates to the system that a new fingerprint is needed based on the current image. Normally this action is only done if a fingerprint was mismatched. However, sometimes data entry operators mistakenly choose to add a fingerprint when the fingerprint is already in the <New> classification, which is created anyway.
The first function handles the case in which the operator chose NewFingerprint although the fingerprint is already in the <New> classification, to correct this misunderstanding that some operators have. If the fingerprint is in <New>, Intellocate saves the zones for that fingerprint; we do not want to create an additional new fingerprint. In this case, the SetFingerprint action classifies the fingerprint (moves it out of the <New> classification) into a classification that matches the Vendor name. The iloc_SetZones action sets the zones of the header fields, and the iloc_SetDetailsSimple action sets the zones of the detail fields.
 
Important: These actions write the zones to the Setup DCO. If you want to use FPXML, unhook this rule from the page and replace it with the unattached rule in this rule set for FPXML.
In its essence, Intellocate is done with three actions: SetFingerprint, which moves the fingerprint out of <New> and classifies it, and the two iloc actions that save the zones. The rest of the actions in this rule determine whether these three actions need to be done.
The second function runs if the data entry operator clicked the NewFingerprint button correctly this time because the document matched an existing fingerprint erroneously. When this happens, we want to dynamically create a fingerprint in the library from the existing image, run Intellocate on it to classify it, and save the zones.
Because the page might have a multi-CCO (MCCO), we re-recognize the page to ensure that it creates a single page CCO for the first page. We also indicate where to store the new fingerprint that it creates. We use CreateFingerprint to create the fingerprint before we use Intellocate.
The DCO Status 75 function handles deleted documents. If an operator indicates that a document does not belong in the system, we want to delete any fingerprint that it created. This function can return a value of false if the fingerprint was somehow already deleted. Therefore, the trailing function checks this value and returns true so that the rule does not continue.
The same process is done for rotated documents. You do not want to store upside down fingerprints. Therefore, this function deletes them.
If a fingerprint remains in <New> after these functions have checked for special processing, the New Fingerprint function runs (Figure 11-36).
Figure 11-36 Remaining part of the Intellocate rule
As you can see, this process takes any fingerprint that is still in <New>, automatically classifies it, and saves the zones.
If a fingerprint is not in <New>, the LearnZones action compares the PreVerify Position variable that was created in the Routing rule set with the current position of the field in the runtime DCO. If the PreVerify Position was 0,0,0,0 and the operator provided a zone for the data, the position of that field is added to the fingerprint's other zone positions.
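To make the LearnZones decision concrete, here is a hypothetical Python sketch of the comparison it performs. The function name, the position tuple format, and the zone dictionary are invented for illustration; they are not the Datacap API:

```python
# Hypothetical sketch of the LearnZones decision, not Datacap code.
# Positions are modeled as (left, top, right, bottom) tuples.
def learn_zone(preverify_position, runtime_position, fingerprint_zones, field):
    """If recognition found nothing (0,0,0,0) but the operator drew a
    zone during Verify, record the operator's zone for the fingerprint."""
    if preverify_position == (0, 0, 0, 0) and runtime_position != (0, 0, 0, 0):
        fingerprint_zones[field] = runtime_position
        return True   # zone learned for this field
    return False      # nothing new to learn
```

The key point the sketch captures is that only fields the engine missed (PreVerify Position of 0,0,0,0) but the operator zoned are added to the fingerprint.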
The Export rule set
When you demo Accounts Payable Capture, you might not have access to a business application system or an imaging system. Therefore, for demo purposes, we write out a standard text file with XML tags. Because this rule set is not commonly used in production, it is not explained in detail here. It writes the text-searchable PDF documents, and the data in XML format, to the APT/Export directory.
The ExportClose rule set
The ExportClose rule set writes the XML tags to complete the XML file and closes the file. This rule set is not used in production.
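The split between the two rule sets, with Export writing the opening tags and the data, and ExportClose appending the closing tags and closing the file, can be illustrated with a hypothetical Python sketch. The file layout and element names here are invented for the demo and are not what Accounts Payable Capture actually emits:

```python
from xml.sax.saxutils import escape

def write_demo_export(path, batch_id, fields):
    # "Export" step: write the opening batch tag and one element per
    # field, leaving the document deliberately unterminated.
    with open(path, "w", encoding="utf-8") as f:
        f.write(f'<Batch id="{batch_id}">\n')
        for name, value in fields.items():
            f.write(f"  <{name}>{escape(value)}</{name}>\n")

def close_demo_export(path):
    # "ExportClose" step: append the closing tag to complete the XML.
    with open(path, "a", encoding="utf-8") as f:
        f.write("</Batch>\n")
```

Separating the close step mirrors the rule-set design: data can be appended document by document, and the file only becomes well-formed XML when the batch finishes.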
However, you want to include the SetFPStats() action in the ExportClose rule set of your custom export. This action records how many times, and how recently, a particular fingerprint was used. These statistics help you manage the fingerprint library by identifying fingerprints that are no longer used by any vendor (Figure 11-37 on page 383).
Figure 11-37 UpdateFPStats action that must be in your export
The RoutingNotification rule set
The RoutingNotification rule set is also altered for production. This rule set must be in place if you want to notify someone that the batch had documents in it that could not be processed (Deleted, Rescan, or Review).
Figure 11-38 shows the RoutingNotification rule set.
Figure 11-38 RoutingNotification rule set
The first function checks the Routing_Instructions page. If the page is not set to None, the function fails. The failed document is then processed by SendOutLookNotification, which sends an email message to a person specified in the settings.ini file and attaches the multipage PDF file.
The reason that this action is not used in production is that it requires Microsoft Outlook to be set up on the background machines. Also, the version of Microsoft Outlook must support sending email messages programmatically, and not all versions do. Therefore, replace this action with another email action to send the documents. Alternatively, develop another method of informing someone of these problem documents, such as writing an entry to a database.
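As one example of a replacement that avoids the Outlook dependency, the notification could be built and delivered with plain SMTP. This is a hedged sketch with invented function and parameter names, not a Datacap action; the SMTP host and addresses would come from your own configuration:

```python
import smtplib
from email.message import EmailMessage

def build_notification(sender, recipient, batch_id, pdf_bytes):
    # Assemble the notification message with the batch PDF attached.
    msg = EmailMessage()
    msg["Subject"] = f"Batch {batch_id}: documents need review"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The attached batch contains Deleted, Rescan, "
                    "or Review documents.")
    msg.add_attachment(pdf_bytes, maintype="application",
                       subtype="pdf", filename=f"{batch_id}.pdf")
    return msg

def send_notification(smtp_host, msg):
    # Deliver over standard SMTP; no Outlook installation is needed
    # on the background machines.
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

Because this approach only needs network access to an SMTP server, it runs unattended on background task machines where installing and automating Outlook is impractical.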