Chapter 4
How to Build a Fraud Data Analytics Plan

Continuing with the house analogy, we will now create the blueprint for your house based on the building code from Chapter 3 and the fraud scenarios identified from Chapter 2. In this chapter, we will design each room of the house based on the fraud scenarios identified in your fraud risk assessment. Hopefully, we will avoid change orders, although I believe the nature of fraud data analytics is an evolving process.

In this chapter, we will discuss the methodology for building a fraud data analytics plan. There are ten stages to building the plan. Initially, you may feel the process is bureaucratic or redundant. In some ways you are right. However, it is critical to ask as many questions as possible before creating the data interrogation routine. Otherwise, the plan may result in either excessive false positives or, worse yet, false negatives (a missed fraudulent transaction). In time, the process of developing a fraud data analytics plan will become intuitive.

Asking the right questions, in the right order, is critical from a logic perspective. However, for my free thinkers, I offer the list not to constrain you but to give you a checklist of questions you need to consider. Within each step there will be many considerations. The decisions should be understood and documented as part of the workpaper process. So, what are the steps or questions to building a fraud data analytics plan?

  1. What is the scope of the fraud data analysis plan?
  2. How will the fraud risk assessment impact the fraud data analytics plan?
  3. Which data‐mining strategy is appropriate for the scope of the fraud audit?
  4. What decisions will the plan need to make regarding the availability, reliability, and usability of the data?
  5. Do you understand the data?
  6. What are the steps to designing a fraud data analytics search routine?
  7. What filtering techniques are necessary to refine the sample selection process?
  8. What is the basis of the sample selection?
  9. What is the plan for resolving false positives?
  10. What is the design of the fraud audit test for the selected sample?

To illustrate the use of the questions, you should imagine you are assigned to an audit. You are required to implement a fraud data analytics plan. This is the first time your department will use fraud data analytics. The scope of the assignment is the expenditure cycle. So, what are the answers to the 10 questions? Later chapters will discuss the actual routines for procurement and disbursement fraud.

Plan Question One: What Is the Scope of the Fraud Data Analysis Plan?

The starting point of the plan is to understand what is and what is not included in the scope of the project. The concept of searching for fraud is too broad. It provides no realistic boundaries or sense of direction. The auditor without a roadmap will be found wandering around the desert without any hope. To be clear, you can search for the entire fraud risk structure in one audit, assuming you have a roadmap for all the fraud scenarios.

The starting point is the purpose of the assignment. Are we performing an audit or an investigation? In an investigation, the project scope is determined by the allegation or the legal action. In an audit of financial statements, the scope is determined by Generally Accepted Auditing Standards. For internal audits, the scope of the fraud analysis needs to be determined by the chief internal auditor. Regardless of the scope, the fraud risk structure is the basis for defining the beginning and ending points for the fraud data analysis project.

Chapter 2 discussed the concept of a fraud risk structure as a basis for defining the project scope. The audit team needs to define the parameters of the audit scope by starting with the primary fraud categories of financial reporting, asset misappropriation, or corruption. The second step is to understand which secondary fraud categories are included within the scope.

The time period of the fraud data analysis is the next step. Will the analysis include one, two, three, four, or more than four years of data? I prefer the fraud data analysis to use full-year data sets and avoid partial-year data sets. The exception to that rule is investigations or specific audit tests. In financial statement audits, there may be a need to include the next year's data up to the opinion date or to use last year's data for retrospective analysis. The scope establishes the date parameters for the primary table; the date parameters for the secondary tables are driven by the need to match transactions across tables.

To illustrate the concept, your audit scope is the expenditure cycle for the year 2015. In order to match the vendor invoice table to the purchase order table, the data analytics will need to select both the 2015 purchase orders and purchase orders issued prior to 2015. As a practical tip, identify the lowest purchase order number in the expenditure table, then go to the online system and find the date of that purchase order. The payment table will need to go beyond 2015 to find the date when all 2015 vendor invoices were paid.
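
To make the date-parameter idea concrete, the following sketch pulls the 2015 invoices as the primary population and then keeps every purchase order and payment they reference, regardless of year. The file names and column names (po_number, invoice_number, and so on) are assumptions for illustration; substitute your own table layouts.

```python
import pandas as pd

invoices = pd.read_csv("vendor_invoices.csv", parse_dates=["invoice_date"])
pos      = pd.read_csv("purchase_orders.csv", parse_dates=["po_date"])
payments = pd.read_csv("payments.csv", parse_dates=["payment_date"])

# Primary table: vendor invoices dated within the 2015 scope period.
in_scope = invoices[invoices["invoice_date"].dt.year == 2015]

# Secondary tables: keep every purchase order and payment referenced by an
# in-scope invoice, regardless of the year it was issued or paid.
scope_pos = pos[pos["po_number"].isin(in_scope["po_number"])]
scope_pay = payments[payments["invoice_number"].isin(in_scope["invoice_number"])]

print("Earliest matched purchase order:", scope_pos["po_date"].min())
print("Latest matched payment:", scope_pay["payment_date"].max())
```

The earliest matched purchase order date and the latest matched payment date become the date parameters for requesting the secondary tables.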

The last step is to identify the primary table and the secondary tables. Databases are composed of a series of tables that link together based on a common identifier. In fraud data analytics, the fraud scenario defines the primary table and the associated tables. In false entity scenarios, the primary table is the table containing the entity information: name, address, and so on. In real entity scenarios, the primary table contains the transactions associated with the fraud action. To illustrate the table concept in a speed-of-payment analysis, the vendor invoice table is the primary table and the payment table is the secondary table. The vendor number will be the common identifier linking the two tables, followed by the invoice number and purchase order number.

Scope Concept for the Corruption Project

The following is an illustration of the expenditure project, which we will call the Corruption Project. The audit project covers the fraud scenarios that reside in the expenditure cycle. For practical purposes, I divide the cycle in half: the first half is procurement and the second half is payment. As a soft guideline, the fraud auditor should focus on corruption in the procurement half and on asset misappropriation in the payment half. Within the scope of the illustration, the fraud auditor has decided to focus on corruption in the procurement process. Therefore, the purchase order file is the primary table.

Plan Question Two: How Will the Fraud Risk Assessment Impact the Fraud Data Analytics Plan?

The purpose of the fraud risk assessment is to identify all the relevant fraud scenarios within the audit scope and determine the extent of residual risk. The timing and extent of the audit procedures and the determination of residual risk are beyond the scope of this book. Within the scope of this book, the fraud risk assessment provides the audit team with a comprehensive listing of fraud scenarios for the fraud data analytics plan.

The decision to use fraud data analytics to search for evidence of a fraud scenario is based on auditor judgment. Obviously, a fraud scenario with a high‐residual risk rating should be a candidate for fraud data analytics. The auditor's judgment could be based on fraud scenarios that have occurred in the past, fraud scenarios that commonly occur in the industry, or simply a focal point of the chief auditor. The fraud data analytics plan should have a direct cross‐reference statement to the fraud risk statement.

Continued Illustration of the Corruption Project

The fraud risk assessment has indicated two corruption fraud scenarios that have a high residual risk. Corruption by definition involves two or more parties: one party commits the required fraud action, and the other party conspires with the primary party. In the first scenario, the internal party corrupts the bidding process by issuing a series of small-dollar purchase orders throughout the year to avoid the bid process. The supplier, in collusion with the internal person, inflates pricing and pays the internal source a bribe. In the second scenario, a supplier intentionally operates under different names, which allows the internal person to split the purchase across what appear to be different vendors, thereby avoiding the bidding process. The vendor inflates pricing and pays a bribe to the internal person.

Plan Question Three: Which Data‐Mining Strategy Is Appropriate for the Scope of the Fraud Audit?

This is like the expression, Which comes first, the chicken or the egg? The fraud data analytics strategy is connected to the fraud concealment, but the auditor needs to determine what level of concealment will be considered within the scope of the audit. At a minimum, audits typically consider low to medium sophistication. In an investigation, the team should consider low to high sophistication.

In Chapter 3 guidelines are provided for using each strategy. The fraud auditor needs to apply the guidelines in the selection of the right strategy for the fraud data analytics plan. One strategy could be used throughout the audit, or the strategy could be on a fraud scenario basis. What is most important to understand is the purpose of each strategy—how to use the strategy and how the strategies correlate to the fraud concealment sophistication theory.

Continued Illustration of Corruption Project

Split Purchase Orders Fraud Scenario

  • Internal person, in collusion with a supplier, issues purchase orders below the control threshold to the supplier and inflates pricing on the item; the supplier then pays the internal person a kickback, resulting in the corruption of the purchasing process.

The fraud scenario is composed of a real entity that is complicit in the scenario, and the fraud action is issuing multiple purchase orders to ensure no single purchase order triggers the bidding requirements. The fraud data analytics strategy is internal control avoidance. Therefore, the fraud data analytics plan focuses on the fraud action rather than the entity. The first analysis would summarize purchase orders by vendor using stratification by dollar level associated with bidding levels. Vendors with no small-dollar purchases and only large-dollar purchase orders that required bidding procedures would be excluded from further analysis. The second analysis would summarize the remaining purchase orders by line item. The second report would provide a frequency count, aggregate dollars, and average purchase order. The sample selection is based on vendors with annual purchases that exceed the dollar level for bidding or a vendor with a high frequency of purchase orders and a high average purchase order amount. To further refine the process for vendors meeting the first selection, the analysis would focus on line items on the purchase order. Remember, we are continually shrinking the haystack.
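
The following is a minimal sketch of the first analysis, assuming a purchase order table with vendor_number, po_number, and po_amount columns and an illustrative bid threshold of $25,000; the column names and the threshold are assumptions, not prescribed values.

```python
import pandas as pd

BID_THRESHOLD = 25_000  # assumed bidding level; use your organization's control threshold

pos = pd.read_csv("purchase_orders.csv")
pos["small_po_amount"] = pos["po_amount"].where(pos["po_amount"] < BID_THRESHOLD, 0)

summary = (pos.groupby("vendor_number")
              .agg(po_count=("po_number", "count"),
                   aggregate_dollars=("po_amount", "sum"),
                   average_po=("po_amount", "mean"),
                   small_po_count=("small_po_amount", lambda s: (s > 0).sum()),
                   small_po_dollars=("small_po_amount", "sum"))
              .reset_index())

# Exclude vendors with only large, bid purchase orders; flag vendors whose
# below-threshold purchase orders aggregate above the bidding level.
flagged = summary[(summary["small_po_count"] > 0) &
                  (summary["small_po_dollars"] > BID_THRESHOLD)]
print(flagged.sort_values("small_po_dollars", ascending=False).head(25))
```

The flagged vendors would then feed the second, line-item-level analysis described above.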

Hidden Entity Fraud Scenario

  • Internal person in collusion with a supplier; the supplier is in the vendor file under different names with different identifying information; purchase orders are intentionally spread across the different vendor names to avoid bidding requirements; the supplier pays the internal person a kickback, resulting in the corruption of the purchasing process.

The inherent scheme is composed of a false entity, referred to as the hidden entity, and the fraud action is issuing small-dollar purchase orders to multiple vendor names. The fraud data analytics plan will focus on the false entity rather than the transaction. The search for hidden identities will use the specific identification strategy, focusing on duplicate testing on address, telephone, bank account, email address, and contact person. The testing will start with exact match and consider close match depending on the initial results. If the duplicate data interrogation routine identifies entities, then the transactions for those specific entities will be linked for analysis.
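
A minimal sketch of the exact-match portion of that duplicate testing follows, assuming a vendor master file with the listed fields; the field names are illustrative, and close-match (fuzzy) testing would be layered on only if the exact match produces few hits.

```python
import pandas as pd

vendors = pd.read_csv("vendor_master.csv", dtype=str)  # assumed field names below
duplicate_fields = ["address", "telephone", "bank_account", "email", "contact_person"]

hits = []
for field in duplicate_fields:
    populated = vendors[vendors[field].notna() & (vendors[field].str.strip() != "")]
    dupes = populated[populated.duplicated(subset=field, keep=False)]
    # Keep only values shared by more than one vendor number (different names).
    dupes = dupes.groupby(field).filter(lambda g: g["vendor_number"].nunique() > 1)
    hits.append(dupes.assign(matched_on=field))

hidden_entity_candidates = pd.concat(hits, ignore_index=True)
print(hidden_entity_candidates[["vendor_number", "vendor_name", "matched_on"]])
```

The vendor numbers returned would then be used to pull the related purchase orders for the transaction-level report.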

A second report would summarize the linked transactions, providing a frequency count, aggregate dollars, and average purchase order. The second scenario is very similar to the first scenario. The difference between the two fraud scenarios is the entity structure, not the fraud action statement. Once you realize the similarities and differences among all the fraud scenarios in your scope, the idea of building a comprehensive fraud data analytics plan will not be overwhelming.

Plan Question Four: What Decisions Will the Plan Need to Make Regarding the Availability, Reliability, and Usability of the Data?

The purpose of assessing the availability and reliability of the data is to determine if the data are usable for fraud data analytics. The availability focuses on the completeness of the data, whereas reliability focuses on the overt accuracy of the data for the planned tests. The word overt is used with intent. Most data are derived from a document. The reliability test cannot determine if the data are entered correctly, but rather, searches for an overt data error. To illustrate, the scope period is 2014, but the vendor invoice date in the database is 1925. Clearly, 1925 is not the correct date on the vendor invoice.

The reliability of the data depends on the planned tests. In this project, the fraud auditor is searching for circumvention of bid levels, so the invoice date is less critical. If the scope were searching for a duplicate payment scenario, then the date field would be critical to the plan.

The completeness test is simple because the fraud data analytics test counts the number of blanks or quasi-blanks (e.g., a dash instead of a blank) in the fields to be used for testing. The tough part of the analysis is to determine what impact blank data fields will have on the success of our planned fraud data analytics testing. At a minimum, the fraud data analytics workpapers and final audit report should provide the reader with the relevant statistics as to the total number of items and the total number of blanks in each field.

The outcome of the completeness analysis will depend on the size of your database and the degree of error. If the company has 500 employees, and three employees are missing telephone numbers, the auditor could perform manual research procedures to identify the telephone numbers. If the population is 100,000 active employees, with 10 percent of the telephone fields blank, then a manual research procedure would consume an extensive amount of time.

As a reminder, fraud data analytics always produces false positives. False positives occur from the design of the test or from data integrity issues. The key strategy question is whether to expend effort in data analytics to eliminate overt false positives resulting from data integrity issues or to allow the auditor to resolve the false positives through audit procedures. The false positive question is a tough one, but an important one. If you do not consider these questions before data interrogation, you will need to address them later in the audit.

Entity Availability and Reliability

The entity availability analysis should determine what percentage of each entity data field is populated. In small databases, an easy test is to sort on each data column included in the test and count the number of fields that are blank or contain a dash. In large databases, use the count feature to determine the number of blanks or dashes in each column. The important step is to determine the availability of data for the planned tests. The reliability testing for entity data does not lend itself to fraud data analytics.
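
A minimal sketch of the population-rate count, assuming the vendor master has been exported to a file and that a dash or "N/A" counts as a quasi-blank; the file name and the quasi-blank list are assumptions to adapt.

```python
import pandas as pd

vendors = pd.read_csv("vendor_master.csv", dtype=str)

# Treat true blanks and quasi-blanks (a dash, an empty string, "N/A") as missing.
quasi_blanks = {"", "-", "N/A", "NA"}
normalized = vendors.apply(lambda col: col.str.strip().str.upper())
missing = normalized.isna() | normalized.isin(quasi_blanks)

availability = pd.DataFrame({
    "blank_count": missing.sum(),
    "percent_populated": ((1 - missing.mean()) * 100).round(1),
})
print(f"Total records: {len(vendors)}")
print(availability)
```

The resulting table documents, field by field, how much data the planned tests have to work with.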

The purpose of the usability analysis is to determine what impact the lack of data integrity has on planned tests. For shell companies, missing data become an audit trigger. In hidden entity testing, which uses a duplicate test, missing data reduces the effectiveness of the test.

Transaction Availability and Reliability

The availability analysis for transactions is designed to ensure the completeness of the data. In Chapter 3 we discussed that most business or computer transactions typically contain a control number, date, amount, and description. The first step is to ensure that the critical fields are populated. Similar to the entity availability analysis, the workpapers should document the degree of error.

The second part of the availability analysis is matching the various tables to ensure that we have complete transactions. To illustrate, in the expenditure cycle we have a purchase order, a vendor invoice, and a payment transaction. In the availability analysis we need to ensure that we have purchase order, invoice, and payment information for each transaction. There may always be a few transactions missing the purchase order or the payment due to timing and aging reasons. However, prior to data interrogation we should know how many incomplete transactions are within our audit population.

The reliability test for transactions is critical for fraud data analytics. Control numbers originating from source documents are not typically verified; therefore, in my experience a control number has a higher degree of error than an amount field. Data entry errors typically associated with a control number include: adding extra digits, entering only a portion of a long number, documents that do not have a control number, substituting a different control number, and simple keying errors. Although these types of errors usually have no impact on the business process, the errors can create false positives in fraud data analytics testing that uses the control number.

Date errors create problems with sequence testing or speed‐of‐processing tests. A simple test is to ensure the year in the date field is consistent with the scope of the audit.

Amounts are typically correct, although reversal transactions may provide the illusion of duplicate transactions or inflate record counts. In the data-cleaning phase, we can search for reversal transactions using exact reversal search techniques to mitigate false positives associated with the amount field.
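
A minimal sketch of an exact reversal search, assuming vendor_number and invoice_amount columns; pairs found here can be set aside before duplicate testing so they do not inflate the results.

```python
import pandas as pd

invoices = pd.read_csv("vendor_invoices.csv")  # assumed column names

credits = invoices[invoices["invoice_amount"] < 0].copy()
credits["abs_amount"] = credits["invoice_amount"].abs()
debits = invoices[invoices["invoice_amount"] > 0].copy()
debits["abs_amount"] = debits["invoice_amount"]

# An exact reversal is a credit matching a debit for the same vendor and amount.
reversals = debits.merge(credits, on=["vendor_number", "abs_amount"],
                         suffixes=("_debit", "_credit"))
print(f"{len(reversals)} potential reversal pairs to set aside before duplicate testing")
```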

The description field is critical for most transaction analysis. When the description field is populated based on internal systems, the field is usually reliable. Examples of internal systems are sales systems that populate a sales invoice from a product description file or the payroll system that populates earning type from an earnings code table. In these systems, the analysis should search for codes that are not consistent with the known codes in the tables.

Description fields that are created from a manual entry process or description fields created from a vendor document or customer document may have a high degree of error or inconsistency as to the information contained in the database.

The outcome of the availability and reliability analysis is to determine the usability of the data for the planned tests. The fraud data analytics project should try to anticipate the type and frequency of errors that will occur in your fraud data analytics plan.

The Usability Analysis

The usability analysis is a byproduct of the availability and reliability analysis. The purpose of the usability analysis is twofold: first, to determine whether the data have sufficient integrity to ensure the fraud data analytics will provide a meaningful sample; second, to provide a conclusion on how to go forward with the sample results. There are four possible usability conclusions from the availability and reliability analysis:

  1. The fraud data analytics plan should be postponed until management improves the internal controls over data entry or enforces adherence to existing internal controls.
  2. The transactions containing obvious data errors will be extracted from the specific test to eliminate false positives originating from overt data integrity issues.
  3. The degree of error is acceptable and will have minimal impact on the success of the test.
  4. The degree of error may create false positives; however, the fraud auditor can resolve the false positives in the audit test phase of the fraud audit.

Assuming the data are deemed usable, the next step is to clean the data consistent with the usability conclusions. The cleaning process includes data formatting and excluding overt data integrity items. A word of caution: I have seen and heard of many interesting data-cleaning techniques, and while many of these techniques are quite clever, the cleaning technique does change the original data, which could have an impact on your sample selection or create false positives. One step to avoiding errors is to include the original data next to the cleaned data so the visual examination process can detect false positives created by the data cleaning.

Continued Illustration of Corruption Project

What availability and reliability issues should the fraud auditor anticipate?

  • How many vendor invoices do not have a purchase order?
  • How reliable is the line item description on purchase orders and vendor invoices?
  • How many entities are missing data that would be used in the hidden entity analysis?
  • What impact will the usability question have on our project?

Plan Question Five: Do You Understand the Data?

The word anomaly is defined as an extreme deviation from the norm. This plan question is all about understanding what the norm of the data is. In each later chapter, we will discuss the types of planning reports.

The goal in this stage is to understand both gross numbers and transaction type numbers. To illustrate gross numbers in payroll, how many employees are on the database? How many are active? How many are inactive? To illustrate transaction type numbers, how many are paid via direct deposit versus paid with a check? How many are salaried employees versus hourly paid employees?

This stage should create statistical reports summarizing transactions by entity number, by control levels, by transaction types, or by internal codes that are relevant to the business system and planned audit tests. The reports should provide aggregate dollar level, number of records, maximum dollar, minimum dollar, and average dollar.

The auditor should study these reports to understand the norm of the population and the various subgroups created through internal codes before creating and designing fraud data analytic routines.

Continued Illustration of Corruption Project

The purchase order is the key document followed by the vendor invoice. The first report is a summary of purchase order issuance by vendor number. The report should provide the frequency of purchase order, aggregate dollar value, and the maximum, minimum, and average purchase order. This type of report is always my first type of summary report. This report will point me to the split purchase transaction through data interpretation.

A second report is a summary by line items on purchase orders in the scope period, providing the same information as the first report. An enhancement would be a summary by line item by vendor. This report would point you directly to the hidden entity.

A third report would compare the purchase order date to the invoice date. The purpose of this report is to determine how frequently budget owners procure the item first and then submit the invoice for payment. I would expect to find a difference between purchases associated with inventory and administrative purchases.
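
A minimal sketch of that third report, assuming the invoice file carries the purchase order number and both files carry their respective dates; a zero or negative day count suggests the purchase preceded the purchase order.

```python
import pandas as pd

invoices = pd.read_csv("vendor_invoices.csv", parse_dates=["invoice_date"])
pos = pd.read_csv("purchase_orders.csv", parse_dates=["po_date"])

matched = invoices.merge(pos[["po_number", "po_date"]], on="po_number", how="left")
matched["days_po_to_invoice"] = (matched["invoice_date"] - matched["po_date"]).dt.days

# Invoices dated on or before the purchase order suggest the item was procured
# first and the purchase order was created to support the invoice.
after_the_fact = matched[matched["days_po_to_invoice"] <= 0]
print(after_the_fact.groupby("vendor_number").size()
      .sort_values(ascending=False).head(20))
```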

As a matter of style, I prefer to start with the same report structure for all projects and all companies. It provides a common baseline for me in fraud data analytic projects. In addition, it avoids being overloaded with lines of data. Once I see the data pattern, I refine the report based on my first review. Sometimes, I may be refining the report within minutes of seeing the data. Once again, my personal style is to start high level and continually refine the data. Sometimes it takes several iterations of drilling down to create the report that points to the fraud scenario.

Plan Question Six: What Are the Steps to Designing a Fraud Data Analytics Search Routine?

The following eight steps are necessary to build the data interrogation routines:

  1. Identify the components of the fraud scenario—the person committing, the type of entity, and the action statement.
  2. Identify the data that relates to the fraud scenario.
  3. Select the strategy consistent with the scope of the audit, sophistication of concealment, degree of accuracy (exact, close, or related), and the nature of the test.
  4. Based on the data availability, reliability, and usability of the data, clean the data set for overt errors.
  5. Identify the logical errors that will occur with the test.
  6. Create your homogeneous data sets using the inclusion and exclusion theory.
  7. Establish the primary selection criteria, followed by the remaining selection criteria.
  8. Create the test using the programming routines to identify all entities or transactions that meet the testing criteria.

The following provides guidance in using the fraud data analytics questions for developing your data interrogation routines.

Step 6.1: Identify the Fraud Scenario

The person committing the scenario identifies whether the fraud data analysis is searching for direct access, indirect access, or override capacity to the database. From a data perspective, there are two fundamental issues: the person committing the fraud scenario needs the ability to record a transaction and a location in which to record the transaction. If the person has direct access, then the summary will focus on the record creator; when the person has indirect access, the summary will focus on the location. Remember, this is about data analysis versus internal controls.

Direct access is generally referred to as the input function. The person performing the input function can create, change, delete, or void a transaction. In the simplest of environments, the input function is vested in one individual, so the analysis is easy. However, in most large companies, many individuals will have input capacity. Therefore, the data analysis will need to summarize transactions by creator. What the input function does not have is a budget in which to record the fraudulent transactions. Therefore, the fraud auditor will need to consider logical locations where the fraudulent charge could be recorded without detection.

With indirect access, the person causes the direct access function to update the record based on the budget owner's authorized action. The authorization action is either valid or a forgery. However, the advantage the budget owner has over the direct input function is that the budget owner has a home in which to record the transaction. The summary of transactions must occur through the budget owner code rather than the direct input function.

The override capacity has the best of both worlds in one sense: the ability to cause the transactions to be recorded in the database and several budgets in which to record the transaction.

As a reminder, entity analysis is for false entity scenarios, whereas real entity scenarios tend to focus on the transaction file. The exception is when searching for real entities operating under different names, referred to as hidden entities. In theory, entity analysis functions as a form of cluster analysis. The goal is to create three categories of entity structures: false entities, conflict-of-interest entities, and real entities.

The fraudulent action requires linking the entity to the transactions meeting the profile. In the Corruption Project illustration, the purchase order file and the invoice file are the basis for our fraud data analytics.

Step 6.2: Identify the Data That Relates to the Scenario

In one sense, this is the most important aspect of the plan because fraud data analytics is all about interrogating data. However, in another way, this stage is the most predictable.

In the Appendix at the end of this chapter, there is an example of a generic data request form for an expenditure fraud data analytics plan. In reviewing the list, the reader should see that the data are rather obvious. We need purchase order information, invoice information, receipt information, and payment information. As a starting point, the plan will need control number, date, amount, description, and the general ledger account number.

Using the exhibit, review the online systems to identify the name of the data on the screen. Compare the generic list to the data on the screen to determine additional data that would help in the fraud data analytics plan. Once the list is created, the auditor will need to identify where the data reside in the database. A third column should be created, cross‐referencing the table and column name of the data field.

Step 6.3: Select the Fraud Data Analytics Strategy

Selecting the correct strategy involves several considerations. The starting point is the objective, the scope of the audit, and the fraud scenarios included in the scope. The next consideration is the level of sophistication of concealment: whether the plan will search for low, medium, or high sophistication. The concealment decision will be a key factor in the selected data-mining strategy and in how, when, and where to use the strategy:

  • The specific identification strategy is a good place to start because of the ease of use.
  • The control avoidance strategy should be used when the fraud scenario is based on a direct input function. For indirect access the strategy becomes an inference analysis.
  • The data interpretation strategy should be used to search for high-concealment or real entity scenarios.
  • The number anomaly strategy is the easiest way to search for even numbers and recurring number patterns.

Step 6.4: Clean the Data Set: Data Availability, Data Reliability, and Data Usability

The procedures for performing this step are somewhat dependent on the software used by the fraud auditor and how the data are extracted from the database. The first step is always the availability of data; the second step is the reliability of data. The last step is to prepare the data for the interrogation routine. To illustrate, if the report will contain a date range, what is the format of the date field as to month, day, and year? Depending on the country, the date field will have a different format. The second question is whether the field is in alphanumeric or numeric format. Many of these steps are housekeeping procedures, but without consideration, the data interrogation process will be fraught with errors.
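
A minimal housekeeping sketch for the formatting step, assuming the invoice extract arrives as text and that the source system writes dates day first; both assumptions must be confirmed against the actual extract.

```python
import pandas as pd

invoices = pd.read_csv("vendor_invoices.csv", dtype=str)  # read everything as text first

# Normalize dates; dayfirst depends on the country convention of the source system.
invoices["invoice_date"] = pd.to_datetime(invoices["invoice_date"],
                                          dayfirst=True, errors="coerce")

# Force the amount field to numeric; anything unparseable becomes NaN for review.
invoices["invoice_amount"] = pd.to_numeric(
    invoices["invoice_amount"].str.replace(",", "", regex=False), errors="coerce")

# Keep a record of rows set aside as overt data integrity items.
rejects = invoices[invoices["invoice_date"].isna() | invoices["invoice_amount"].isna()]
print(f"{len(rejects)} records set aside for data integrity review")
```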

Step 6.5: Identify Logical Errors

Errors in this stage are defined as data that will cause a false positive. Data integrity errors were discussed in the availability and reliability section. The three main sources of data errors are the way external parties create their documents, the way the input function enters the data, and what I call true anomalies.

External parties, customers, and vendors create documents that contain information that is entered into the database. An example is a vendor invoice number. Some vendors use alpha characters in their invoice numbers. In computing an invoice range report, the alpha characters in the invoice number will create an error in the calculation. In searching for duplicate addresses, a false positive is created because two different vendors occupied the same space at different times (a true anomaly).
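
One way to handle the alpha characters, shown here as a hedged sketch: keep the original invoice number untouched and add a digits-only companion column for the range calculation. The regular expression simply strips everything that is not a digit.

```python
import pandas as pd

invoices = pd.read_csv("vendor_invoices.csv", dtype=str)

# Keep the original field and add a numeric-only version so a prefix such as
# "INV-10423" does not break the invoice range arithmetic.
invoices["invoice_number_numeric"] = pd.to_numeric(
    invoices["invoice_number"].str.replace(r"\D", "", regex=True), errors="coerce")

invoice_range = (invoices.groupby("vendor_number")["invoice_number_numeric"]
                 .agg(["min", "max", "count"]))
invoice_range["range_span"] = invoice_range["max"] - invoice_range["min"]
print(invoice_range.head())
```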

Fortunately or unfortunately, input functions often focus on getting the transaction processed through the system rather than on after-the-fact fraud data analytics. The way an input function enters a transaction may have no impact on proper reporting of the financial statements and may not cause an improper payment to a vendor; however, the way the data are entered will create false positives.

To illustrate how an input function creates false positives, a vendor invoice may have no invoice number. However, the accounts payable system requires a vendor invoice number. Therefore, the input function may create a unique number or use a date for an invoice number. In one project, operations staff would frequently submit an invoice without a purchase order. The accounts payable function, instead of creating a line item purchase order to support the invoice and provide the actual items purchased, would create a purchase order with a quantity of one; the unit price was the total of the invoice and the description was the invoice number.

True anomalies are a byproduct of an ever-changing society. Telephone numbers are either mobile numbers or land lines. Telephone number portability allows a person to retain their telephone number as they change geographic location; therefore, there is no correlation between area code and physical location. In payroll, the search is for duplicate bank accounts under different names. Married couples may keep their own last names, causing a false positive in a duplicate bank account number test that looks for different last names.

Data are not perfect. There are anomalies caused by many factors; the goal of this stage is to anticipate the types of errors that will occur. The plan should either determine if the false positives can be minimized through the data interrogation routine or whether the auditor will need to resolve the false positive through document examination. Lastly, the report should offer recommendations on improving the quality of data for management monitoring processes.

Step 6.6: Create the Homogeneous Data Files Using the Inclusion and Exclusion Theory

The inclusion/exclusion theory is a critical step in building the fraud data analytics plan. The inclusion is the data that are consistent with the fraud data profile and the exclusion is the data that are not consistent with the fraud data profile. The theory is consistent with shrinking the haystack. Whether or not the fraud auditor actually creates separate files is a matter of style, whereas the concept of inclusion/exclusion is necessary in identifying anomalies.

The reason for this step is the size of company data files, but it has nothing to do with hardware storage or speed of processing. It has to do with understanding what data relate to the fraud scenario and what data do not. First, it is a mental exercise in defining the scope of the fraud data interrogation routine. Second, it is about identifying an anomaly. Whether or not the fraud auditor creates separate files is a matter of style. Yes, I emphasize this because style is dependent on the auditor. However, without going through the mental exercise, the fraud auditor is bound to increase the number of false positives or miss the anomaly.

The importance of the inclusion and exclusion step varies by the nature of the inherent fraud scheme, the fraud data analytics strategy, and the size of the data file.

Let's assume the vendor master file has 50,000 vendors; 5,000 vendors are inactive. The first homogeneous data set would be only active vendors. The fraud scenario is a shell company created by an internal source. The data interrogation procedure focuses on missing data as the primary selection criteria. This test identifies 100 vendors meeting the search criteria.

The transaction file contains a million vendor invoices. Should we test all million invoices for shell company attributes, or only those invoices that meet the shell company missing-data criteria? The inclusion theory would select only the transactions for the 100 vendors identified in the missing-data analysis.
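
A minimal sketch of that inclusion step, assuming the missing-data test produced a small file of flagged vendor numbers; the file names are illustrative.

```python
import pandas as pd

invoices = pd.read_csv("vendor_invoices.csv")        # assumed: the million-row invoice file
flagged = pd.read_csv("missing_data_vendors.csv")    # assumed: the 100 vendors from the missing-data test

# Inclusion theory: carry forward only the invoices belonging to the vendors
# that met the shell company missing-data criteria.
included = invoices[invoices["vendor_number"].isin(flagged["vendor_number"])]
print(f"Haystack shrunk from {len(invoices):,} invoices to {len(included):,}")
```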

The inclusion and exclusion theory is a critical thought process in building the fraud data analytics plan. By ignoring the thought process, the size of the reports becomes an obstacle to identifying the fraud scenario, which reduces the effectiveness of the fraud data analytics methodology.

Step 6.7: Build the Fraud Data Analytics Test through Identifying the Selection Criteria

In the selection criteria, there are two fundamental strategies. The first is to identify all entities or transactions that meet the criteria; the purpose of the test is to exclude all data that do not meet the criteria. Since the test operates on one set of criteria, the resulting population tends to be large, although much smaller than the total population. The auditor then uses either random selection or auditor judgment to select the sample. The advantage is that the auditor has improved the odds of selecting a fraudulent transaction.

The second strategy is to select all data that meet the testing criteria, referred to as the fraud data profile. The selected strategy is a key criterion on selecting the sample:

  • Specific identification. The sample should be the transactions that meet the criteria.
  • Control avoidance. The sample should be the transactions that circumvent the internal control.
  • Data interpretation. The sample is based on the auditor's judgment.
  • Number anomaly. The sample is based on the number anomaly identified and auditor judgment.

So, what is the difference between the two strategies? The first strategy uses an exclusion theory to reduce the population, whereas the second strategy uses an inclusion theory as a basis for sample selection. Remember, after identifying all transactions meeting the criteria, data filtering can be used to shrink the population.

Step 6.8: Programming Routines to Identify the Selection Criteria

It is interesting to see how different individuals program the software to create the data interrogation routines. Since programming is software dependent, I offer the following strategies to avoid faulty logic in the design of the search routine:

  • Flowchart the decision process prior to writing the search routine. The order of the searching criteria will impact the sample selection process.
  • Create record counts of excluded data and then reconcile the new control count to the calculated control count. It is easy to reverse the selection criteria, thereby excluding what should have been included. The reconciliation process helps avoid this error (see the sketch following this list).
  • Perform a visual review of the output. Ask yourself, does the result seem consistent with your expectations?
  • Create reports that can function as a workpaper. Remember, there needs to be sufficient information to locate the documents. Reports with too many columns are difficult to read on the screen and are difficult to read in a printed format.
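
The record count reconciliation in the second bullet can be as simple as the following sketch; the selection criterion shown is only a placeholder.

```python
import pandas as pd

pos = pd.read_csv("purchase_orders.csv")
total_count = len(pos)

# Apply the selection criterion and keep an explicit count of what was excluded.
criterion = pos["po_amount"] < 25_000           # placeholder criterion
selected, excluded = pos[criterion], pos[~criterion]

# Reconcile: selected plus excluded must equal the original control count.
assert len(selected) + len(excluded) == total_count, "record counts do not reconcile"
print(f"selected={len(selected):,} excluded={len(excluded):,} total={total_count:,}")
```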

Plan Question Seven: What Filtering Techniques Are Necessary to Refine the Sample Selection Process?

At this stage, the data interrogation routine has identified entities or transactions that meet the criteria of the test. Filtering of the identified transactions can occur through the software or through manual observation.

Filtering and creating homogeneous data files may sound like the same process, but in fact they are very different. Creating the homogeneous data file is the process of normalizing the data file so that all the transactions have a commonality before the data interrogation routines are run. In the filtering stage, all the transactions met the testing criteria, but in the fraud auditor's judgment not all the selected transactions should be included in the final sample for fraud testing. The reasons vary, but the common reasons are materiality, frequency, and overt data errors.

So, why not filter out the small-dollar transactions as part of the inclusion and exclusion process? The answer is simple: fraud in the aggregate. Not all perpetrators commit one large fraud scenario; many perpetrators simultaneously commit several small frauds. Create the report, look at the report, and decide how and what to filter from the report.

Continued Illustration of Corruption Project

Since the specific identification included all purchase orders meeting the criteria, the plan would filter out:

  1. Purchase orders where the aggregate of the matched purchase orders does not exceed the bidding level.
  2. Purchase orders where the average amount is low. There might be an efficiency comment lurking in the data, but most likely not a bid avoidance issue.

Plan Question Eight: What Is the Basis of the Sample Selection Process?

The sample selection process is dependent on the data interrogation strategy and the intent of the data interrogation routine.

The specific identification strategy is designed to identify all entities or all transactions that meet a specific attribute or attributes. Therefore, in theory all transactions meeting the specific identification test should be part of the sample.

Another use of the specific identification in conjunction with the inclusion and exclusion theory is to reduce the size of the population to increase the probability of a random sample selecting a fraudulent transaction.

Let's assume in a payroll audit that your company has 100,000 active employees. The fraud scenario is a fictitious employee. One of the tests is a duplicate bank account number. In today's world, finding two different employees with the same bank account number is not unusual. The test identifies 5,000 employees that meet the duplicate bank account number test. If the auditor randomly selects 25 employees from that group, the odds of any one selection being the fictitious employee improve from 1 in 100,000 to 1 in 5,000.
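
A minimal sketch of that population reduction, assuming an employee master with status, bank_account, last_name, and employee_id fields; the field names are assumptions.

```python
import pandas as pd

employees = pd.read_csv("employee_master.csv", dtype=str)
active = employees[employees["status"] == "ACTIVE"]

# Reduce the population to employees who share a bank account number with at
# least one other active employee, then sample from the smaller haystack.
duplicates = active[active.duplicated("bank_account", keep=False)]
print(f"Population reduced from {len(active):,} to {len(duplicates):,}")

sample = duplicates.sample(n=min(25, len(duplicates)), random_state=1)
print(sample[["employee_id", "last_name", "bank_account"]])
```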

The internal control avoidance theory is that an internal person is intentionally avoiding an internal control for the purpose of committing a fraud scenario. The sample selection is based on all transactions that meet the internal control avoidance theory.

Data interpretation is used when the sophistication of concealment is high. The items selected and the number of items selected are based on the fraud auditor's judgment.

For the number anomaly strategy, the sample selection should be based on the definition of the number anomaly. That is, a recurring number anomaly is a number that recurs six or more times attached to the same entity.

Continued Illustration of Corruption Project

The sample selection would be based on the selected fraud data analytics strategy.

Plan Question Nine: What Is the Plan for Resolving False Positives?

The first step is to identify the types of false positives that will occur based on your data interrogation routine. To illustrate the problem, we will use the duplicate test to locate hidden companies; the search routine examines the address field on the master file. So, what could cause a duplicate test to identify a duplicate address that does not link to a shell company?

  1. A dormant vendor on your master file that ceased doing business and a new vendor that has moved into the address.
  2. Merging of different company files.
  3. An inherent weakness in the new vendor procedures that allows the same company to be added to the master file more than once.
  4. A real company that operates different businesses under different names.

There are two fundamental strategies: Minimize the false positives through the data analytics plan, or allow the fraud auditor to resolve the false positives through the audit procedure. The only wrong answer is no answer to the question.

Continued Illustration of Corruption Project

In the purchase order analysis, a false positive may occur through the lack of a proper description in the line item. Through visual examination, the fraud auditor will review the description and determine whether the match results from a data integrity issue caused by input error.

In the entity analysis for the hidden entity, the previous reasons for a duplicate address will be considered.

Plan Question Ten: What Is the Design of the Fraud Audit Test for the Selected Sample?

The last step of the plan is to design the audit procedure that will corroborate or refute the transactions identified through the fraud data analytics plan. The fraud audit procedure has four considerations (see Figure 4.1).

Figure 4.1 Audit Procedure Design to Detect Fraud

In Chapter 1, the concept of degree of certainty was discussed. For the fraud auditor, in the fraud test our degree of certainty statement is the “is or is not statement,” as follows:

  • There “is or is not” credible evidence that the following scenario is occurring.

It is important to reflect on the different conclusions of a fraud auditor and a fraud investigator. The fraud auditor's job is to find transactions for investigation, whereas the fraud investigator's job is to refute or corroborate whether the fraudulent act occurred.

Continued Illustration of Corruption Project

The audit procedure for the hidden entity scenario would use the procedure described in Chapter 6.

The split purchase order test would need to determine if the price increase was consistent with industry price inflation or consistent with a kickback scheme.

Illustrative Example of a Fraud Data Analytics Plan Using Payroll Fraud Scenarios

Finally, we are at the stage of building the fraud data analytics plan. Starting with Chapter 2, our fraud scope is the primary category of asset misappropriation, and the secondary category is the theft of monetary funds. The following fraud scenarios are identified as a byproduct of the fraud risk assessment:

  1. Budget owner/causes a fictitious person to be set up on the employee master file/the budget owner submits false time and attendance records for the fictitious person/causing the diversion of funds.
  2. Payroll function/causes a fictitious person to be set up on the employee master file/the payroll function creates false time and attendance records for the fictitious person/causing the diversion of funds.

The following illustration is designed to provide the fraud auditor with an example of the thought process outlined in Chapters 2 and 3. The two fraud scenarios are very similar from a fraud data analytics perspective. From the perspective of how the fraud scenario is committed, though, they are very different. Remember, fraud data analytics is all about the data versus the inherent control weakness that would allow the scenario to occur in your company.

  1. Identify the components of the fraud scenario:
    1. Person committing in the first scenario is a budget owner, which indicates that the person committing the scheme has indirect access versus direct access to the master file. The budget owner does have a home for the fictitious person scheme and a budget where the false payroll expense can be recorded. How the budget owner causes a false employee to be added is an internal control issue, not a data issue.
    2. Person committing in the second scenario is the payroll function, which means they have direct access but no budget where the false payroll can be recorded. The payroll function challenge is to record the payroll expense in a department where the payroll expense will not be noticed. For the payroll function, the fraud auditor's knowledge of the company and those departments where a fictitious employee would go unnoticed becomes part of the thought process.
    3. Type of entity is a falsely created employee, which means the false employee must be added to the payroll with a disguised identity. The second possibility is that the created employee only appears in the payroll register versus in the human resources master file.
    4. The action statement is paid for services not performed. The time sheets are part of the concealment strategy and a necessary false record to cause payroll to calculate a payroll payment. For the budget owner, the fraud action statement is easy: the budget owner creates the time record, and the user ID for the time record should link to the budget owner. For the payroll function, the ease or difficulty will depend on whether the timekeeping is manual or automated. In manual systems, the payroll function has direct access; therefore, the payroll function has the ability to enter false hours. Whether the timekeeping is directly integrated into the payroll system and whether payroll has override capacity on the timekeeping or payroll system are the important questions. The answers to these questions will determine how to search the data.
  2. Identify the data that relate to the scenario:
    1. Master file will contain the identity information: name, address (street, city, state, postal code, country), bank account, government identification number (key information), telephone number, emergency contact information, email address, birthdate, hire date, position code, marital status, beneficiary information associated with health insurance or life insurance benefits, and tax withholding information (value of tax information varies by country).
    2. Time record file contains the hours submitted for payment. Time stamp for hours submitted might be useful depending on internal processes. In an automated system, creator ID and approval ID will be part of the time record, although the physical device in which the time record was created will be the key data element.
    3. Payroll records as to employee: gross payroll, deductions, and net payroll.
  3. Select the fraud data analytics strategy:
    1. Specific identification.
      1. Missing data analysis would search for employee records that are missing normal identifying information. The availability analysis to determine what is the norm within your company for missing information would tell the fraud auditor the effectiveness of this test.
      2. Duplicate information regarding: government identification number, bank account, or street address. Since most audit software has predetermined duplicate tests, the duplicate test should always be considered.
      3. Specific anomaly:
        1. No address or bank account information.
        2. No tax withholding.
        3. Employee on the payroll is a prior employee.
        4. Bank account is external to country of origin for the company.
        5. A lack of voluntary deductions.
        6. Human resource records do not indicate that an employee evaluation was submitted.
      4. Match to vendor file to determine if the employee was a prior contract employee; this would explain a valid government registration number. A second test determines the country of residence. A third test searches for a changed or different bank account number between accounts payable and payroll.
    2. Internal control avoidance:
      1. Budget owner—user ID of budget owner linked to creating the time record versus the employee ID.
      2. The physical device creating the time record is the same for the creator and the approver of the time record.
      3. Payroll function—after‐hours creation of employee record.
      4. Payroll function—user ID of payroll function linked to creating the time record versus the employee ID.
    3. Data interpretation—fictitious employee schemes seldom require data interpretation analysis. A caveat to that rule is when the fictitious person was created by senior management to pay bribes.
    4. Number anomaly—time records or reported hours are often even or round numbers, so the test would produce a lot of false positives.
  4. Clean the data set:
    1. Availability is an important test before using the missing analysis.
    2. Reliability—time records do not typically have the data errors found in vendor payments or the revenue cycle.
    3. Usability—the availability test on the employee record will determine the usability.
  5. Find logical errors:
    1. Employee records—the duplicate test would expect to create false positives regarding bank account, address, and telephone number if the company allows family members to work for the company.
    2. Time records and payroll records are not anticipated to create false positives.
  6. Create two homogeneous data files. One of the difficulties in fraud data analytics for payroll is that different groups of employees have different data and payment processes. That is, hourly employees are paid via the number of hours on a time record, whereas salaried employees' gross pay is based on annual salary divided by the number of pay periods. Therefore, creating homogeneous data sets is required to perform effective data analysis.
    1. The inclusion theory for hourly employees would include:
      1. Hourly employee who works a full‐time schedule.
      2. Hourly employee who works a part‐time schedule.
      3. Temporary employee whose sole purpose is to cover planned absences.
    2. The exclusion theory for hourly employees would not include:
      1. Terminated employees, because they would have no time record or payroll register record.
      2. Salaried employees.
    3. The inclusion theory for salaried employees:
      1. Include all active salaried employees, assuming time attendance records are submitted.
      2. Terminated employees within the scope period. Note that if the scenario were a temporary employee scheme, the inclusion/exclusion theory would be different.
  7. Select criteria for each test designed. There must be criteria for selection:
    1. Missing test—the number of missing data fields would be based on the frequency of missing data by employee (the scoring sheet concept; see the sketch following this illustration).
    2. Duplicate test—the key fields would be address and bank account, depending on the method of payment.
    3. Specific anomaly test—all employees meeting the specific anomaly are selected.
    4. Match to vendor file, based on government identification number—all employees meeting the match criteria are selected. Data filtering might be used to reduce the number matched.
    5. Off-hours creation of employee records—all are selected.
  8. Identify programming routines. The programming would be dependent on the audit software, which is beyond the scope of this book.
  9. Select filtering techniques. No filtering of ghost employees will be used due to the fraud scenario.
  10. Select sample. Based on the reports created and the selection criteria, all transactions meeting the criteria are selected.
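
The following sketch illustrates the missing-data scoring sheet referenced in the selection criteria above (step 7), assuming the employee master has been exported with the identity fields listed in step 2; the field names, the quasi-blank values, and the score cutoff of three are all illustrative assumptions.

```python
import pandas as pd

employees = pd.read_csv("employee_master.csv", dtype=str)

# Fields a genuine employee record would normally carry; weighted equally here.
key_fields = ["address", "telephone", "email", "emergency_contact",
              "tax_withholding", "bank_account"]

quasi_blanks = {"", "-", "N/A"}
missing = employees[key_fields].apply(
    lambda col: col.isna() | col.str.strip().isin(quasi_blanks))

# Scoring sheet: one point per missing key field; high scores go to the sample.
employees["missing_score"] = missing.sum(axis=1)
selected = employees[employees["missing_score"] >= 3]
print(selected[["employee_id", "missing_score"]]
      .sort_values("missing_score", ascending=False))
```

The same scoring pattern could be extended to the duplicate and specific anomaly tests, with each test adding points and the sample drawn from the highest-scoring employee records.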

Summary

The fraud data analytics plan is based on the fraud scenario, the sophistication of concealment, usability of data, and the selected data‐mining strategy. The fraud data analytics plan must be created for each fraud scenario included in the scope. Yes, there will be overlap between the fraud scenarios. But the plan is all about the thought process in building the data interrogation routines.

Appendix: Standard Naming Table List for Shell Company Audit Program

VENDOR MASTER FILE

  1. Vendor Number
  2. Company Number
  3. Division Number
  4. Active or Inactive Code
  5. Vendor Creation Date
  6. Vendor Creation Time
  7. Last Update File Date
  8. Last Update Time
  9. Vendor Record Creator ID
  10. Vendor Record Authorizer ID
  11. Vendor Update ID
  12. Vendor Update Authorizer ID
  13. Vendor Name (If both a full name and short name are recorded, provide both fields.)
  14. Vendor Address Street (If multiple fields, provide each field as a separate column.)
  15. Vendor Address City
  16. Vendor Address State
  17. Vendor Address Zip Code
  18. Vendor Country
  19. Vendor Federal ID #
  20. Vendor Telephone Area Code (If multiple telephone numbers, provide each as separate columns, i.e., cell number.)
  21. Vendor Telephone Number (seven‐digit number)
  22. Vendor Contact Person
  23. Vendor Contact Telephone Number
  24. Vendor Email
  25. Vendor Website
  26. Minority Business Code
  27. Electronic Payment Field
  28. Bank Routing Number
  29. Bank Account Number

VENDOR INVOICE FILE

  1. Vendor Number
  2. Company Number
  3. Division Number
  4. Vendor Name
  5. Invoice Number
  6. Invoice Line Number
  7. Invoice Date
  8. Invoice Amount
  9. Item Number
  10. Item Unit Price, if information is recorded
  11. Item Unit of Measure, if information is recorded
  12. Item Quantity, if information is recorded
  13. Item Description
  14. Transaction or Journal Number
  15. Transaction or Journal Recording Date
  16. Transaction or Journal Recording Time
  17. User ID or Name of Record Creator
  18. Approval Code or Name
  19. Approval Department
  20. Purchase Order Number
  21. Purchase Order Date
  22. Original Purchase Order Amount
  23. Amended Purchase Order Amount
  24. Purchase Order Issuer ID
  25. Receiving Number
  26. Receiving Date
  27. Receiving Time
  28. Receiving Amount
  29. Check/ACH/Wire Payment Indicator
  30. Check/ACH/Wire Number
  31. Check/ACH/Wire Date
  32. Check/ACH/Wire Time
  33. Check/ACH/Wire Amount
  34. Check Address
  35. Bank Account Number
  36. Bank Routing Number
  37. Electronic Approval
  38. Manual Check Indicator
  39. General Ledger (Typically the accounts payable, inventory, or work‐in‐process account. The expenditure code is captured from the purchase order file. In case of no purchase order file, determine how general ledger expense classification is linked to the expenditure transaction.)
  40. Job Number
  41. Contract Number
  42. Commodity Code
  43. Bid Code

PURCHASE ORDER DATA

  1. Vendor Number
  2. Vendor Name
  3. Company Number
  4. Division Number
  5. Purchase Order Number
  6. Purchase Order Date
  7. Purchase Order Amount
  8. Transaction or Journal Number
  9. Transaction or Journal Recording Date
  10. Transaction or Journal Recording Time
  11. User ID or Name of Record Creator
  12. Approval Code or Name
  13. Buyer Code or Name
  14. Revised Purchase Order Amount (Information will depend on how information is stored. Typically either the new total purchase order amount is recorded or the change amount to the purchase order. Please advise if multiple fields for purchase order change fields.)
  15. Buyer Code or Name responsible for the purchase order change
  16. Revised Purchase Order Date
  17. Commodity Code (Standard classification code for the item purchased.)
  18. Bid Code (Describes the method for obtaining competitive bidding for the item purchased.)
  19. General Ledger Account (All fields necessary to post to proper expenditure code, including but not limited to division, company, department, expense G/L, and any other codes for responsibility reporting.)
  20. Job Number, if expense is coded to a specific job

DISBURSEMENT FILE

  1. Vendor Number
  2. Vendor Name
  3. Company Number
  4. Division Number
  5. Check Number, ACH Number, Wire Number
  6. Check Date, ACH Date or Wire Date
  7. Check Amount, ACH Amount, or Wire Amount
  8. Bank Account Number
  9. Bank Routing Number
  10. Vendor Invoice Number
  11. Vendor Invoice Date
  12. Vendor Invoice Amount

MASTER FILE CHANGE FILE

  1. Vendor Number
  2. Vendor Name
  3. Company Number
  4. Division Number
  5. All Other Fields in Change Record (Each field should be retained in its own column.)