Chapter 3
Data Analytics Strategies for Fraud Detection

Now that the fraud scope is established and the fraud scenario register is created, the next step is to select a strategy for each fraud scenario and understand how the strategy is impacted by the sophistication of the concealment. At first, the process will seem overly bureaucratic. Eventually, however, it will become simply a way of thinking. The thought process matters because it defines what your data analytics plan is designed to detect and what it will not detect.

This chapter is intended for the fraud auditor who wants to understand the fraud data analytics in the later chapters. In the real world, you will need to adapt the suggestions contained in the book to your data files. You will eventually be asked to perform data analytics in business systems for which you have no practical business experience, or you will use fraud data analytics to perform an investigation. You will need to develop a fraud data analytics plan where no auditor has gone before. I believe the guidelines will serve you well. The auditor who simply wants a software vendor to provide a list of tests should pass over this chapter and proceed directly to Chapter 7. For that auditor, I will provide my top three reports to create for the business system as part of the chapter summary.

In Chapter 2, I referenced the building of a house. I said the fraud scenarios provided the foundation to the house and the fraud data analytics plan was the blueprint for building the house. Furthering that example, the strategies outlined in this chapter are the building code requirements used by the building inspector to ensure the house is sound.

The strategies provide the basis for developing the search routines to identify the red flags based on the sophistication of the concealment strategies. The two key terms for red flag identification are patterns and frequency. Understanding the data patterns that link to the fraud scenario, as distinct from common data errors, is critical to the efficiency of the project. The frequency analysis helps identify whether the occurrence of the pattern is consistent with the fundamental fraud theories. Therefore, the first step is to identify the patterns and frequency of red flags that correlate to the fraud scenario data profile.

The last section of the chapter is intended to provide practical guidance on developing the actual search routines using the data commonly found in all business transactions. The methodology for developing fraud data analytics search routines is covered in Chapter 4.

Before we discuss the data analytics strategies, the fraud auditor needs to understand how the sophistication of concealment will affect the selected fraud data analytics strategy and the eventual sample selection. In reality, the sophistication factor and the right data analytics strategy are a lot like the phrase: Which comes first, the chicken or the egg? Both concepts need to be considered; the order depends on the fraud auditor's style. However, ignoring either concept will result in building a fraud audit program that is designed to fail.

Understanding How Fraud Concealment Affects Your Data Analytics Plan

We have developed a simple system of ranking fraud concealment sophistication as low, medium, or high. There are two sides to the definition. From the perpetrator's perspective, it is the ability or intent to conceal his or her fraudulent actions from detection. From the auditor's perspective, it is the ability of fraud data analytics to identify the fraudulent action for audit examination.

Concealment is either a general condition of the database or a specific action committed by the perpetrator. Do not think of this as cloak-and-dagger, however. In some ways, the general conditions are what allow the fraud scenario to go undetected, while the specific concealment actions become the basis of our fraud data profile.

General Concealment Conditions:

  • Number of records processed in a year.
  • Number of records that an employee approves in a year.
  • Real or perceived time pressure to process a transaction.
  • Changes in how society conducts business; for example, mailbox service companies have become very popular.
  • Societal changes; an area code no longer links to a physical address because of telephone number portability.
  • New technology changes the way we look at an address or bank account. The cloud can serve this function in disguising the ultimate receiver.
  • Natural vulnerabilities within internal control systems.

Specific Concealment Actions:

  • Keeping all fraudulent transactions within the perpetrator's control level.
  • Ensuring the dollar value of the fraud does not exceed internal budgets.
  • Understanding how controls operate in other areas of the business system.
  • Realizing that smaller‐dollar‐value transactions tend to get less scrutiny than large‐dollar transactions.
  • Exploiting management override, which can occur without a paper trail or data trail.
  • Making documents or transactions look real, since realistic-looking items tend to get less scrutiny.
  • Creating the illusion of compliance with internal controls.
  • Creating a legal entity that provides the illusion of a real company.

So, how do we define the concealment concepts and how do we integrate the concepts into our fraud data analytics plan? That is the million‐dollar question.

Low Sophistication

The perpetrator's footprint is visible in the entity data, or the pattern in the transactional data is visible to the naked eye through the use of data interrogation routines. The footprint in the entity data is typically identified through missing-data analysis, matching routines, duplicate routines, or specific anomaly testing. The duplicate routine searches for duplicate data in the same database. The matching routine searches for a match across two or more databases where the entity could logically exist. A high degree of missing entity data indicates that someone is manually controlling the process rather than the automated system. In transactional data, the pattern recognition tends to allow for specific identification. In low sophistication, the pattern tends to be an exact match. The three types of matches are discussed later in the chapter.

To illustrate low sophistication:

  • If a vendor has neither a street address nor a bank account, how is the payment being delivered to the vendor?
  • Regarding low sophistication of concealment within the entity data: Searching for a vendor operating under two different names or a duplicate telephone or email address would provide the linkage.
  • Regarding low sophistication of concealment within transactional data: Vendor invoice numbers follow a sequential pattern (pattern recognition). The number of occurrences and the date range of the sequential pattern would be the basis for sample selection.
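These low-sophistication entity tests can be sketched in a few lines. The following is a minimal illustration, assuming a hypothetical vendor master with phone, street, and bank account fields:

```python
# Minimal sketch of low-sophistication entity tests. The vendor master
# and its columns (vendor_id, name, phone, street, bank_account) are
# hypothetical assumptions for illustration.
import pandas as pd

vendors = pd.DataFrame({
    "vendor_id": [101, 102, 103, 104],
    "name": ["Acme Supply", "Beta Parts", "Acme Supplies LLC", "Delta Co"],
    "phone": ["555-0100", "555-0111", "555-0100", ""],
    "street": ["12 Elm St", "9 Oak Ave", "77 Pine Rd", ""],
    "bank_account": ["111222", "333444", "555666", ""],
})

# Duplicate routine: the same data element (phone) under two vendor names.
dupes = vendors[vendors["phone"].ne("") &
                vendors.duplicated("phone", keep=False)]

# Missing-data routine: neither a street address nor a bank account --
# how is the payment being delivered to the vendor?
missing = vendors[(vendors["street"] == "") & (vendors["bank_account"] == "")]
```

The duplicate routine links the two "Acme" vendors through the shared telephone number; the missing-data routine isolates the vendor with no delivery channel for funds.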

Medium Sophistication

The perpetrator's personal identity and the fictitious entity's identity have a limited or vague connection. For example, there is no exact match on the complete address, but there is a match on the postal code. The match or duplicate test is not sufficient by itself to cause the entity to be selected. The transactional data may show an avoidance of internal controls or an anomaly relative to general business practices or business expectations.

To illustrate medium sophistication:

  • In the entity data, vendor postal code matches the postal code of the head of purchasing.
  • In the transactional data, there is a regular frequency of invoices being structured to avoid dual‐signature controls. The invoice dates are not an exact match but a close match.
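The entity-data illustration above can be sketched as a simple cross-database match. The vendor and employee tables and their columns are hypothetical assumptions:

```python
# Sketch of a medium-sophistication entity test: no exact address match,
# but the vendor postal code matches an employee postal code. Tables and
# column names are hypothetical.
import pandas as pd

vendors = pd.DataFrame({"vendor_id": [10, 11, 12],
                        "postal": ["02139", "60601", "94107"]})
employees = pd.DataFrame({"emp_id": [7, 8],
                          "dept": ["Purchasing", "Finance"],
                          "postal": ["60601", "30303"]})

# Match routine across two databases on postal code. Not sufficient by
# itself for selection, but one red flag in a composite profile.
hits = vendors.merge(employees, on="postal")
```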

High Sophistication

The perpetrator's personal identity and the fictitious entity have no overt connection within the entity data. The transactional data generally conform to the internal controls. For real entities committing the scenario alone, the success of the pattern recognition depends on the auditor's ability to benchmark the internal data against business expectations or on the use of outlier or frequency analysis. For external real entities acting in collusion with an internal source, the key is that the internal source needs to control the volume of the fraudulent data. For real, non-complicit entities exploited by the internal perpetrator, either the perpetrator is the only internal source to use the entity, or the volume of transactions, in either quantity or dollar value, exceeds business needs. There could also be a change in the use of the external entity in conjunction with a change in the internal management team.

To illustrate high sophistication:

  • In the entity data, there is no match on address, bank account, or government registration number.
  • In the transactional data, the number of journal entries below a control threshold is illogical for the company.

In building the fraud data analytics plan, the fraud auditor needs to consider the scenario, the fraud data mining strategy, and the sophistication of the concealment strategy. Without calibrating the plan for the sophistication of concealment, the fraud auditor will not know what the plan is intended to find and what the plan cannot find, given how the interrogation plan was designed.

As a guideline, fraud auditors should start building their fraud data analytics plan to locate fraud based on low or medium sophistication, whereas investigations based on sufficient predication should consider high sophistication. Figures 3.1 through 3.3 are intended to reflect our experiences with fraud data analytics.

Figure 3.1 illustrates how to correlate the sophistication of concealment to the examination of data.


Figure 3.1 Fraud Concealment Tendencies

Figure 3.2 illustrates how to correlate the sophistication of concealment to the fraud data analytics strategies.


Figure 3.2 Fraud Concealment Strategies

Figure 3.3 illustrates how to correlate the sophistication of concealment to a specific data field.


Figure 3.3 Illustration Bank Account Number

The fraud auditor must consider how the level of concealment impacts the fraud data analytics plan's ability to identify transactions that have a higher probability of being fraudulent versus a red flag caused by a data error. The concept is also important to understand how a fraud data analytics plan may have a false negative. To explain how a false negative can occur within your fraud data analytics plan, consider the duplicate bank account test:

  • The test is perfect for low sophistication because the bank account numbers result in an exact match. However, at the medium level, the exact match can only occur on the bank routing number, and at high sophistication of concealment the strategy will fail to have any match on bank account number. It is that simple.
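The degradation of the duplicate bank account test can be illustrated in code. The routing and account columns, and the sample values, are hypothetical:

```python
# Sketch of how the duplicate bank account test degrades as concealment
# sophistication rises. Columns and values are illustrative assumptions.
import pandas as pd

accts = pd.DataFrame({
    "vendor_id": [1, 2, 3, 4, 5],
    "routing":   ["021000021", "021000021", "026009593",
                  "121000248", "026009593"],
    "account":   ["44556677", "44556677", "99887766",
                  "12121212", "55443322"],
})

# Low sophistication: exact match on routing plus account number succeeds.
exact = accts[accts.duplicated(["routing", "account"], keep=False)]

# Medium sophistication: only the routing number repeats (same bank,
# different accounts); the exact-match test alone produces a false negative.
routing_only = accts[accts.duplicated("routing", keep=False) &
                     ~accts.duplicated(["routing", "account"], keep=False)]
```

At high sophistication, neither filter returns the fraudulent vendor, which is exactly the false negative the text warns about.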

Shrinking the Population through the Sophistication Factor

To explain the concept of shrinking the population, the fraud auditor needs to understand that there is a direct correlation between the degree of sophistication of the concealment strategy and the number of transactions meeting the data profile requirements. In low sophistication, the pattern recognition is specific (exact match on a street address), whereas in high sophistication, there is no exact match; therefore, more vague criteria must be used to narrow the population (all entities created in a time period within a postal code radius).

The resulting impact is the ability to shrink the population of possibilities. Highly sophisticated concealment strategies tend to have a larger number of transactions meeting the fraud data profile. By contrast, low‐sophistication concealment strategies tend to have a smaller number of transactions meeting the fraud data profile. Understanding how the sophistication of concealment impacts the number of transactions meeting the profile is critical to using the inclusion and exclusion theory discussed in Chapter 4. Shrinking the population is one way to minimize the impact of the general concealment conditions (see Figure 3.4).

An inverted triangle narrowing from 1,000,000 transactions to 100,000 to 100 illustrates how shrinking the population improves the odds of selecting one fraudulent transaction.

Figure 3.4 Improving Your Odds of Selecting One Fraudulent Transaction

The goal of fraud data analytics is the selection of transactions that have a higher probability of linking to a fraud scenario. Clearly, fraud data analytics is not about a random selection of transactions. The following guidelines should be considered in understanding how the sophistication of concealment will impact the fraud auditor's ability to identify the right transactions for audit examination.

Low Sophistication:

  • Specific identification strategies are used for both entity and transactional data.
  • Entity identifying information links to the perpetrator's known identifying information, for example, a specific street address.
  • The false entity structure will match to another entity in either the same database or a different database.
  • The false entity will also reveal missing identifying information, intended to reduce someone else's ability to contact the false entity.
  • The patterns associated with the transaction data will typically be obvious to the naked eye.
  • The pattern recognition for the transaction data allows for specific identification.
  • Sample size is determined by the number of transactions that match the data profile. The sample size can be either zero because no transactions link to the data profile or a very large sample because the match criteria are not sufficiently defined.

Medium Sophistication:

  • Internal control avoidance strategies tend to be more effective.
  • Specific identification routines are less effective because there is no direct match to the entity's data.
  • Specific identification will allow for a match on some aspect of the entity information.
  • Specific identification is more effective when there is an allegation that focuses on a person or department.
  • Entity identifying information relates to some aspect of the perpetrator's known identifying information, for example, a postal code versus a physical street address.
  • Internal control avoidance strategies should be used for transactional data.
  • Outlier patterns tend to be effective for transactional history analysis.
  • Creating smaller homogeneous data groups, referred to as cluster patterns, will facilitate the auditor's ability to spot an anomaly.
  • Filtering techniques based on dollar magnitude are effective in reducing the number of transactions fitting the data profile. The difference between filtering and red flag identification is discussed in Chapter 4.
  • Sample selection is based on the entities or transactions that avoid the internal control after all relevant filtering.

High Sophistication:

  • Data analytics at this level is like code breaking. There is no finite criterion that serves as identification criteria. The process tends to be judgmental selection versus criteria selection. The key is to understand how the fraud scenario occurs in your business systems.
  • The specific concealment strategies used by the perpetrator tend to be more deliberate and planned.
  • Direct matches seldom occur.
  • Entity identifying information has no relationship with the perpetrator's known identifying information.
  • Entity identifying information may relate to a mailbox service or an out‐of‐area address that has a mail forwarding feature, providing the illusion of a real business.
  • Transactional data are more effective than entity data at identifying fraud scenarios.
  • The process of creating smaller homogeneous data files based on geography, transaction types, transaction codes, and cost centers facilitates the data interpretation.
  • Filtering techniques like drill‐down analysis are effective in reducing the number of transactions fitting the data profile, thus allowing data interpretation to be more effective.
  • Sample selection relies on data interpretation skills.
  • Sample size tends to be judgmentally determined based on the data interpretation.
  • Selection process is based on understanding how the scenario operates, money trail, fraud theory, concealment theory, and professional experience of the auditor.

Building the Fraud Scenario Data Profile

The fraud scenario data profile is all the identified red flags associated with the fraud scenario. The red flags need to be calibrated to the level of fraud sophistication. The following guidelines should be considered in building a fraud data profile:

  1. Red flags are data items identified either through data interrogation or through audit examination of documents.
  2. Most data red flags are contained in the underlying documents supporting the transaction.
  3. Red flags that exist in public records should be incorporated into the selection process or audit examination, whenever possible.
  4. Using multiple red flags for sample selection will reduce the number of false positives.
  5. Each red flag should have a weight assigned to the importance of the red flag: low, medium, and high. Clearly, not all red flags have the same value.
  6. A composite score of all the red flags associated with the entity or transaction should be the basis for sample selection.
  7. Please do not rate everything high; it defeats the purpose of ranking the red flags.
  8. Most red flags have both a fraud tendency and a logical business explanation. It is important to understand how to distinguish between the two.
  9. False entity scenarios start with entity red flags based on the type of false entity and then search for the transaction red flags that link to the fraud scenario.
  10. Conflict‐of‐interest entities have similar characteristics to false entities.
  11. Real entity scenarios seldom use entity criteria for selection except for real entities operating under different names.
  12. Real entity scenarios tend to focus on the transactional red flags associated with the specific fraud scenario.
  13. The fraud data profile is a set of specific identifiable criteria for specific identification, internal control, and number anomaly strategies. The fraud data profile is relevant for the data interpretation strategy but the strategy tends to be more judgmental for sample selection.
  14. Transactional red flags correlate to false entities, conflict‐of‐interest entities, and real entity scenarios.
  15. Entity red flags will match the perpetrator at low‐ and medium‐concealment strategies but not at high‐concealment strategies.
  16. Transactional red flags usually contain a footprint of the perpetrator.
  17. Remember, these are guidelines, not absolute rules.
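Guidelines 5 through 7 can be sketched as a simple weighted scoring routine. The weights, red flags, and entity names below are illustrative assumptions, not a prescribed scale:

```python
# Sketch of a weighted composite red flag score (guidelines 5 and 6).
# The weight scale and the red flags per entity are hypothetical.
WEIGHTS = {"low": 1, "medium": 2, "high": 3}

# Each entity maps to the red flags it triggered, with an assigned weight.
entity_flags = {
    "V-101": [("missing street address", "high"),
              ("sequential invoice numbers", "medium")],
    "V-102": [("PO box address", "low")],
}

def composite_score(flags):
    # Sum the weights of every triggered red flag for the entity.
    return sum(WEIGHTS[level] for _, level in flags)

scores = {vendor: composite_score(flags)
          for vendor, flags in entity_flags.items()}

# Rank entities by composite score; the top of the list drives selection.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Note that if every flag were rated high, the ranking would collapse, which is the point of guideline 7.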

Precision of Matching Concept on Red Flags

The concept of identifying red flags seems relatively simple until the fraud auditor factors into the equation the following types of matches:

  • Exact match. The two data elements are an exact match. An easy example is invoice date. Two invoices from the same vendor are dated July 24. From an address field, two or more vendors have the same street address; city, state, and postal code are an exact match.
  • Close match. The data elements are in close proximity to another transaction. Two invoices from the same vendor are within three days of each other. From an address field, two or more vendors have the same city, state, and postal code, but a different street address in the street field.
  • Related match. The data elements are within a broader proximity that exceeds the close match test. For an address test, we would focus on the postal code or all postal codes within a geographic radius.

As a guideline, fraud data analytics should start with the exact match, followed by the close match. The related match tends to be used for a high sophistication of concealment or a specific allegation of fraud. The allegation allows the data analytics to reduce the population of entities and transactions, which allows data interpretation strategies to be more effective.
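The three match precisions can be illustrated with invoice dates. The three-day window for a close match and the thirty-day window for a related match are illustrative assumptions:

```python
# Sketch of exact / close / related match precision on invoice dates.
# The close_days and related_days windows are hypothetical thresholds.
from datetime import date

def match_precision(d1, d2, close_days=3, related_days=30):
    gap = abs((d1 - d2).days)
    if gap == 0:
        return "exact"      # identical dates
    if gap <= close_days:
        return "close"      # within the close-proximity window
    if gap <= related_days:
        return "related"    # broader proximity, beyond the close test
    return "none"

p1 = match_precision(date(2023, 7, 24), date(2023, 7, 24))
p2 = match_precision(date(2023, 7, 24), date(2023, 7, 26))
p3 = match_precision(date(2023, 7, 24), date(2023, 8, 10))
```

The same three-tier idea applies to addresses, with the full address, the city/state/postal combination, and the postal code radius as the three tiers.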

Fraud Data Analytic Strategies

There are four basic strategies in developing a data interrogation routine. Within each strategy, we need to identify the associated pattern and consider the frequency of the event. Each strategy has strengths and weaknesses for fraud detection:

  1. Specific identification of a data element or an internal control anomaly
  2. Internal control avoidance
  3. Data interpretation
  4. Number anomaly

Specific Identification of a Data Element or an Internal Control Anomaly

The design of the test is exactly as it sounds; it focuses on identifying a specific data pattern. The process starts with a fraud scenario and then identifies the specific data pattern associated with that fraud scenario. The following fraud scenario illustrates the concept:

  • A real supplier overbills the company by intentionally overcharging within the payment tolerances for matching a purchase order to an invoice in the accounts payable system, resulting in the loss of company funds.

The specific identification is all supplier invoices that exceed the original purchase order amount within the tolerance levels. The frequency analysis would highlight the suppliers in which the pattern occurs with a high frequency versus those in which it occurs with a low frequency.
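The test just described can be sketched as follows, assuming hypothetical invoice and purchase order columns and a 5 percent matching tolerance:

```python
# Sketch of the specific identification test: invoices that exceed the
# purchase order amount but stay inside the matching tolerance.
# Column names and the 5% tolerance are illustrative assumptions.
import pandas as pd

inv = pd.DataFrame({
    "vendor":  ["A", "A", "A", "B", "C"],
    "po_amt":  [1000.0, 2000.0, 500.0, 800.0, 900.0],
    "inv_amt": [1040.0, 2080.0, 510.0, 800.0, 1100.0],
})

tol = 0.05  # assumed matching tolerance
hits = inv[(inv["inv_amt"] > inv["po_amt"]) &
           (inv["inv_amt"] <= inv["po_amt"] * (1 + tol))]

# Frequency analysis: vendors that trigger the pattern repeatedly stand
# out from vendors with an isolated occurrence.
freq = hits["vendor"].value_counts()
```

Vendor C exceeds the purchase order but blows through the tolerance, so it would fail the three-way match anyway; vendor A's repeated within-tolerance overcharges are what this strategy is built to surface.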

The following types of data interrogation tests are usually associated with the specific identification strategy:

  1. The specific identification strategy focuses on the following types of tests for entity data:
    1. Match: searches two different databases for the same data element. The matching concept is also used in the internal control avoidance.
    2. Duplicate: searches within one database for the same data element in two different entities. The data analytics should use data elements that are more difficult to conceal or would cost the perpetrator money—i.e., telephone number or email address.
    3. Missing: searches for a data element that should exist in the data file. With missing searches, the focus is on the totality of the missing information rather than a single missing data element.
    4. Changed: searches for a change to a data field through examination of the change file or comparing two database files at the beginning and end of the scope period.
    5. A specific anomaly in a specific data field.
    6. Address and bank accounts tend to be critical data fields because the transfer of funds requires an electronic transfer or the mailing of funds. If both data fields are blank, this is an example of a specific data anomaly.
  2. The specific identification strategy focuses on the following types of tests for transactional data:
    1. The identification process starts with a specific fraud scenario.
    2. A specific pattern in the data—i.e., a sequential pattern of vendor invoice numbers.
    3. A group of transactions that can be identified based on a specific data criterion—i.e., excessive overtime or invoices applied to a dormant purchase order.
    4. Missing: invoices with no purchase order.
    5. Change analysis should focus on a transaction being changed, deleted, or voided—i.e., an employee's time report changed by a supervisor or payroll supervisor.
    6. Duplicate transactions. Supplier invoices with a duplicate invoice number or two invoices with the same date and amount.
    7. Manual transactions. These are transactions that are not created through the automated systems and are instead recorded through a manual process.
    8. Data anomaly in a specific data field that correlates to a fraud scenario or a concealment strategy.
    9. Data anomaly associated with your company's normal business practices—i.e., speed of payment is searching for a vendor that is paid faster than normal company payment terms.
  3. Specific identification tends to be more effective by starting with a cluster pattern and then using specific attributes to refine the selection process. The use of multiple criteria reduces the number of false positives.

Illustrative Examples of Specific Identification Strategy:

  1. Employee that is missing a user ID for access to the building.
  2. Employee that is missing emergency contact information.
  3. Employee with a change to bank account near termination date.
  4. Two employees with different last name, same bank account for direct deposit.
  5. Employee's name matches a contract employee name in the vendor file.
  6. Employee claiming exemption from income tax withholding whose gross wages indicate the employee would likely have a tax liability.
  7. All employees with no voluntary withholding for fringe benefits, such as 401(k) plan and health insurance.

Guidelines for Use of Specific Identification Strategy:

  1. The specific identification is used when the concealment of the fraud scenario is at a low level.
  2. The focus of the strategy for entity information is to locate false entity scenarios.
  3. The focus of the strategy for transactional information is to identify a specific data pattern associated with a fraud scenario.
  4. To illustrate, I would use the specific identification strategy to identify open purchase orders with multiple invoices, or I would identify all invoices that are greater than the original purchase order.
  5. Obviously, a vendor invoice being greater than the purchase order is not a reason to call in the investigators. However, history has also taught us that override or change orders are associated with fraud scenarios.
  6. Do not have a myopic view of the test words. Missing can be used in many different ways for testing purposes. Missing could mean a data element that internal controls require to be populated, or it may simply mean that the data field is blank.
  7. The goal of the test is to identify a specific entity or a set of transactions that link to a specific entity that meets the specific identification criteria.

Consider the Following Scenario

Budget owner causes an employee who terminates employment (employee stops going to work) not to be removed from the payroll system, and the budget owner submits time and attendance records in the name of the terminated employee for a temporary period of time, causing the diversion of funds.

Since the statement indicates the scenario is temporary, we can identify all terminated employees as our first specific identification test. The reason for selecting all terminated employees is based on the temporary criterion and the fact that the budget owner commits the fraud over a period of time.

The second test is based on payment method. If the employee is paid with direct deposit, then through our data analytics we could identify all terminated employees that had a bank account change during their employment period. Since the scenario could not occur without a change to bank account, the analysis provides a sample that has a higher probability of being consistent with the specific fraud scenario.

If the employee is paid through a manual check, then the specific identification identifies a group of employees that fit the profile but requires manual examination of records to see if red flags exist on the endorsement side of the check.
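The two-step direct-deposit test can be sketched as follows, assuming a hypothetical employee master and change log:

```python
# Sketch of the terminated-employee scenario tests. The employee master,
# the change log, and all column names are illustrative assumptions.
import pandas as pd

emps = pd.DataFrame({
    "emp_id": [1, 2, 3],
    "status": ["terminated", "active", "terminated"],
})
changes = pd.DataFrame({
    "emp_id": [1, 2],
    "field": ["bank_account", "bank_account"],
})

# Step 1: specific identification of all terminated employees.
terminated = emps[emps["status"] == "terminated"]

# Step 2: of those, which had a bank account change on record?
profile = terminated.merge(
    changes[changes["field"] == "bank_account"], on="emp_id")
```

Employee 2 also changed a bank account but is still active, so only employee 1 fits the fraud scenario data profile.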

Internal Control Avoidance

The concept focuses on transactions that either avoid internal controls or attempt to circumvent internal controls. The strategy is based on the concept that when an individual is intentionally avoiding internal controls the person may have evil motives. The audit examination determines whether the control avoidance is linked to a fraud scenario or to a cavalier attitude toward internal controls.

The Fundamental Strategies for Internal Control Avoidance

  1. Avoidance of the dollar levels:
    1. Structuring transactions to avoid a control level. This is the process of splitting one transaction that would exceed the control level into two or more transactions to avoid a control level. The tests are designed to locate duplicate transactions or to identify multiple transactions in the aggregate that exceed a control level. This concept goes by many different names, such as split transactions.
    2. All transactions are below a control level. All transactions associated with a specific entity are below the control level. The aggregate dollar volume of the transactions is a key criterion in the selection process.
  2. Off‐period transactions:
    1. Creating or changing an entity during non‐business hours or days.
    2. Creating, changing, voiding, or deleting transactions during non‐business hours or days.
    3. Creating, changing, voiding, or deleting transactions from a remote location.
    4. Creating or changing a transaction by someone who does not typically process that transaction; for example, the controller recorded the invoice rather than the accounts payable function.
    5. Creating or changing a transaction by someone who is absent from work due to some form of personal leave.
    6. Creating or changing a transaction by someone responsible for the custody of an asset.
  3. Illogical order of transactions; every business system has a logical flow of documents. In purchasing, there should be a requisition, followed by a purchase order and a receipt transaction, resulting in a vendor invoice. Comparing the purchase order date to the invoice date would identify transactions that avoided the purchasing internal controls.
  4. Reclassification, changes, transfers, reversals, voids that occur within a short duration after the initial recording of the transaction or occur in two different operating periods.
  5. Speed of transaction; the transaction is processed more quickly than policy or normal business practice. The technique compares the date of one transaction to the date of another transaction. A simple example is comparing the vendor invoice date to the payment date, searching for vendor invoices that are paid faster than normal terms.
  6. Use of aged documents or system codes that provide open authority.
  7. Manual transaction is a transaction that is created and processed external to the automated system and then is recorded manually in the database.
  8. Override transactions; many systems have codes that allow a transaction to bypass computerized controls for logical business reasons. These override features allow a perpetrator to circumvent internal controls. A manual transaction can also be viewed as an override transaction.
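Items 2 and 5 above, the off-period test and the speed-of-payment test, can be sketched as follows. The business-hours window, the assumed 30-day payment terms, and all column names are illustrative assumptions:

```python
# Sketch of two internal control avoidance tests. Dates, the 8:00-18:00
# weekday business window, the 30-day terms, and the column names are
# hypothetical assumptions.
import pandas as pd

pay = pd.DataFrame({
    "invoice_id": [1, 2, 3],
    "invoice_date": pd.to_datetime(["2023-03-01", "2023-03-05", "2023-03-10"]),
    "paid_date":   pd.to_datetime(["2023-03-03", "2023-04-04", "2023-03-12"]),
    "entered":     pd.to_datetime(["2023-03-01 10:00", "2023-03-04 14:00",
                                   "2023-03-11 23:05"]),
})

# Speed of payment: invoices paid faster than the normal 30-day terms.
fast = pay[(pay["paid_date"] - pay["invoice_date"]).dt.days < 30]

# Off-period: entered outside business hours or on a weekend.
hrs = pay["entered"].dt.hour
offp = pay[(hrs < 8) | (hrs >= 18) | (pay["entered"].dt.dayofweek >= 5)]
```

In practice the thresholds come from company policy, and for a 24/7 business the off-period window must be defined per location or department.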

Illustrative Examples of Internal Control Avoidance

  1. Requesting a vendor to split an invoice so that both invoices only require one approval but in the aggregate would have required two approvals.
  2. A dormant vendor's address is changed on Saturday night at 11:00 PM from a remote computer that links to the controller.
  3. Purchase order is created after the receipt of the vendor's invoice.
  4. Payroll manager issues two payroll checks to herself; the first is through the automated system, the second is a manual payroll check.

Guidelines for Use of Internal Control Avoidance Strategy

  1. The internal control avoidance strategy is used when the concealment of the fraud scenario is at a medium level.
  2. The false entity has limited linkage to the perpetrator's identity.
  3. Locating false entities can still occur at this level when we have identified a person or a group of persons who are the focus of the data analytics.
  4. The frequency number must be established for each test.
  5. The pattern recognition is based on the design of the internal control.
  6. All off-period entity transactions should be selected. With businesses operating 24/7, the design of this test becomes more challenging.
  7. Retrospective analysis of a contract or bid is an effective method to detect corruption.
  8. The concepts of exact match, close match, and related match are critical to assess internal control avoidance.
  9. The goal of internal control avoidance is to identify a transaction that has the appearance of avoiding an internal control.
  10. The difference between specific identification and internal control avoidance is sometimes a blurred line. I encourage the reader to stay focused on the intent of the strategy.

Consider the Following Scenario

Budget owner in collusion with a supplier splits invoices to stay below a control level for the purpose of overbilling the company based on price, resulting in the diversion of company assets (impact statement) or budget owner receives a kickback from the supplier (conversion statement).

In contrast, this scenario involves a real entity; therefore, entity testing is not relevant. The transaction testing starts with an exact match on vendor invoices based on invoice number and date. The second criterion is the aggregate amount of the matched invoices exceeding the control amount. While a frequency of one split transaction is sufficient for sample selection, a regular frequency of invoice splitting for one vendor or one department would improve the odds of establishing intent. The next test would compare the line items on the invoices selected in the first test to previous transactions for the same line items, searching for a pattern of price increases.

If no transactions are identified in the exact match test, the second level of testing would use a close match on invoice number and invoice date.
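
The two-level test above can be sketched in code. This is a minimal illustration rather than a prescribed implementation: the record layout (`vendor`, `date`, `amount`) and the $10,000 control threshold are assumptions you would replace with your own fields and control amount.

```python
from collections import defaultdict

# Assumed single-approval control threshold; substitute your organization's limit.
APPROVAL_LIMIT = 10_000

def find_split_invoices(invoices, limit=APPROVAL_LIMIT):
    """Group invoices by vendor and invoice date; flag groups where every
    invoice falls below the control limit but the aggregate meets or exceeds it."""
    groups = defaultdict(list)
    for inv in invoices:
        groups[(inv["vendor"], inv["date"])].append(inv)
    flagged = []
    for (vendor, date), invs in groups.items():
        amounts = [i["amount"] for i in invs]
        # Each invoice is under the limit, but together they clear it.
        if len(invs) > 1 and max(amounts) < limit <= sum(amounts):
            flagged.append({"vendor": vendor, "date": date,
                            "aggregate": sum(amounts)})
    return flagged
```

A second pass with a close match (dates within a few days of each other, near-sequential invoice numbers) would widen the net when the exact match returns nothing.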

Data Interpretation Strategy

This strategy requires the selection of entities or transactions through visual examination of data on a report designed by the fraud auditor. Data interpretation is very similar to the standard audit step that states: review the journal for unusual transactions. The difficulty of performing a manual review of the journal is that the journal tends to list transactions in date or control-number order. The manual review requires the auditor to assimilate all of that information as part of the process of selecting a transaction. In fraud data analytics, we are able to create a report that summarizes all transactional data related to an entity into one line. In this way, auditors can focus their energy on the selection process.

Guidelines for Use of Data Interpretation

  1. The data interpretation strategy is used when the concealment of the fraud scenario is at a high level.
  2. In my opinion, high concealment generally exceeds the requirements of an audit but would be included in a fraud investigation. I also believe that data interpretation would be appropriate for a whistleblower allegation that the corporation has deemed to have merit. My comment is based on audits providing reasonable assurance versus absolute assurance.
  3. The first step is identifying which fraud scenarios are included in the audit scope due to high concealment.
  4. The selection of the fraud scenario could occur simply because the auditor is conducting a fraud audit or the fraud risk assessment has rated residual risk as high.
  5. The advantage of planning at the high concealment is that it creates the opportunity for locating all fraudulent activity related to the fraud scenario.
  6. The disadvantage of not planning at the high level is the opportunity for false negatives. In essence, the auditor misses the fraudulent activity because the perpetrator was more sophisticated than the audit procedure.
  7. Once the fraud scenarios are identified, the process starts with identifying the population of data that relates to the fraud scenario. The transactions should be summarized in a high‐level format. The inclusion/exclusion theory is the next step.
  8. The exclusion step is intended to eliminate lines, whereas the inclusion step identifies the lines to be considered within the data interpretation strategy. The key is to document your judgment for both the exclusions and the inclusions. Since the process is judgmental, the documented rationale will also be a high-level explanation.
  9. The exclusion theory is to exclude those records or lines that are not consistent with the fraud theory that links to the specific fraud scenario.
  10. The goal of the exclusion step is to reduce the size of the report requiring manual review.
  11. The last step is the inclusion theory, or the basis for selecting a line for audit examination.
  12. For expenditure or revenue audits, I typically use a format that summarizes transactions by aggregate dollars, aggregate records, maximum record, minimum record, and average record. Depending on the analysis, the report should include data from the entity file that will assist in the review. I have found the entity creation date to be useful.
  13. For payroll, I typically summarize the payroll transactions: number of payroll payments, gross payroll, and net payroll. I find the grade level or job description to be useful.
  14. Before the visual review, the goal is to reduce the size of the report through the use of data-extraction routines that filter out those lines that are not consistent with the fraud theory associated with the fraud scenario. This step is critical to making the process a success. Imagine a 100-page report with each page having 50 lines, which would require the auditor to examine 5,000 lines of data. Clearly, this is not a practical approach.
  15. Using the columns of summarized data, we make judgmental decisions in the inclusion/exclusion process. These judgments are based on fraud theory and the mechanics of the fraud scenario; they rest not on scientific studies but on auditor judgment grounded in experience.
  16. We also use other external databases or Internet searches to support our decisions in selecting an entity for audit examination.

Consider the Following Scenario

Budget owner acting alone or in collusion with a direct report/causes a shell company to be set up on the vendor master file/processes a contract and approves a fake invoice for goods or services not received/causing the diversion of company funds (Figure 3.5).

DIV_NUM VEN_NUM SHORT_NAME NO_OF_RECS VEN_NUM_SUM VEN_NUM_MAX VEN_NUM_MIN VEN_NUM_AVERAGE
5 74600 ST ELECTRI 4714 351,664,400.00 74,600.00 74,600.00 74,600.00
6 32330 K R ELEC S 3026 97,830,580.00 32,330.00 32,330.00 32,330.00
5 55000 N E ELECTR 2716 149,380,000.00 55,000.00 55,000.00 55,000.00
6 1060 A P ELECTR 2635 2,793,100.00 1,060.00 1,060.00 1,060.00
5 51400 M DISTRIBU 2436 125,210,400.00 51,400.00 51,400.00 51,400.00
5 9150 CAP ELECTR 1686 15,426,900.00 9,150.00 9,150.00 9,150.00
6 87840 WE CO DIST 1420 124,732,800.00 87,840.00 87,840.00 87,840.00
5 88737 WW INDUSTR 1345 119,351,265.00 88,737.00 88,737.00 88,737.00
5 26501 GB ELECTRI 1293 34,265,793.00 26,501.00 26,501.00 26,501.00
3 2000 AN, INC. 1079 2,158,000.00 2,000.00 2,000.00 2,000.00
5 8960 C / R 998 8,942,080.00 8,960.00 8,960.00 8,960.00
3 100 A-TECH COR 888 88,800.00    100.00    100.00    100.00
6 2110 A R INC 723 1,525,530.00 2,110.00 2,110.00 2,110.00
6 21210 GR ELECTRI 718 15,228,780.00 21,210.00 21,210.00 21,210.00
1 80925 U RENTALS 706 57,133,050.00 80,925.00 80,925.00 80,925.00
5 54900 N CORPORAT 638 35,026,200.00 54,900.00 54,900.00 54,900.00
6 21120 GR ELECTRI 551 11,637,120.00 21,120.00 21,120.00 21,120.00
1 7420 C CONSOLID 542 4,021,640.00 7,420.00 7,420.00 7,420.00
6 8265 C D INC 493 4,074,645.00 8,265.00 8,265.00 8,265.00
1 72180 T OIL CO., 466 33,635,880.00 72,180.00 72,180.00 72,180.00
1 24070 HOME DEPOT 464 11,168,480.00 24,070.00 24,070.00 24,070.00
6 76180 IN CHILD S 446 33,976,280.00 76,180.00 76,180.00 76,180.00
1 22740 HZ EQUIPME 443 10,073,820.00 22,740.00 22,740.00 22,740.00
5 2230 A EQUIPMEN 420 936,600.00 2,230.00 2,230.00 2,230.00
5 4740 B F S 406 1,924,440.00 4,740.00 4,740.00 4,740.00
6 23520 HOME DEPOT 379 8,914,080.00 23,520.00 23,520.00 23,520.00
6 31490 KE ELECTRI 371 11,682,790.00 31,490.00 31,490.00 31,490.00

Figure 3.5 Maximum, Minimum, and Average Report Produced from IDEA Software

Using the report of summarized data, we ask questions about the data to exclude certain vendor lines to reduce the report size to a number of lines that can be reviewed through visual examination:

Basis for Exclusion:

  1. Aggregate number of records. Using the frequency concept we would ask how many false invoices the budget owner would submit in a one‐year period. Using the frequency of 52, we would exclude all vendor lines with greater than 52 records. The number 52 correlates to once a week.
  2. Aggregate dollar value. In this exclusion process, we would first focus on eliminating small-dollar vendor lines. I must caution that the perpetrator might use multiple shell companies to keep a low visibility. I also caution against setting criteria so conservative that no lines can be eliminated, which defeats the purpose of the process.
  3. Maximum dollar amount. The budget owners typically do not want to submit a fake invoice to their supervisor. Therefore, we could exclude any vendor line when the maximum amount would require the approval of the budget owner's supervisor.
  4. Minimum dollar amount. The exclusion factor for the minimum is more of a judgment call than the maximum dollar amount. I believe most auditors would see the logic in the exclusion factor. As a starting point, we could say that if the minimum is a negative number, denoting a credit, we could exclude those vendor lines. What if the minimum is $2.49; would that indicate a real or a false invoice? One perspective is that the perpetrator would submit a low-dollar invoice amount to allow the vendor to be set up. Another perspective is that a budget owner committing a false billing scheme would not submit an invoice for $2.49. Remember, I said the strategy is based on the auditor's judgment.
  5. Average amount. Once again, the exclusion factor is judgmental. Typically, the smaller the average, the more likely we would eliminate the vendor line.
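
The five exclusion factors can be applied mechanically once the vendor lines are summarized. The sketch below assumes one dictionary per vendor line carrying the summarized columns; every threshold is an illustrative judgment call, as the text stresses, not a prescribed value.

```python
def apply_exclusions(lines, max_records=52, min_aggregate=25_000,
                     approval_limit=10_000):
    """Exclude vendor lines inconsistent with the false-billing fraud theory.
    All thresholds are auditor judgments exposed as parameters so the
    documented rationale can be tied to a specific run of the report."""
    kept = []
    for line in lines:
        if line["records"] > max_records:       # more than weekly billing
            continue
        if line["aggregate"] < min_aggregate:   # small-dollar vendor line
            continue
        if line["maximum"] > approval_limit:    # would need supervisor approval
            continue
        if line["minimum"] < 0:                 # a credit suggests a real vendor
            continue
        kept.append(line)
    return kept
```

Because the criteria are judgmental, recording the parameter values used for each run doubles as the documentation of the exclusion decisions.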

Basis for Selection:

  1. The report after applying the exclusion theory provides a listing of vendor lines that meet the high‐level criteria of our fraud data profile for the fraud scenario.
  2. One selection theory is that all perpetrators have a risk comfort level. The level might be high or low, depending on the perpetrator's fear of detection. Therefore, as the gap between the maximum amount and the average amount gets closer, it is one indicator of the perpetrator comfort zone for the dollar amount of the fake invoices. It may also indicate a perpetrator who has a predictable pressure for an income stream.
  3. Whatever the selection criteria, the fraud auditor needs to identify the selection criteria before reviewing the report.

Number Anomaly Strategy

The strategy is exactly as it sounds: we identify anomalies by focusing on numbers. The strategy is a blend of specific identification and data interpretation. The best part of the analysis is its ease of use. The summarization feature of audit software uses the amount field as the key and summarizes the aggregate dollar value and the frequency of occurrence of each amount. The analysis could also perform the same summarization by entity and amount, although in a large database the report may become unwieldy.

  1. Benford's law. The search for an anomaly in the first, second, etc., digits of an amount. The anomaly is based on Benford's distribution table. The explanation of the concept is beyond the scope of this book; in fact, entire books have been written on the topic.
  2. Even number transaction. The goal of the strategy is to locate a frequency of an even number occurring with a frequency greater than one transaction that links to the same entity structure. The analysis typically focuses on round amounts, such as $1,000 or $5,000.
  3. Repeating number transaction. The goal is similar to even number analysis but focuses on any number that repeats with a frequency greater than a predetermined number and links to the same entity structure.
  4. Contra‐entry transaction. The search for a negative number when the number should be positive or for a positive number when the number should be negative.
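
The even-number and repeating-number tests above can be sketched as one routine, assuming the payments file reduces to (entity, amount) pairs; the round-amount rule ($1,000 multiples) and the frequency threshold are assumptions to tune against your own data.

```python
from collections import Counter

def repeating_amounts(payments, min_frequency=3):
    """Return (entity, amount, count) triples where the same amount repeats
    for the same entity at or above min_frequency (the repeating-number test).
    Round amounts, assumed here to be multiples of 1,000, are flagged at any
    repeat (the even-number test)."""
    counts = Counter(payments)  # payments is a list of (entity, amount) pairs
    flagged = []
    for (entity, amount), freq in counts.items():
        is_round = amount % 1000 == 0
        if freq >= min_frequency or (is_round and freq > 1):
            flagged.append((entity, amount, freq))
    return flagged
```

Run against the BR Equipment pattern described below (all invoices for $5,500), the repeating-amount test surfaces the vendor immediately.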

Guidelines for Using the Number Anomaly Strategy

  1. In the number anomaly analysis, the data analysis starts with the data anomaly of the test, a pattern, and frequency of an even amount or recurring amount that links to one entity. The fraud auditor then needs to use data interpretation skills to determine which fraud scenario is most likely occurring.
  2. The strategy is the least impacted by the sophistication of concealment.
  3. The strategy does not work when the perpetrator continually changes the amount field.
  4. The anomaly is the frequency of even numbers or a recurring number linked to one entity. For contra entries, the number itself is the anomaly.
  5. The strategy is highly effective in locating false entity schemes based on the theory that a real vendor would normally have invoices of varying amounts.
  6. The strategy is also useful in searching for pass‐through schemes involving equipment rental. Since rental amounts are often the same, the drill‐down review on the amount field would easily identify vendors.
  7. In one project, a vice president created three shell companies and submitted invoices for $1,100 six times for each shell company.
  8. In one project, a project manager created a shell company called BR Equipment. All the invoices were for $5,500.
  9. In one project, the controller was concealing bad debt on the financial statement by concealing the rebill credit as a contra entry in the sales register. He would issue a credit for the old invoice and record the credit in the sales journal versus the sales adjustment general ledger account. The controller would then issue a new invoice, with a current date replacing the old invoice. The contra entry in the sales journal was the clue.
  10. Contra-entry schemes are immediately flagged. In one case, we identified negative adjustments in a deduction field that increased net pay rather than decreasing it. In another case, the controller was increasing incurred cost for percentage of completion; the red flag was the size of the credits existing in accounts payable. The largest was a negative $420,000.
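
The contra-entry test in guideline 10 reduces to a sign check on a field whose expected sign is known. The field name and record layout below are hypothetical placeholders for your own file.

```python
def contra_entries(records, field="amount", expected_sign="positive"):
    """Flag records whose value carries the wrong sign for the field:
    a negative value where the field should be positive, or vice versa."""
    if expected_sign == "positive":
        flagged = [r for r in records if r[field] < 0]
    else:
        flagged = [r for r in records if r[field] > 0]
    # Sort by magnitude so the largest contra entries surface first.
    return sorted(flagged, key=lambda r: abs(r[field]), reverse=True)
```

Applied to a payroll deduction field, any negative value is a candidate; the sort puts the $420,000-sized outliers at the top of the report.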

Consider the Following Scenario

Project manager creates a shell company; he leases equipment under the shell company's name from a real equipment leasing company and then causes the company by which he is employed to lease the equipment from his shell company at an inflated price, resulting in the diversion of company assets.

In the number anomaly strategy, the analysis would identify a high frequency of a recurring invoice amount paid to the same supplier. Since the supplier is an equipment rental company, the next‐level analysis would focus on the invoice number pattern, most likely a sequential pattern, and the description field would be vague by normal industry standards. Referencing BR Equipment, one of the clues in the description field was the fact that the description field referenced a brand name of equipment that could have been leased directly from the brand‐name company.

Pattern Recognition and Frequency Analysis

The science of pattern recognition is very broad, with many different methodologies ranging from the use of the science of mathematics to various classification identification systems. The International Association of Pattern Recognition was formed to fill a need for information exchange among research workers in the pattern recognition field. Pattern recognition is used in fields ranging from medicine to physical security. Within the context of this book, we will use it to describe red flags that are typically associated with a specific fraud scenario.

Frequency analysis combines fraud theory with the number of instances that would correlate to a fraud scenario. The analysis may focus on a specific number or use greater-than or less-than analysis. Frequency analysis may allow us to exclude a grouping from our analysis or require the inclusion of the grouping. Remember, perpetrators seldom steal once.

Frequency Analysis

Frequency analysis is the process of assigning a record count to the number of transactions attributable to an entity that links to a fraudulent action, in essence attributable to a fraud scenario. The analysis focuses on counts either over or under a predetermined record count. The record count number is judgmental, based on the auditor's professional experience and the relevant fraud theory. It should be noted that the record count is not relevant to every fraud scenario. In establishing a record count, we suggest the following guidelines:

  1. The frequency count is based on the expected occurrence rate associated with the specific fraud scenario or the fraud pattern.
  2. Frequency analysis is typically associated with the transaction analysis versus the entity analysis.
  3. The number of transactions associated with false entity schemes is determined by the perpetrator.
  4. In payroll, the number of transactions is typically associated with the payroll cycle.
  5. Record count should be based on a logical interval whenever possible—that is, daily, weekly, monthly, quarterly, semiannually, or annually.
  6. Specific identification strategies associated with transaction analysis should be sufficient to indicate an intent factor versus a normal error rate for the population.
  7. Internal control avoidance strategies focus on the number of records avoiding a control threshold if the scenario is a false entity scenario.
  8. Internal control avoidance strategies for real entity schemes can be a frequency of one or more.
  9. Data interpretation for the frequency of the pattern is a critical aspect of the selection process.
  10. Number anomaly frequency should be sufficient to indicate an intent factor versus a normal error rate for the population.
  11. Remember, with every guideline there is a logical exception.

Pattern Recognition

Pattern recognition is the process of searching for a predefined condition in a data set. The pattern either is implied through the four strategies or is defined as an attribute to a fraud scenario. Every data interrogation routine uses pattern recognition. The key is to know what you are looking for versus identifying a data pattern that does not correlate to a fraud scenario. There are five general categories of pattern recognition for fraud detection:

  1. Creating groups of data for further study, often called cluster analysis. The cluster is typically associated with a business process or transaction type within a business system. Based on the first cluster, the goal is to create a micro‐cluster within the primary cluster. The goal is to identify a discrete number of transactions that conforms to the fraud scenario data profile.
  2. Anomaly pattern is the identification of transactions that do not conform to an expected pattern or deviate from the norm. The various patterns are: nonconformance with a process or procedure; event occurs at an unusual date or time; and illogical order of events or outliers analysis based on frequency, individual amount, aggregate amount, percentage, or range.
  3. Outlier pattern is focused on the transactions that exist outside of the bell‐shaped curve.
  4. Entity pattern is based on all the known permutations of the shell entity, conflict of interest, and real entity structures that are associated with fraud scenarios.
  5. Transaction pattern is a form of predictive analysis that suggests a pattern of data within transactions is more likely to be fraudulent than nonfraudulent. It is a lot like building a hypothesis and testing the hypothesis through data examination.
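
For the outlier pattern (category 3), a z-score screen is one simple way to operationalize "outside the bell-shaped curve"; the cutoff of three standard deviations is an assumption, not a rule from the text.

```python
from statistics import mean, stdev

def outliers(amounts, z=3.0):
    """Flag amounts more than z standard deviations from the mean -- a
    simple stand-in for 'outside the bell-shaped curve'."""
    if len(amounts) < 2:
        return []
    m, s = mean(amounts), stdev(amounts)
    if s == 0:
        return []  # no variation, so nothing can be an outlier
    return [a for a in amounts if abs(a - m) / s > z]
```

A z-score screen presumes roughly normal data; heavily skewed transaction populations may call for a percentile or interquartile-range cutoff instead.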

Correlating the pattern recognition between entity and transaction is the key to fraud detection. When the transactional data associated with the entity fit the fraud action pattern, the likelihood of fraud detection increases dramatically.

Strategies for Master File Data

The first step is to understand that there are three primary categories of entities; their applicability will vary based on the nature of your organization. Within each primary category, there are different types of entities that create different fraud scenarios and require unique search routines to locate the entity. The three primary categories of entity structures are shell companies, conflict of interest, and real entities. For vendors and customers, the following provides a list of the secondary types of entities within each primary category. Employees follow a similar pattern, but due to the nature of employment the descriptions differ while the concepts remain the same.

Through understanding the different permutations, the fraud auditors can improve their pattern recognition. The following provides a list of entity patterns associated with fraud scenarios.

Shell Company:

  1. Stand‐alone company.
  2. Pass‐through stand‐alone company created by an internal source.
  3. Pass‐through stand‐alone company created by salesperson at real company.
  4. Assuming the identity of a dormant vendor.
  5. Temporarily assuming the identity of the same active vendor.
  6. Temporarily assuming the identity of a random active vendor.
  7. Causing a real vendor not on your master file to be added to master file, in essence, assuming the identity of a real entity.
  8. Hidden entity
    1. Look‐alike name.
    2. Real vendor with multivendor numbers or names.
  9. Temporary or one‐time entity.

Conflict of Interest:

  1. Real company has one customer.
  2. Pass‐through stand‐alone has one customer.
  3. Real company with hidden ownership.
  4. Real company with multiple customers and no hidden ownership that nonetheless corrupts the internal selection process.

Real Company:

  1. Real entity is complicit in corrupting internal source.
  2. Real entity is extorted by internal source.
  3. Real entity operates as a pass‐through, typically in collusion with an internal source.
  4. Real entity operates under multiple names.
  5. Real entity operates as temporary or one‐time entity.
  6. Real entity is not complicit.
  7. Subcontractors to a general contractor can function in the same manner; however, since the subcontractor is not within the master file, it is beyond the scope of this book.

Guidelines in Building Data Interrogation Routines for Entity Types

  1. The first step is to decide which primary entity category is within the audit scope.
  2. Data interrogation routines should be designed for each secondary type of entity in the primary category.
  3. The description of the secondary entities will vary by revenue, payroll, and expenditures.
  4. For shell companies, the data interrogation should search on missing information, anomalies in the identifying information, and matches to other company data files. If the shell company is a vendor, then matching to the human resources or customer databases would prove useful.
  5. For multiple shell companies operated by the same individual, search on duplicate address, bank account, telephone number, or email address.
  6. Shell companies operating as a pass‐through shell company by an internal source are similar to shell companies for false billing schemes.
  7. For shell companies operating as a pass‐through by a salesperson from a real company, the data interrogation should search on duplicate address, bank account, telephone number, or email address.
  8. Conflict‐of‐interest entity with one customer has a similar analysis to the shell company.
  9. For assumed-identity schemes, always search on changes of address or bank account. The caveat occurs when the perpetrator is able to manually control the payment for expenditures or to change or delete a customer's transaction after shipment.
  10. Real companies operating under multiple names: Search on duplicate identifying information. The permutation typically is associated with corruption schemes.
  11. For real companies, the analysis should focus on the transactional activity that links to the fraudulent action.
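
Guideline 5's duplicate-identifier search can be sketched as follows. The field names are hypothetical, and blank values are skipped so missing data does not create false matches.

```python
from collections import defaultdict

def duplicate_identifiers(vendors,
                          fields=("address", "bank_account", "phone", "email")):
    """Report identifying values shared across distinct vendor numbers --
    a red flag for multiple shell companies run by one individual."""
    matches = []
    for field in fields:
        seen = defaultdict(list)
        for v in vendors:
            value = v.get(field)
            if value:  # skip missing data rather than matching on blanks
                seen[value].append(v["vendor_num"])
        for value, nums in seen.items():
            if len(set(nums)) > 1:
                matches.append((field, value, sorted(set(nums))))
    return matches
```

The same routine, pointed at the payroll master file, supports the ghost-employee variant of the search.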

Strategies for Transaction Data File

The inherent scheme has two components, the entity structure and the fraud action statement, which links to the transactional files. This section will explain our methodology for building data interrogation routines for transactional data. Chapters 7 through 15 explain how to apply the methodology to the specific core business systems. Our methodology for using transactional data needs to address the following six questions:

  1. What data are available for the business transaction?
  2. What patterns could occur within the specific data item?
  3. What pattern would normally exist in the database?
  4. What would cause a pattern to be a data anomaly versus a red flag of fraud?
  5. Which patterns link to the fraud scenario?
  6. How do we develop a data interrogation routine to locate the links to the fraud scenario?

What Data Are Available for the Business Transaction?

Data is data. Maybe that's a philosophic statement, but for most data items used in fraud data analytics the concept is that basic. Finding the data in your databases is a different challenge. Most transactions have a core set of data that is necessary to initiate, process, and record the transaction. There is no magic here; the starting point for transactional fraud data analytics centers on the auditor's ability to use the following information:

  1. Control number: purchase order number, vendor invoice number, payment number, sales order number, sales invoice number, remittance number, etc.
  2. Date of the transaction: purchase order date, vendor invoice date, etc.
  3. Amount of the transaction: vendor invoice amount, discount amount, payment amount, etc.
  4. Alpha description of the transaction: words used to describe what was purchased, inventory description, etc.
  5. Numeric description of the transaction: product number, sku number, etc.
  6. General ledger account number.

Specific data elements that relate to a specific fraud scenario are discussed in later chapters.

Depending on the business system, the auditor will need to change the generic term "control number" to a business-specific number. We have vendor invoice numbers, customer order numbers, transaction numbers; the list goes on and on. What is important to understand is how the control number, or any other data element, can be used in transactional fraud analysis. To explain the methodology, we will use the control number to illustrate how to answer the six questions listed earlier. The chapter will address the remaining data items by providing guidance on how to use each data element versus illustrating the application of all six questions.

What Control Number Patterns Could Occur within the Specific Data Item?

The pattern question is not a fraud question but rather a logic question: What are all the logical patterns that can exist within the specific data element? Remember, many control numbers in your database are created by another organization (vendor or customer). That organization decides on the control pattern versus your company. Using permutation analysis, a listing of logical patterns associated with a control number follows:

  1. No number.
  2. Sequential number.
  3. Duplicate number.
  4. Random ascending numbers.
  5. Random descending numbers.
  6. Mix of ascending and descending numbers.
  7. All alpha or alpha right‐justified or alpha left‐justified.
  8. Date format numbers.
  9. Project numbers.
  10. Project number with hyphen sequential number.
  11. Illogical range—the pattern focuses on the beginning control number and ending control number based on the date range in the scope period. The illogical question is whether the control number range makes sense for the perceived size of the vendor or customer. The determination is subjective and based on limited range. With the right professional experience, the analysis can be very effective to detect someone who knows not to use a sequential pattern but did not think about the range of numbers over a period of time.
  12. Hidden number is a number that is intentionally disguised to avoid data detection. The number might have a letter added to avoid duplicate-number testing; it might be created by selecting a number that was not previously used; or it might exceed the known range or reuse a dormant number.
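
Several of the patterns above can be classified programmatically from a vendor's invoice-number history. This sketch covers only the numeric patterns; the labels mirror the list, and the rules (for example, a constant increment of 1 meaning sequential) are simplifying assumptions.

```python
def classify_invoice_numbers(numbers):
    """Classify a vendor's invoice-number history (in date order) into one
    of the logical control-number patterns."""
    values = [str(n).strip() for n in numbers if str(n).strip()]
    if not values:
        return "no number"
    if len(set(values)) < len(values):
        return "duplicate"
    try:
        nums = [int(v) for v in values]
    except ValueError:
        return "alpha or mixed format"
    if len(nums) < 2:
        return "insufficient history"
    diffs = [b - a for a, b in zip(nums, nums[1:])]
    if all(d == 1 for d in diffs):
        return "sequential"
    if all(d > 0 for d in diffs):
        return "random ascending"
    if all(d < 0 for d in diffs):
        return "random descending"
    return "mix of ascending and descending"
```

Date-format, project-number, illogical-range, and hidden-number patterns need rules specific to your data and are omitted here.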

Practical Suggestion

Interview the function responsible for entering the transaction and determine their administrative practices for entering documents. If there is no control number, is the field blank or is a created number entered into the field? If the document has overt data errors, what are the administrative practices? If the control number has 15 integers, are all integers entered into the field? The time allocated to the question will save an enormous amount of time resolving transactions that appear to have red flags.

What Control Number Pattern Would Normally Exist in the Database?

False positives are an interesting problem in fraud data analytics. One of the reasons for false positives is a lack of understanding of the data. There was an oil commercial years ago whose tagline was, "You can pay me now or you can pay me later." The tagline is on point: understand your data before you create reports. Most data patterns that we use in fraud data analytics are either a normal business pattern for an organization or an anomaly that might indicate a fraud scenario. The most bizarre example I have seen was a company that reverted to control number 1 when it reached control number 100,000. Confused? Well, so was I, but that is the reality of the type of patterns your analysis might find attached to an entity structure.

In the pattern‐recognition stage, we need to understand both our business practices and our suppliers' and customers' business practices. If my suppliers are large public corporations, I would expect ascending direction of invoice numbers. If my suppliers are small, nonpublic companies, I would not be surprised to have invoices with no invoice number. This step is about creating realistic expectations as to the quality of the data for data‐mining procedures. In Chapter 4, the concept of data availability, reliability, and usability will be discussed.

Typically, a sequential pattern of vendor invoice numbers would be a red flag of a shell company. However, in certain industries, the sequential pattern is a normal business pattern. In a project‐based industry, invoice numbers are sometimes a customer's project number hyphenated for the project billing cycle (i.e., 0604‐19). Assuming that the supplier submitted 19 invoices with a range of 0604‐01 through 0604‐19, then the supplier invoice pattern is a sequential pattern. However, the sequential pattern identified in this example correlates to normal business practice versus a fraud scenario. The fraud auditor needs to understand the patterns that exist in the company's database and the patterns that correlate to a fraud scenario.
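
The project-billing exception can be screened before a sequential pattern is flagged. The four-digit-project, two-digit-cycle format below matches the 0604-19 example; other billing conventions would need their own expressions.

```python
import re

# Assumed convention: four-digit project number, hyphen, two-digit billing cycle.
PROJECT_BILLING = re.compile(r"^\d{4}-\d{2}$")

def sequential_false_positive(invoice_numbers):
    """Return True when a run of invoice numbers matches the project-billing
    convention -- a normal business pattern rather than a shell-company flag."""
    return bool(invoice_numbers) and all(
        PROJECT_BILLING.match(n) for n in invoice_numbers
    )
```

Routing such vendors out of the sequential-pattern report is one way to resolve the false positive in the analytics rather than in fieldwork.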

So, here is the million‐dollar question: Should the data analytics be further refined to mitigate the false positive, or should the audit team resolve the false positive through performing audit procedures? This should be a definitive decision within the fraud data analytics plan. The only wrong answer is to ignore the question.

What Would Cause a Pattern to Be a Data Anomaly versus a Red Flag of Fraud?

There are four categories of data anomalies in your database that will affect the success of your fraud data analytics plan:

  1. Data entry errors. A digit is added to a vendor invoice number, the control number is entered as 19250 versus the correct control number of 1925, and the list goes on and on. Data reliability is further discussed in Chapter 4.
  2. Changes in society. In payroll, for years a duplicate bank account number associated with two different names was a red flag of a ghost employee, and a telephone area code that did not correlate to the physical location of the business might reveal a shell company. Societal changes, such as joint bank accounts and mobile phone numbers, have changed these traditional red flags.
  3. How your customer or suppliers use the five data elements on their documents. If a vendor includes letters in its invoice number, then the fraud auditor's ability to calculate an invoice number range via a control number is impacted.
  4. False positives. The data anomaly matches the pattern for the fraud scenario; however, the pattern has a logical business explanation.

Which Patterns Link to the Fraud Scenario?

The fraud auditor needs to understand that selecting the pattern is an educated guess; however, without selecting the pattern, you will not be able to write the program to search for a pattern that is associated with a fraud scenario. Chapters 7 through 15 will provide examples of patterns that link to specific scenarios.

Data interrogation routines are designed to search for a pattern that correlates to a fraud scenario. There are no absolutes as to which pattern correlates to which fraud scenario. However, without careful consideration, the data interrogation step will search for data anomalies caused by errors or natural anomalies that simply exist in every database.

Let's select three patterns and correlate the control number to the most likely fraud scenario:

  1. Sequential pattern of vendor invoice numbers is useful to locate shell companies or conflict‐of‐interest companies because the sequential pattern suggests that the vendor has only one customer.
  2. A duplicate‐number pattern is useful in searching for a real vendor operating in collusion with an internal person who is structuring invoices to avoid control levels, or for intentional duplicate payment schemes.
  3. Limited range is useful in searching for pass‐through entities operated by an outside salesperson who has more than one customer participating in the scheme or a shell company where the perpetrator was sophisticated enough to avoid a sequential pattern but not sophisticated enough to make the range of numbers consistent with the perceived size of the company. Pass‐through schemes are discussed in detail in Chapter 7.
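The three control‐number patterns above can be turned into a simple search routine. The sketch below is a minimal illustration rather than a production test: the (vendor, invoice number) tuple layout, the three‐invoice minimum, and the limited‐range cutoff are all my assumptions, to be tuned to your own data.

```python
from collections import defaultdict

def classify_invoice_numbers(invoices):
    """Classify each vendor's invoice-number pattern.

    invoices: iterable of (vendor_id, invoice_number) pairs; the layout is
    hypothetical. Returns {vendor_id: pattern}, where pattern is one of
    'sequential', 'duplicates', 'limited_range', or 'none'.
    """
    by_vendor = defaultdict(list)
    for vendor, number in invoices:
        # Keep only the digits so letters in invoice numbers do not break the test.
        digits = "".join(ch for ch in str(number) if ch.isdigit())
        if digits:
            by_vendor[vendor].append(int(digits))

    patterns = {}
    for vendor, nums in by_vendor.items():
        nums.sort()
        if len(nums) != len(set(nums)):
            patterns[vendor] = "duplicates"      # possible structuring or duplicate payment
        elif len(nums) >= 3 and nums == list(range(nums[0], nums[0] + len(nums))):
            patterns[vendor] = "sequential"      # vendor may have only one customer
        elif len(nums) >= 3 and (nums[-1] - nums[0]) < 10 * len(nums):
            patterns[vendor] = "limited_range"   # range too narrow for a real company
        else:
            patterns[vendor] = "none"
    return patterns
```

Before classifying, you would strip known benign patterns, such as the hyphenated project billing numbers discussed earlier, so the routine does not simply rediscover normal business practice.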

How Do We Develop a Data Interrogation Routine to Locate the Links to the Fraud Scenario?

In this step, the fraud auditor becomes the programmer. Using the available software routines, the fraud auditor develops reports to identify the noted pattern for either audit examination or further refined data interrogation routines. This step requires both skill and imagination. The skill is associated with the use of the software. The imagination is associated with finding creative ways of working around the data issues that will exist in every database.

Practical Guidance on How to Use the Remaining Data Elements

The Date Field

As with the control number, the starting point is to identify the date patterns based on logic versus fraud. The logical date patterns are:

  1. Blank or no date
  2. Duplicate date: exact, close, or related
  3. Document date
  4. Transaction date
  5. System‐generated date
  6. Creation date
  7. Termination date
  8. Reactivate date
  9. Change date
  10. Recorded date
  11. Date anomalies

The date field allows the auditor to develop a timeline for a business transaction. We can use this timeline to search for speed of transaction, circumvention of an internal control, off‐period transaction, or an illogical sequence of events. The following illustrates the concept:

  • Speed of transaction is always a critical test because it indicates someone is taking a special interest in the business transaction. The starting point is to understand the normal processing time for a business transaction. If your company policy is to pay vendor invoices in 60 days, then why are the Fraud Auditing Corporation invoices paid in 3 days?
  • Circumvention of internal controls should always be a red flag for the fraud auditor. In payroll, there is an automated system that calculates payroll on a periodic basis. Knowing those dates, the fraud auditor would search for payroll payments that do not coincide with the automated dates, thereby identifying all manual checks. The last phase is summarizing the number of manual checks by employee number, identifying all employees receiving more than one manual check. In one investigation, the controller was receiving his biweekly payroll check from the automated payroll system. Then in the off‐week, he would issue himself a manual check and record the check to a professional services account versus a payroll account. A second test in payroll using the date field is comparing the dates on manual checks to termination dates. In this scenario, the controller was diverting the final manual check.
  • Off‐period analysis identifies the date and time a transaction was recorded in the business system. Obviously, what counts as off‐period depends on the organization. The anomaly occurs when the transaction is recorded after hours, on a holiday, or at the end of a reporting period. In one case involving the theft of inventory through the revenue system, the sales transactions were consistently voided after the goods left the warehouse on a Sunday morning. Combining frequency analysis with off‐period analysis, the pattern and frequency of voided sales would have been highlighted and the fraud scenario discovered.
  • Illogical sequence is identifying transactions where the date order of the transactions does not follow a normal transaction. In one sense, the test is similar to the speed of transactions and circumvention of internal controls, but the difference is in the intent of the illogical sequence of the events. In a sales transaction, if the ship date is before the customer order date, the illogical order of the transactions may indicate improper recognition of revenue.

The transaction type will determine how the date is used in the fraud data analysis. The same six questions listed earlier need to be asked and answered as part of building the fraud data interrogation routine for the selected transaction.

Illustrative Examples of Using the Date Field:

  • Asset misappropriation—the invoice date and the payment date are equal. Illustrates the speed of transaction.
  • Internal control circumvention—the invoice date is before the purchase order date.
  • In corruption schemes, the invoice date is before the purchase order date, which indicates that the acquisition most likely circumvented the purchasing process.
  • In financial reporting, the receiving report date is after the inventory posting date, indicating an overstatement of inventory.

The Amount Field

There are a lot of theories and fraud stories about amounts. The most common story is that a fraud scenario starts out small and gets bigger over time. Or that fraud amounts will typically correlate to the upper spheres of control levels. While these have some semblance of truth, they also can be misleading as it relates to fraud data analysis.

I believe that the amount of a fraudulent transaction or the aggregate amount of the fraud is far more complex than the interesting stories. I think there is a direct correlation to a person's risk tolerance as related to the person's perception of detection, pressures facing the individual, and at what stage the person believes there is a higher likelihood of detection. Most of this requires the auditor to know the perpetrator on a personal basis. As a reminder, the focus of this book is searching for fraud without predication or a target.

The dollar amount of the transaction is what fraud is all about. How much did the person steal? How much was revenue overstated? There are nine logical amount patterns:

  1. A single amount is below or above a control threshold.
  2. Duplicate amount, linking to control number, date, or description.
  3. Two or more amounts that in aggregate exceed the control threshold, which then link to a specific date or control number.
  4. Even amount.
  5. Odd amount.
  6. Recurring amount.
  7. Amount embedded with a lucky number. (Yes, in one case, the perpetrator embedded his high school football jersey number in the amount.)
  8. Contra amount: a negative or positive amount that is inconsistent with the expected sign of the transaction.
  9. Aggregate amount.
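Patterns 1, 2, and 3 are the basis of the classic structuring test: single amounts just below a control threshold, and below‐threshold amounts that aggregate above it. A minimal sketch, assuming a $5,000 control level and a 10 percent "just below" margin; both numbers are assumptions to be replaced with your company's actual control levels.

```python
from collections import defaultdict

def structuring_candidates(invoices, threshold=5000.0, margin=0.10):
    """Search for amounts just below a control threshold and for vendors
    whose below-threshold invoices aggregate above it.

    invoices: iterable of (vendor, amount) pairs (hypothetical layout).
    Returns (near_threshold, aggregate_over).
    """
    near = []
    totals = defaultdict(float)
    for vendor, amount in invoices:
        if threshold * (1 - margin) <= amount < threshold:
            near.append((vendor, amount))   # single amount just below the control level
        if amount < threshold:
            totals[vendor] += amount        # accumulate below-threshold spend per vendor
    over = [v for v, t in totals.items() if t > threshold]
    return near, over
```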

The Description Field: Alpha and Numeric Considerations

The alpha and numeric field is critical for both false entity and real entity analysis. In false entity schemes we search for alpha or numeric descriptions that are not consistent with our business expectation. In real entity analysis, we use the description field to search for overbilling and corruption schemes. There are nine logical patterns:

  1. Tangible good descriptions generally contain an alpha description and numeric description referred to as the product number.
  2. Illogical product description. A product number that has an insufficient number of numeric positions as related to industry standard or no alpha description.
  3. Service transactions tend to be an alpha description, but may refer to a contract number.
  4. Duplicate descriptions.
  5. Duplicate product numbers.
  6. Missing alpha description.
  7. Missing product number.
  8. Grade level in payroll.
  9. Grade title in payroll.
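Several of these patterns reduce to simple field checks. The sketch below covers the missing‐description, missing‐product‐number, and insufficient‐digits patterns; the key names and the six‐digit industry norm are assumptions for illustration.

```python
def description_red_flags(line, min_product_digits=6):
    """Check one invoice line's description fields for the patterns above.

    line: dict with hypothetical keys 'description' and 'product_number'.
    min_product_digits is an assumed industry norm for product-number length.
    Returns the list of red-flag labels that fire.
    """
    flags = []
    desc = (line.get("description") or "").strip()
    prod = (line.get("product_number") or "").strip()
    if not desc:
        flags.append("missing_alpha_description")
    if not prod:
        flags.append("missing_product_number")
    elif sum(ch.isdigit() for ch in prod) < min_product_digits:
        # Too few numeric positions for a supplier selling a large catalog.
        flags.append("short_product_number")
    return flags
```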

General Ledger Account Numbers

The general ledger account number is what I call the home of the fraud transaction. The home is going to have a division number, department number, and expense code. For the budget owner scenarios, the fraudulent transactions are recorded in an account the budget owner controls and monitors. By contrast, the direct input function needs to find a home to record the fraudulent transactions. The home for the direct input function needs to be recorded where the transaction will avoid scrutiny by the budget owner. The senior manager has many homes for the fraudulent transaction. Understanding the general ledger numbering system is important in fraud data analysis. The primary categories are as follows:

  • Financial reporting: The entire fraud data analytics is centered on the transactions recorded in a general ledger account.
  • Asset misappropriation: The beginning analysis is not dependent on the general ledger account number until the fraud auditor starts to find suspicious transactions.
  • Corruption: In favoritism analysis or targeted expenditure analysis, the general ledger account number is a key element of the fraud data analytics.

In addition to the data created from business transactions, data systems also create logs of when transactions were created, changed, or deleted. The use of logging information follows the same guidelines as information from documents:

  • Control number
  • Date of transaction
  • Time of transaction
  • User ID creating transaction
  • IP address

Practical Guidance for Master File Data and the Transaction Data

As a matter of style, the last step before beginning the fraud data analytics is linking the transactional data to the master file data. When a pattern is identified in the transactional data, the next logical step is to link the transactions to the entity data file. Therefore, as a practical suggestion, link the entity file to the transactional file at the beginning of transaction analysis rather than after it. Obviously, the size of the files may make this impractical.
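The linking step itself is a straightforward join. A minimal sketch, assuming transactions and master records keyed on a hypothetical vendor_id field; note that a transaction with no master record is itself worth examining.

```python
def link_transactions_to_master(transactions, master):
    """Join each transaction to its entity master record up front.

    transactions: list of dicts containing a hypothetical 'vendor_id' key.
    master: {vendor_id: master_record_dict}.
    Returns (linked, orphans); an orphan has no master record, which is
    itself a red flag worth pursuing.
    """
    linked, orphans = [], []
    for txn in transactions:
        record = master.get(txn.get("vendor_id"))
        if record is None:
            orphans.append(txn)
        else:
            linked.append({**txn, **record})   # merge master attributes onto the transaction
    return linked, orphans
```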

Practical Guidance for Transactional Data Associated with a False Entity

Transactional data associated with a false entity is different in many ways from transactional data with real companies. The perpetrator is creating the documents; he is creating the control number, date, amount, and description, and deciding where to record the fraudulent transaction. The transaction in some way is a reflection of the perpetrator's personality. (No, I have no intention of incorporating behavioral analysis into my fraud data analytics.)

The key is to recognize that the control data associated with a false entity is less likely to reflect being created by a real business and more likely that a person created the false data. The control number may be sequential because the small business software simply increments the control number after creating the next document. All the amounts may be even. The date field may be all Saturdays because the perpetrator is creating the document on the weekend. The same error may exist in multiple documents for different entities. The description field on a created vendor invoice most likely will not reflect the product description of a real wholesale company in the business of selling widgets. The widget product may only have four integers, which is not consistent with a large company selling millions of items. As a general guideline, for false entity schemes, search for the absence of what would normally exist for a real company.

Illustrative Example of Transactional Data and False Entity

In one fraud data analytics project, an internal manager created three shell companies. The companies all had different addresses, in different states, and different telephone numbers. The only common pattern in the master file was that all three companies were created in the database on the same day, but at different times of the day. However, in the transactional file, the vendor invoices were all the same amount, the same date, and the same description, and the control numbers for all three companies had the same pattern: 0001, 020506, 0003, 071806, 0005, and 0006. Most likely, the pattern was intended to be a sequential pattern, but note that the same data error occurs in the control number for all three companies. If you believe in coincidences, then pass on further review. If, however, you do not believe in coincidences, then start your fraud investigation, because you have just found three shell companies that all link to the same individual because of the general ledger account.

Practical Guidance for Transactional Data Associated with the Real Entity

In real entity scenarios, analyzing the entity data for the most part is an obvious waste of time. Yes, there are exceptions to all rules. I learned that in third grade. The obvious exception is the hidden entity scheme. Therefore, we need to build the fraud theories around whether the real entity is committing the fraud scenario alone or the real entity is in collusion with an internal source or external source. Transactional data becomes the focus of real company analysis:

  • An external entity with no internal collusion submits transactional data that either exploit the vulnerabilities within the internal controls or create the appearance of conforming to internal controls, but the volume or frequency exceeds business norms. To illustrate the concept, many payments systems will automatically pay the invoice if the invoice is below a certain dollar amount, referred to as a small‐dollar invoice. The dishonest supplier continually submits small‐dollar invoices, knowing that accounts payable will pay the invoice. The specific identification strategy or internal control avoidance is generally the right strategy.
  • An external entity in collusion with an internal source means the transaction generally conforms to internal documents; therefore, exception analysis generally will not detect this permutation. The use of data interpretation strategy as the prime strategy, coupled with a secondary strategy, is the most effective approach. The secondary strategy might be outlier analysis, historical change analysis, timing analysis, and targeted analysis. The previous example would occur in the same manner, but the internal budget owner is approving the small‐dollar invoice and the budget owner is receiving a kickback.
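The small‐dollar invoice scheme in the first bullet is found with frequency analysis: count each vendor's below‐limit invoices and compare the count to peer vendors. A minimal sketch, assuming a $1,000 auto‐pay limit and flagging counts more than three times the median; both parameters are assumptions.

```python
from collections import Counter
from statistics import median

def small_dollar_outliers(invoices, auto_pay_limit=1000.0, factor=3.0):
    """Find vendors submitting far more auto-paid (below-limit) invoices
    than their peers.

    invoices: iterable of (vendor, amount) pairs (hypothetical layout).
    Flags vendors whose below-limit invoice count exceeds factor times
    the median count across vendors.
    """
    counts = Counter(v for v, amt in invoices if amt < auto_pay_limit)
    if not counts:
        return []
    typical = median(counts.values())   # robust peer baseline
    return [v for v, c in counts.items() if c > factor * typical]
```

The same report serves both permutations: with no collusion it exposes the dishonest supplier directly, and with collusion it points to the budget owner who keeps approving the small‐dollar invoices.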

Summary

This chapter is about the science of fraud data analytics. It is a systematic study of fraud scenarios and fraud scenarios' relationship to data. Like all scientific principles, the continual study of the science and the practical application of the science are both necessary for the success of the fraud data analytics journey in the discovery of fraud scenarios that are hiding in core business systems. As stated in Chapter 1, fraud data analytics is both a science and an art.
