Chapter 8
Referential Integrity Refactorings

Referential integrity refactorings are changes that ensure that a referenced row exists within another table and/or that ensures that a row that is no longer needed is removed appropriately. The referential integrity refactorings are as follows:

• Add Foreign Key Constraint

• Add Trigger For Calculated Column

• Drop Foreign Key Constraint

• Introduce Cascading Delete

• Introduce Hard Delete

• Introduce Soft Delete

• Introduce Trigger For History

Add Foreign Key Constraint

Add a foreign key constraint to an existing table to enforce a relationship to another table.

Motivation

The primary reason to apply Add Foreign Key Constraint is to enforce data dependencies at the database level, ensuring that the database enforces some referential integrity (RI) business rules preventing invalid data from being persisted. This is particularly important when several applications access the same database because you cannot count on them enforcing data-integrity rules consistently.

Potential Tradeoffs

Foreign key constraints reduce performance within your database because the existence of the row in the foreign table will be verified whenever the source row is updated. Furthermore, when adding a foreign key constraint to the database, you must be careful about the order of inserts, updates, and deletions. For example, in Figure 8.1, you cannot add a row in Account without the corresponding row in AccountStatus. The implication is that your application, or your persistence layer as the case may be, must be aware of the table dependencies in the database. Luckily, many databases allow commit-time enforcing of the database constraints, enabling you to insert/update or delete rows in any order as long as data integrity is maintained at commit time. This type of feature makes development easy and provides higher incentive to use foreign key constraints.

Figure 8.1. Adding a foreign key constraint to the Account.AccountStatus column.

image

Schema Update Mechanics

As depicted in Figure 8.1, to update the schema to add a foreign key constraint, you must do the following:

1. Choose a constraint checking strategy. Depending on your database product, it will support one or two ways to enforce foreign keys. First, with the immediate checking method, the foreign key constraint is checked when the data is inserted/updated or deleted from the table. The immediate checking method is better at failing faster and will force you to consider the order of database changes (inserts, updates, and deletes). Second, with the deferred checking method, the foreign key constraint is checked when the transaction is committed from the application. This method allows you the flexibility to not worry about the order of database changes because the constraints are checked at commit time. This approach also enables you to cache all the dirty objects and write them in a batch; you just have to make sure that at commit time the database is in a clean state. In either case, the database returns an exception when the first foreign key constraint fails. (There could be several.)

2. Create the foreign key constraint. Create the foreign key constraint in the database via the ADD CONSTRAINT clause of the ALTER TABLE statement. The database constraint should be named according to your database naming conventions for clarity and effective error reporting by the database. If you are using the commit-time checking of constraints, there may be performance degradation because the database will be checking the integrity of the data at commit time, a significant problem with tables with millions of rows.

3. Introduce an index for the primary key of the foreign table (optional). Databases use select statements on the referenced tables to verify whether the data being entered in the child table is valid. If the AccountStatus.StatusCode column does not have an index, you may experience significant performance degradation and need to consider applying the Introduce Index (page 248) database refactoring. When you create an index, you will increase the performance of constraint checking, but you will be decreasing the performance of update, insert, and delete on AccountStatus because the database now has to maintain the added index.

The following code depicts the steps to add a foreign key constraint to the table. In this example, we are creating the constraint such that the foreign key constraint is checked immediately upon data modification:

ALTER TABLE Account
  ADD CONSTRAINT FK_Account_AccountStatus
  FOREIGN KEY (StatusCode)
  REFERENCES AccountStatus;


In this example, we are creating the foreign key constraint such that the foreign key is checked at commit time:

ALTER TABLE Account
   ADD CONSTRAINT FK_Account_AccountStatus
   FOREIGN KEY (StatusCode)
   REFERENCES AccountStatus
   INITIALLY DEFERRED;


Data-Migration Mechanics

To support the addition of a foreign key constraint to a table, you may discover that you need to update the existing data within the database first. This is a multi-step process:

1. Ensure the referenced data exists. First, we need to ensure that the rows being referred to exist in AccountStatus. You need to analyze the existing data in both Account and AccountStatus to determine whether there are missing rows in AccountStatus. The easiest way to do this is to compare the count of the number of rows in Account to the count of the number of rows resulting in the join of Account and AccountStatus.

2. Ensure that the foreign table contains all required rows. If the counts are different, either you are missing rows in AccountStatus and/or there are incorrect values in Account.StatusCode. First, create a list of unique values of Account.StatusCode and compare it with the list of values from AccountStatus.StatusCode. If the first list contains values that are valid but do not appear in the second list, AccountStatus needs to be updated. Second, there may still be valid values that appear in neither list but are still valid within your business domain. To identify these values, you need to work closely with your project stakeholders; better yet, just wait until they actually need the data rows and then add them at that time.

3. Ensure that source table’s foreign key column contains valid values. Update the lists from the previous step. Any differences in the list must now be the result of invalid or missing values in the Account.StatusCode column. You need to update these rows appropriately, either with an automated script that sets a default value or by manually updating them.

4. Introduce a default value for the foreign key column. You may optionally need to make the database insert default values when the external programs do not provide a value for Account.StatusCode. See the database refactoring Introduce Default Value (page 186).

For the example of Figure 8.1, you must ensure that the data before the foreign key constraint is added is clean; if it is not, you must update the data. Let’s assume that we have some Account rows where the status is not set or is not part of the AccountStatus table. In this situation, you must update the Account.Status column to some known value that exists in the AccountStatus table:

UPDATE Account SET Status = 'DORMANT'
  WHERE
    Status NOT IN (SELECT StatusCode FROM AccountStatus)
    AND Status IS NOT NULL;


In other cases, you may have the Account.Status containing a null value. If so, update the Account.Status column with a known value, as shown here:

UPDATE Account SET Status = 'NEW'
  WHERE Status IS NULL;


Access Program Update Mechanics

You must identify and then update any external programs that modify data in the table where the foreign key constraint was added. Issues to look for include the following:

1. Similar RI code. Some external programs will implement the RI business rule that will now be handled via the foreign key constraint within the database. This code should be removed.

2. Different RI code. Some external programs will include code that enforces different RI business rules than what you are about to implement. This implication is that you either need to reconsider adding this foreign key constraint because there is no consensus within your organization regarding the business rule that it implements or you need to rework the code to work based on this new version (from its point of view) of the business rule.

3. Nonexistent RI code. Some external programs will not even be aware of the RI business rule pertaining to these data tables.

All external programs must be updated to handle any exception(s) thrown by the database as the result of the new foreign key constraint. The following code shows how the application code needs to change to handle exceptions throws by the database:

// Before code
stmt = conn.prepare(
  "INSERT INTO Account(AccountID,StatusCode,Balance) "+
  "VALUES (?,?,?)";
stmt.setLong(1,accountId);
stmt.setString(2,statusCode);
stmt.setBigDecimal(3,balance);
stmt.executeUpdate();

//After code
stmt = conn.prepare(
  "INSERT INTO Account(AccountID,StatusCode,Balance) "+
  "VALUES (?,?,?)";
stmt.setLong(1,accountId);
stmt.setString(2,statusCode);
stmt.setBigDecimal(3,balance);
try {
  stmt.executeUpdate();
} catch (SQLException exception){
      int errorCode = exception.getErrorCode();
      if (errorCode = 2291) {
         handleParentRecordNotFoundError();
      }
      if (errorCode = 2292) {
          handleParentDeletedWhenChildFoundError();
      }
}


Add Trigger For Calculated Column

Introduce a new trigger to update the value contained in a calculated column. The calculated column may have been previously introduced by the Introduce Calculated Column refactoring (page 81).

Motivation

The primary reason you would apply Add Trigger For Calculated Column is to ensure that the value contained in the column is updated properly whenever the source data changes. This should be done by the database so that all the applications are not required to do so.

Potential Tradeoffs

When a calculated column is based on complex algorithms, or simply on data located in several places, your trigger will implement a lot of business logic. This may lead to inconsistency with similar business logic implemented within your applications.

The source data used by the trigger might be updated within the scope of a transaction. If the trigger fails, the transaction fails, too, causing it to be rolled back. This will likely be perceived as an undesirable side effect by external programs.

When a calculated column is based on data from the same table, it may not be possible to use a trigger to do the update because many database products do not allow this.

Schema Update Mechanics

Applying Add Trigger For Calculated Column can be complicated because of data dependencies of the calculated column. You will need to do the following:

1. Determine whether triggers can be used to update the calculated column. In Figure 8.2, the TotalPortfolioValue column is calculated. You know that because of the forward slash (/) in front of the name, a UML convention. When TotalPortfolioValue and the source data is in the same table, you likely cannot use triggers to update the data value.

Figure 8.2. Adding a trigger to calculate Customer.TotalPortfolioValue.

image

2. Identify the source data. You have to identify the source data, and how it should be used, to determine the value of the calculated column.

3. Identify the table to contain the column. You have to identify which table should include the calculated column if it does not already exist. To do this, ask yourself which business entity the calculated column describes best. For example, a customer’s credit risk indicator is most applicable to the Customer entity.

4. Add the column. If the column does not exist, add it via the ALTER TABLE ADD COLUMN command. Use Introduce New Column (page 301).

5. Add the trigger(s). You need to add a trigger to each table that contains source data pertinent to calculating the value. In this case, the source data for TotalPortfolioValue exists in the Account and InsurancePolicy tables. Therefore, we need a trigger for each table, UpdateCustomerTotalPortfolioValue and UpdateTotalPortfolioValue, respectively.

The following code shows you how to add the two triggers:

—Update the TotalPortfolioValue with the trigger.
CREATE OR REPLACE TRIGGER
UpdateCustomerTotalPortfolioValue
  AFTER UPDATE OR INSERT OR DELETE
  ON Account
  REFERENCING OLD AS OLD NEW AS NEW
  FOR EACH ROW
  DECLARE
    BEGIN
      UpdateCustomerWithPortfolioValue;
    END;
  END;
/

CREATE OR REPLACE TRIGGER
UpdateCustomerTotalPortfolioValue
  AFTER UPDATE OR INSERT OR DELETE
  ON InsurancePolicy
  REFERENCING OLD AS OLD NEW AS NEW
  FOR EACH ROW
  DECLARE
    BEGIN
      UpdateCustomerWithPortfolioValue;
     END;
  END;
/


Data-Migration Mechanics

There is no data to be migrated per se, although the value of Customer.TotalPortfolioValue must be populated based on the calculation. This is typically done once in batch via one or more scripts. In our example, we have to update the Customer.TotalPortfolioValue for all the existing rows in the Customer table with the sum of Account.Balance and Policy.Value for each customer:

UPDATE Customer SET TotalPortfolioValue =
  (SELECT SUM(Account.Balance) + SUM(Policy.Balance)
    FROM Account, CustomerInsurancePolicy,
    InsurancePolicy
    WHERE
    Account.AccountID =
    CustomerInsurancePolicy.AccountID
    AND CustomerInsurancePolicy.PolicyID=Policy.PolicyID
    AND Account.CustomerID=Customer.CustomerID
  );


Access Program Update Mechanics

You need to identify all the places in external applications where this calculation is currently performed and then rework that code to work with TotalPortfolioValue—this typically entails removing the calculation code and replacing it with a read to the database. You may discover that the calculation is performed differently in various applications, either because of a bug or because of a different situation, and need to negotiate the appropriate calculation with the business.

Drop Foreign Key Constraint

Remove a foreign key constraint from an existing table so that a relationship to another table is no longer enforced by the database.

Motivation

The primary reason to apply Drop Foreign Key Constraint is to no longer enforce data dependencies at the database level—instead, data integrity is enforced by external applications. This is particularly important when the performance cost of enforcing RI by the database cannot be sustained by the database anymore and/or when the RI rules vary between applications.

Potential Tradeoffs

The fundamental tradeoff is performance versus quality: Foreign key constraints ensure the validity of the data at the database level at the cost of the constraint being enforced each time the source data is updated. When you apply Drop Foreign Key, your applications will be at risk of introducing invalid data if they do not validate the data before writing to the database.

Schema Update Mechanics

To drop a foreign key constraint, you must either apply the ALTER TABLE DROP CONSTRAINT command or you can just apply the ALTER TABLE DISABLE CONSTRAINT command. The advantage of that latter approach is that it ensures that the relationship is documented even though the constraint is not enforced. The following code depicts the two ways by which you can drop the foreign key constraint between Account.StatusCode and AccountStatus.StatusCode in Figure 8.3. The first way just drops the constraint; the other one disables it and thereby documents the need for it. We recommend the second approach:

Figure 8.3. Dropping a foreign key constraint from the Account table.

image

ALTER TABLE Account DROP CONSTRAINT FK_Account_Status;
ALTER TABLE Account DISABLE CONSTRAINT FK_Account_Status;


Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

You must identify and then update any external programs that modify the data column(s) on which the foreign key constraint was defined. There are two issues to look for in the external programs. First, individual applications need to be updated to ensure that appropriate RI rules are still being enforced. The rules could vary, but in general, you need to add logic to each application to ensure that a row exists in AccountStatus whenever it is referenced in Account. The implication is that you will either need to determine that the row already exists—if AccountStatus is small, you might consider caching it in memory—and be prepared to add a new row to the table. The second issue concerns exception handling. Because the database will no longer throw RI exceptions pertaining to this foreign key constraint, you will need to rework any external program code appropriately.

Introduce Cascading Delete

The database automatically deletes the appropriate “child records” when a “parent record” is deleted.

Figure 8.4. Introducing a cascading delete on Policy.

image

Note that an alternative to deleting child records is just to remove the reference to the parent record within the children records. This option can only be used when the foreign key column(s) in the child tables allows null values, although this alternative may lead to lots of orphaned rows.

Motivation

The primary reason you would apply Introduce Cascading Delete is to preserve the referential integrity of your data by ensuring that related rows are appropriately deleted when a parent row is deleted.

Potential Tradeoffs

There are three potential tradeoffs with this refactoring:

Deadlock. When you implement cascading deletes, you must avoid cyclic dependencies; otherwise, deadlock may occur. Modern databases detect deadlocks and will roll back the transaction that caused the deadlock.

Accidental mass deletion. You need to be very careful when applying this refactoring. For example, it may theoretically make sense that if you delete a row in your CorporateDivision, then all rows corresponding to that division in your Employee table should also be deleted. However, you could easily delete thousands of employee records when someone inadvertently deletes a record representing a large division.

Duplicated functionality. Persistence frameworks such as Hibernate (www.hibernate.org) and Oracle’s Toplink (www.oracle.com) automate relationship management between objects; therefore, you may be duplicating functionality when you apply this refactoring.

Schema Update Mechanics

To apply Introduce Cascading Delete, you need to do the following:

1. Identify what is to be deleted. You need to identify the “children” of a row that should be deleted when the parent row is deleted. For example, when you delete an order, you should also delete all the order items associated with that order. This activity is recursive; the child rows in turn may have children that also need to be deleted, motivating you to apply Introduce Cascading Delete for them, too. We do not recommend applying this refactoring all at once; instead, it is better to apply this refactoring to one set of tables at a time, fully implementing and testing it before moving on to the next. Small changes are easier to implement than large changes.

2. Choose cascading mechanism. You can implement cascading deletes either with triggers or with referential integrity constraints with the DELETE CASCADE option. (Not all database vendors may support this option.)

3. Implement the cascading delete. With the first approach, you write a trigger that deletes all the children of the parent row when it is deleted. This option is best suited when you want to have fine-grained control over what gets deleted when the parent row is deleted. The downside is that you must write code to implement this functionality. You may also introduce deadlock situations when you have not thought through the interrelationships between multiple triggers being executed at the same time. With the second approach, you define RI constraints with the DELETE CASCADE option turned on, of the ALTER TABLE MODIFY CONSTRAINT SQL command. When you choose this option, you must define referential constraints on the database, which could be a huge task if you do not already have referential constraints defined (because you would need to apply the Add Foreign Key refactoring [page 204] to pretty much all the relationships in the database). The primary advantage with this option is that you do not have to write any new code because the database automatically takes care of deleting the child rows. The challenge with this approach is that it can be very hard to debug.

Figure 8.4 depicts how to Introduce Cascade Delete to the Policy table using the trigger method. The DeletePolicy trigger, the code is shown below, deletes any rows from the PolicyNotes or Claim tables that are related to the row in the Policy table that is being deleted:

—Create trigger to delete the PolicyNotes and Claim.
CREATE OR REPLACE TRIGGER DeletePolicy
AFTER DELETE ON Policy
FOR EACH ROW
DECLARE
BEGIN
  DeletePolicyNotes();
  DeletePolicyClaims();
END;
/


The following code shows you how to implement Introduce Cascade Delete using RI constraints with the DELETE CASCADE option:

ALTER TABLE POLICYNOTES ADD CONSTRAINT
FK_DELETEPOLICYNOTES
FOREIGN KEY (POLICYID)
REFERENCES POLICY (POLICYID)
ON DELETE CASCADE
ENABLE
;

ALTER TABLE CLAIMS ADD CONSTRAINT FK_DELETEPOLICYCLAIM
FOREIGN KEY (POLICYID)
REFERENCES POLICY (POLICYID)
ON DELETE CASCADE
ENABLE
;


Data-Migration Mechanics

There is no data to migrate with this database refactoring.

Access Program Update Mechanics

When you apply this refactoring, you must remove any application code that currently implements the delete children functionality. One challenge will be that some applications implement the deletion whereas others do not—perhaps one application deletes OrderItem rows when the corresponding Order row is deleted, but another application does not. Implementing the cascading delete within the database may inadvertently impact, perhaps for the better, the second application. The point is that you need to be very careful; you should not assume that all applications implement the same RI rules, regardless of how “obvious” it is to you that they should.

You will also need to handle any new errors returned by the database when the cascading delete does not work. The following code shows how you would change your application code before and after Introduce Cascade Delete is applied:

//Before code
private void deletePolicy (Policy policyToDelete) {
  Iterator policyNotes =
  policyToDelete.getPolicyNotes().iterator();
  for (Iterator iterator = policyNotes;
  iterator.hasNext();)
{
    PolicyNote policyNote = (PolicyNote) iterator.next();
    DB.remove(policyNote);
}
  DB.remove(policyToDelete);
}

//After code
private void deletePolicy (Policy policyToDelete) {
DB.remove(policyToDelete);
}


If you are using any of the O-R mapping tools, you will have to change the mapping file so that you can specify the cascade option in the mapping, as shown here:

//After mapping
<hibernate-mapping>
  <class name="Policy" table="POLICY">
  ......
  <one-to-many name="policyNotes"
               class="PolicyNotes"
               cascade="all-delete-orphan"
  />
  ......
  </class>
</hibernate-mapping>


Introduce Hard Delete

Remove an existing column that indicates that a row has been deleted (this is called a soft delete or logical delete) and instead actually delete the row from the application (for example, do a hard delete). This refactoring is the opposite of Introduce Soft Delete (page 222).

Motivation

The primary reason to apply Introduce Hard Delete is to reduce the size of your tables, resulting in better performance and simpler querying because you no longer have to check to see whether a row is marked as deleted.

Potential Tradeoffs

The only disadvantage of this refactoring is the loss of historical data, although you can use Introduce Trigger For History (page 227) if required.

Schema Update Mechanics

As Figure 8.5 implies, to apply Introduce Hard Delete you first need to remove the identifying column. You must remove the deletion indicator column—see the Drop Column refactoring (page 72)—which, in this case, is the Customer.isDeleted column. Next you remove any code, usually trigger code (although sometimes application code), that updates the Customer.isDeleted column. There may be code that sets the initial value to FALSE in the case of a Boolean indicator, or to a predetermined date in the case of a date/timestamp. This logic is typically implemented with a default value column constraint that would be automatically dropped along with the column. There may also be trigger code that sets the value to TRUE, or to the current date/time, when there is an attempt to delete the row. Most likely, you will only need to drop the trigger. The following code shows you how to remove the Customer.isDeleted column:

ALTER TABLE Customer DROP COLUMN isDeleted;


Figure 8.5. Introducing a hard delete for Customer.

image

Data-Migration Mechanics

You have to delete all the data rows in the Customer where isDeleted is set to TRUE, because these are the rows that have been logically deleted. Before you delete these rows, you need to update, and possibly delete, all the data that references the logically deleted data. This is typically done once in batch via one or more scripts. You should also consider archiving all the rows marked for deletion so that you can back out of this refactoring if need be. The following code shows you how to delete the rows from Customer table where the Customer.isDeleted flag is set to TRUE:

—Delete customers marked for delete
DELETE FROM Customer WHERE isDeleted = TRUE;


Access Program Update Mechanics

When you apply the Introduce Hard Delete refactoring, you must change external programs accessing the data in two ways. First, SELECT statements must not reference the Customer.isDeleted column. Second, all logical deletion code must be updated:

//Before code
public void customerDelete(Long customerIdToDelete)
throws Exception {
 PreparedStatement stmt = null;

 try {
   stmt =  DB.prepare("UPDATE Customer "+
             "SET isDeleted = ? "+
             "WHERE CustomerID =  ?");
   stmt.setLong(1, Boolean.TRUE);
   stmt.setLong(2, customerIdToDelete);
   stmt.execute();
 } catch (SQLException SQLexc){
   DB.HandleDBException(SQLexc);
 }
 finally {DB.cleanup(stmt);}
}

//After code
public void customerDelete(Long customerIdToDelete)
  throws Exception {
  PreparedStatement stmt = null;
  try {
    stmt =  DB.prepare("DELETE FROM Customer "+
                        "WHERE CustomerID =  ?");
    stmt.setLong(1, customerIdToDelete);
    stmt.execute();
  } catch (SQLException SQLexc){
    DB.HandleDBException(SQLexc);
  }
  finally {DB.cleanup(stmt);}
}


Introduce Soft Delete

Introduce a flag to an existing table that indicates that a row has been deleted (this is called a soft/logical delete) instead of actually deleting the row (a hard delete). This refactoring is the opposite of Introduce Hard Delete (page 219).

Motivation

The primary reason to apply Introduce Soft Delete is to preserve all application data, typically for historical means.

Potential Tradeoffs

Performance is potentially impacted for two reasons. First, the database must store all the rows that have been marked as deleted. This could lead to significantly more disk space usage and reduced query performance. Second, applications must now do the additional work of distinguishing between deleted and nondeleted rows, decreasing performance while potentially increasing code complexity.

Schema Update Mechanics

As Figure 8.6 implies, to apply Introduce Soft Delete, you will need to do the following:

1. Introduce the identifying column. You must introduce a new column to Customer—see the Introduce New Column transformation (page 301)—that marks the row as deleted or not. This column is usually either a Boolean field that contains the value TRUE when the row is deleted and FALSE otherwise or a date/timestamp indicating when the row was deleted. In our example, we are introducing the Boolean column isDeleted. This column should not allow NULL values. (See the Make Column Non-Nullable refactoring on page 189.)

2. Determine how to update the flag. The Customer.isDeleted column can be updated either by your application(s) or within the database using triggers. We prefer the trigger-based approach because it is simple and it avoids the risk that the applications will not update the column properly.

3. Develop deletion code. The code must be written and tested to update the deletion indicator column appropriately upon “deletion” of a row. In the case of a Boolean column set to the value to TRUE, in the case of a date/timestamp, set it to the current date and time.

4. Develop insertion code. You have to set the deletion indicator column appropriately upon an insert, FALSE in the case of a Boolean column and a predetermined date (for example, the January 1, 5000) for a date/timestamp. This could be easily implemented by using the Introduce Default Value refactoring (page 186) or via a trigger.

Figure 8.6. Introducing a soft delete to Customer.

image

The following code shows you how to add the Customer.isDeleted column and assign it a default value of FALSE:

ALTER TABLE Customer ADD isDeleted BOOLEAN;
ALTER TABLE Customer MODIFY isDeleted DEFAULT FALSE;


The following code shows you how to create a trigger that intercepts the DELETE SQL command and assigns the Customer.isDeleted flag to TRUE. The code copies the data row before deletion, updates the deletion indicator, and then inserts the row back into the table after the original is removed:

––Create a array to store the deleted Customers
CREATE OR REPLACE PACKAGE SoftDeleteCustomerPKG

AS
  TYPE ARRAY IS TABLE OF Customer%ROWTYPE INDEX BY
  BINARY_INTEGER;
  oldvals  ARRAY;
  empty    ARRAY;
END;
/

––Initialize the array
CREATE OR REPLACE TRIGGER SoftDeleteCustomerBefore
BEFORE DELETE ON Customer
BEGIN
  SoftDeleteCustomerPKG.oldvals :=
  SoftDeleteCustomerPKG.empty;
END;
/

––Capture the deleted rows
CREATE OR REPLACE TRIGGER
SoftDeleteCustomerStore
BEFORE DELETE ON Customer
FOR EACH ROW
DECLARE
  i NUMBER DEFAULT SoftDeleteCustomerPKG.oldvals.COUNT+1;
BEGIN
  SoftDeleteCustomerPKG.oldvals(i).CustomerID :=
  :old.CustomerID;
  deleteCustomer.oldvals(i).Name := :old.Name;
  deleteCustomer.oldvals(i).PhoneNumber :=
  :old.PhoneNumber;
END;
/

––Insert the customers back with the isdeleted flag set to true.
CREATE OR REPLACE TRIGGER SoftDeleteCustomerAdd
AFTER DELETE ON Customer
DECLARE
BEGIN
  FOR i IN 1 .. SoftDeleteCustomerPKG.oldvals.COUNT LOOP
    insert into Customer(CustomerID,Name,PhoneNumber,isDeleted)
    values( deleteCustomer.oldvals(i).CustomerID,
      deleteCustomer.oldvals(i).Name,
      deleteCustomer.oldvals(i).PhoneNumber,
      TRUE);
  END LOOP;
END;
/


Data-Migration Mechanics

There is no data to be migrated per se, although the value of Customer.isDeleted must be set to the appropriate default value within all rows when this refactoring is implemented. This is typically done once in batch via one or more scripts.

Access Program Update Mechanics

When you apply the Introduce Soft Delete refactoring, you must change external programs accessing the data. First, you must change read queries now to ensure that data read from the database has not been marked as deleted. Applications must add a WHERE clause to all SELECT queries, such as WHERE isDeleted = FALSE. Instead of changing all the read queries, you can use Encapsulate Table With View refactoring (page 243) so that the view returns rows from Customer where isDeleted is not TRUE. Another option is to apply the Add Read Method refactoring (page 240) so that the appropriate WHERE clause is implemented in one place.

Second, you must change deletion methods. All external programs must change physical deletes to updates that update Customer.isDeleted instead of physically removing the row. For example, DELETE FROM Customer WHERE PKColumn = nnn will change to UPDATE Customer SET isDeleted = TRUE WHERE PKColumn = nnn. Alternatively, as noted earlier, you can introduce a delete trigger that prevents the deletion and updates Customer.isDeleted to TRUE.

The following code shows you how to set the initial value of the Customer.isDeleted column:

UPDATE Customer SET
  isDeleted = FALSE WHERE isDeleted IS NULL;


The following code depicts the read method for a simple Customer object before and after Introduce Soft Delete refactoring is introduced:

// Before code
stmt.prepare(
"SELECT CustomerId, Name, PhoneNumber " +
"FROM Customer" +
"WHERE " +
"  CustomerId = ?");
stmt.setLong(1,customer.getCustomerID);
stmt.execute();
ResultSet rs = stmt.executeQuery();

// After code
stmt.prepare(
"SELECT CustomerId, Name, PhoneNumber " +
"FROM Customer" +
"WHERE " +
"  CustomerId = ? "+
"  AND isDeleted = ?");
stmt.setLong(1,customer.getCustomerID);
stmt.setBoolean(2,false);
stmt.execute();
ResultSet rs = stmt.executeQuery();


The before and after code snippet shows you how the delete method changes after Introduce Soft Delete refactoring is introduced:

//Before code
stmt.prepare("DELETE FROM Customer WHERE CustomerID=?");
stmt.setLong(1, customer.getCustomerID);
:stmt.executeUpdate();

//After code
stmt.prepare("UPDATE Customer SET isDeleted = ?"+
             "WHERE CustomerID=?");
stmt.setBoolean(1, true);
stmt.setLong(2, customer.getCustomerID);
stmt.executeUpdate();


Introduce Trigger For History

Introduce a new trigger to capture data changes for historical or audit purposes.

Motivation

The primary reason you would apply Introduce Trigger For History is to delegate the tracking of data changes to the database itself. This strategy ensures that if any application modifies critical source data, the change will be tracked or audited.

Potential Tradeoffs

The primary trade-off is performance related—the addition of the trigger will increase the time it takes to update rows in Customer of Figure 8.7.

Figure 8.7. Introducing a history trigger for Customer.

image

Furthermore, you may be forced to update your applications to pass user context information so that the database can record who made the change.

Schema Update Mechanics

To apply Introduce Trigger for History, you need to do the following:

1. Determine actions to collect history for. Although it is possible to insert, update, and delete data in Customer, you may not need to track all the changes. For example, perhaps you are only interested in tracking updates and deletions but not original insertions.

2. Determine what columns to collect history for. You have to identify the columns in Customer against which you are interested in tracking changes to. For example, perhaps you only want to track changes to the PhoneNumber column but nothing else.

3. Determine how to record the historical data. You have two basic choices: First, you could have a generic table that tracks all the historical data changes within your database, or you could introduce a corresponding history table for each table that you want to record history for. The single-table approach will not scale very well, although is appropriate for smaller databases.

4. Introduce the history table. If the history table does not yet exist, create it via the CREATE TABLE command.

5. Introduce the trigger(s). Introduce the appropriate trigger(s) via the CREATE OR REPLACE TRIGGER command. This could be achieved by just having a trigger that captures the original image of the row and inserting it into the CustomerHistory table. A second strategy is to compare the before and after values of the pertinent columns and store descriptions of the changes into CustomerHistory.

The following code shows you how to add the CustomerHistory table and how to update it via the UpdateCustomerHistory trigger. We are capturing the change in every column on the Customer table and recording them in CustomerHistory:

––Create the CustomerHistory table
CREATE TABLE CustomerHistory
(
  CustomerID NUMBER,
  OldName VARCHAR2(32),
  NewName VARCHAR2(32),
  ChangedBy NUMBER,
  ChangedOn DATE
);

––Auditing trigger for table Customer
CREATE OR REPLACE TRIGGER
UpdateCustomerHistory
AFTER INSERT OR UPDATE OR DELETE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
––Handle Updating of Rows

  IF UPDATING THEN
   IF NOT (NVL(:OLD.Name = :NEW.Name,FALSE) OR
   (:OLD.Name IS NULL AND :NEW.Name IS NULL)) THEN
    CreateCustomerHistoryRow( :NEW.CustomerId,
    :NEW.Name, :OLD.Name, User);
   END IF;
  END IF;

––Handle Deleting of Records
  IF DELETING THEN
    IF (:OLD.Name IS NOT NULL) THEN
      CreateCustomerHistoryRow (:OLD.CustomerId,:OLD.Name,NULL,User);
    END IF;
  END IF;

  IF INSERTING THEN
    IF (:NEW.Name IS NOT NULL) THEN
      CreateCustomerHistoryRow(:NEW.CustomerId, :NEW.Name,NULL,User);
    END IF;
  END IF;
END;
/


Data-Migration Mechanics

Data migration is typically not required for this database refactoring. However, if you want to track insertions, you may decide to create a history record for each existing row within Customer. This works well when Customer includes a column recording the original creation date; otherwise, you will need to generate a value for the insertion date. (The easiest thing to do is to use the current date.)

Access Program Update Mechanics

When you apply Introduce Trigger for History, you need to do the following:

1. Stop application generation of history. When you add triggers to collect historical information, you need to identify any existing logic in external applications where the code is creating historical information and then rework it.

2. Update presentation code. If external programs currently display application-generated history, they will need to be reworked to use the information contained in CustomerHistory.

3. Provide user context to database. If you want the database trigger to record which user made the change to the data, you must provide a user context. Some applications, in particular those built using Web or n-tier technologies, may not be providing this information and will need to be updated to do so. Alternatively, you could just supply a default value for the user context when it is not provided.

Although it would not be part of the refactoring itself, a related schema transformation would be to modify the table design to add columns to record who modified the record and when it was done. A common strategy is to add columns such as UserCreated, CreationDate, UserLastModified, and LastModifiedDate to main tables (see Introduce New Column on page 301). The two user columns would be a user ID that could be used as a foreign key to a user details table. You may also need to add the user details table (see Add Lookup Table on page 153).

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset