Chapter 6
Structural Refactorings

Structural refactorings, as the name implies, change the table structure of your database schema. The structural refactorings are as follows:

• Drop Column

• Drop Table

• Drop View

• Introduce Calculated Column

• Introduce Surrogate Key

• Merge Columns

• Merge Tables

• Move Column

• Rename Column

• Rename Table

• Rename View

• Replace Large Object (LOB) With Table

• Replace Column

• Replace One-to-Many With Associative Table

• Replace Surrogate Key With Natural Key

• Split Column

• Split Table

Common Issues When Implementing Structural Refactorings

When implementing structural refactorings, you need to consider several common issues as you update the database schema, including the following:

1. Avoid trigger cycles. You need to implement the trigger so that cycles do not occur: if the value in one of the original columns changes, Table.NewColumn1..N should be updated, but that update must not in turn trigger the same update to the original columns, and so on. The following code shows an example of keeping the values of two columns synchronized:

CREATE OR REPLACE TRIGGER SynchronizeFirstName
BEFORE INSERT OR UPDATE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF INSERTING THEN
    IF :NEW.FirstName IS NULL THEN
      :NEW.FirstName := :NEW.FName;
    END IF;
    IF :NEW.FName IS NULL THEN
      :NEW.FName := :NEW.FirstName;
    END IF;
  END IF;
  IF UPDATING THEN
    IF NOT (:NEW.FirstName = :OLD.FirstName) THEN
      :NEW.FName := :NEW.FirstName;
    END IF;
    IF NOT (:NEW.FName = :OLD.FName) THEN
      :NEW.FirstName := :NEW.FName;
    END IF;
  END IF;
END;
/
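The same safeguard can be sketched outside Oracle. The following is a minimal SQLite demonstration (via Python's sqlite3) of keeping two columns synchronized without a trigger cycle; the WHEN guards play the role of the IF NOT(:NEW.x = :OLD.x) tests above, and all table and trigger names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY,
                       FirstName TEXT, FName TEXT);

-- On insert, copy whichever column was supplied into its twin.
CREATE TRIGGER SyncNameInsert AFTER INSERT ON Customer
BEGIN
  UPDATE Customer
     SET FirstName = COALESCE(NEW.FirstName, NEW.FName),
         FName     = COALESCE(NEW.FName, NEW.FirstName)
   WHERE CustomerId = NEW.CustomerId;
END;

-- Propagate a change in either column to the other.  Once the twin
-- column already holds the new value, the WHEN guard is false and the
-- chain stops, so the two triggers cannot cycle.
CREATE TRIGGER SyncFirstName AFTER UPDATE OF FirstName ON Customer
WHEN NEW.FirstName IS NOT OLD.FirstName
BEGIN
  UPDATE Customer SET FName = NEW.FirstName
   WHERE CustomerId = NEW.CustomerId;
END;

CREATE TRIGGER SyncFName AFTER UPDATE OF FName ON Customer
WHEN NEW.FName IS NOT OLD.FName
BEGIN
  UPDATE Customer SET FirstName = NEW.FName
   WHERE CustomerId = NEW.CustomerId;
END;
""")

conn.execute("INSERT INTO Customer (CustomerId, FirstName) VALUES (1, 'Sally')")
conn.execute("UPDATE Customer SET FName = 'Sal' WHERE CustomerId = 1")
print(conn.execute("SELECT FirstName, FName FROM Customer").fetchone())
```

Updating either column converges both to the same value in a bounded number of trigger firings.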


2. Fix broken views. Views are coupled to other portions of your database, so when you apply a structural refactoring, you may inadvertently break a view. View definitions are typically coupled to other views and table definitions. For example, the VCustomerBalance view is defined by joining the Customer and Account tables on CustomerNumber to obtain the total balance across all accounts for each individual customer. If you rename Customer.CustomerNumber, this view will effectively be broken.

3. Fix broken triggers. Triggers are coupled to table definitions; therefore, structural changes such as a renamed or moved column could potentially break a trigger. For example, an insert trigger may validate the data stored in a specific column; and if that column has been changed, the trigger will potentially be broken. The following code finds broken triggers in Oracle, something that you should add to your test suite. You still need other tests to find business logic defects:

SELECT Object_Name, Status
          FROM User_Objects
WHERE Object_Type='TRIGGER'
AND Status='INVALID';


4. Fix broken stored procedures. Stored procedures invoke other stored procedures and access tables, views, and columns. Therefore, any structural refactoring has the potential to break an existing stored procedure. The following code finds broken stored procedures in Oracle, something that you should add to your test suite. You still need other tests to find business logic defects:

SELECT Object_Name, Status
          FROM User_Objects
          WHERE Object_Type='PROCEDURE'
AND Status='INVALID';


5. Fix broken tables. Tables are indirectly coupled to the columns of other tables via naming conventions. For example, if you rename the Customer.CustomerNumber column, you should similarly rename Account.CustomerNumber and Policy.CustomerNumber. The following code finds all the tables with column names containing the text CUSTOMERNUMBER in Oracle:

SELECT Table_Name,Column_Name
          FROM User_Tab_Columns
WHERE Column_Name LIKE '%CUSTOMERNUMBER%';
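Other database products expose their catalogs differently. As a hedged sketch, here is the SQLite analogue of the User_Tab_Columns query, using Python's sqlite3 and the pragma_table_info table-valued function; the table names are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerNumber INTEGER PRIMARY KEY, FirstName TEXT);
CREATE TABLE Account  (AccountId INTEGER PRIMARY KEY, CustomerNumber INTEGER);
CREATE TABLE Policy   (PolicyId INTEGER PRIMARY KEY, CustomerNumber INTEGER);
""")

# SQLite's equivalent of querying User_Tab_Columns: pair the table list
# in sqlite_master with the table_info pragma for each table.
rows = conn.execute("""
  SELECT m.name, c.name
    FROM sqlite_master AS m, pragma_table_info(m.name) AS c
   WHERE m.type = 'table'
     AND UPPER(c.name) LIKE '%CUSTOMERNUMBER%'
   ORDER BY m.name
""").fetchall()
print(rows)
```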


6. Define the transition period. Structural refactorings require a transition period when you implement them in a multi-application environment. You must assign the same drop date to the original schema being refactored and to any scaffolding columns and triggers introduced to support the transition. This drop date must take into account the time required to update the external programs accessing that portion of the database.

Drop Column

Remove a column from an existing table.

Figure 6.1. Drop the Customer.FavoriteColor column.


Motivation

The primary reason to apply Drop Column is that the column is no longer used, either because the database table design has been refactored or because the external applications that referenced the column have been. Drop Column is often applied as a step of the Move Column (page 103) database refactoring because the column is removed from the source table. Or, sometimes you discover that some columns are not really used. It is usually better to remove these columns before someone starts using them by mistake.

Potential Tradeoffs

The column being dropped may contain valuable data; in that case, the data may need to be preserved. You can use Move Data (page 192) to move the data to some other table so that it is preserved. On tables containing many rows, the dropping of a column may run for a long time, making your table unavailable for update during the execution of Drop Column.

Schema Update Mechanics

To update the schema to remove a column, you must do the following:

1. Choose a remove strategy. Some database products may not allow a column to be removed, forcing you to create a temporary table, move all the data into it, drop the original table, re-create the original table without the column, move the data back from the temporary table, and then drop the temporary table. If your database provides a way to remove columns, you just have to remove the column using the DROP COLUMN option of the ALTER TABLE command.

2. Drop the column. Sometimes, when the amount of data is large, we have to make sure that the Drop Column runs in a reasonable amount of time. To minimize the disruption, schedule the physical removal of the column to a time when the table is least used. Another strategy is to mark the database column as unused; this can be achieved by using the SET UNUSED option of the ALTER TABLE command. The SET UNUSED command runs much faster, thus minimizing disruption. You can then remove the unused columns during scheduled downtimes. When this option is used, the database does not physically remove the column but hides the column from everyone.

3. Rework foreign keys. If FavoriteColor is part of a primary key, you must also remove the corresponding columns from the other tables that use it as (part of) the foreign key to Customer. You will have to re-create the foreign key constraints on these other tables. In this situation, you may want to consider applying refactorings such as Introduce Surrogate Key (page 85) or Replace Surrogate Key with Natural Key (page 135) before applying Drop Column to simplify this refactoring.
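The recreate strategy from Step 1 can be sketched concretely. The following is an illustrative SQLite example via Python's sqlite3 (table and column names are from the running example, the data values are invented), not production code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY,
                       FirstName TEXT, FavoriteColor TEXT);
INSERT INTO Customer VALUES (1, 'Sally', 'Blue'), (2, 'Raj', 'Green');

-- Recreate strategy: copy into a temporary table, rebuild the original
-- table without the doomed column, copy the data back, and clean up.
CREATE TABLE CustomerTemp AS SELECT CustomerId, FirstName FROM Customer;
DROP TABLE Customer;
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, FirstName TEXT);
INSERT INTO Customer SELECT CustomerId, FirstName FROM CustomerTemp;
DROP TABLE CustomerTemp;
""")

cols = [r[1] for r in conn.execute("PRAGMA table_info(Customer)")]
print(cols)  # ['CustomerId', 'FirstName']
```

In a real database you would also re-create indexes, constraints, and grants lost when the original table was dropped.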

An alternative strategy to physically removing the column is masking its existence by introducing a table view that does not reference FavoriteColor via the refactoring Encapsulate Table With View (page 243).

During the transition period, you just associate a comment with Customer.FavoriteColor to indicate that it will soon be dropped. After the transition period, you remove the column from the Customer table via the ALTER TABLE command, as you see here:

COMMENT ON COLUMN Customer.FavoriteColor IS
  'Drop date = September 14 2007';

--On September 14 2007
ALTER TABLE Customer DROP COLUMN FavoriteColor;


If you are using the SET UNUSED option, you can use the following command to make Customer.FavoriteColor unused so that it is not physically removed from the Customer table, but is made unavailable and invisible to all clients:

ALTER TABLE Customer SET UNUSED (FavoriteColor);


Data-Migration Mechanics

To support the removal of a column from a table, you may discover that you need to preserve the existing data, and you may need to plan for the performance of the Drop Column (removing a column from a table can block data modifications on the table while it runs). The primary issue here is to preserve the data before you drop the column. When you are going to remove an existing column from a table that has been in production, you will likely be required by the business to preserve the existing data “just in case” they need it again at some point in the future. The simplest approach is to create a temporary table with the primary key of the source table and the column that is being removed and then move the appropriate data into this new temporary table. You can choose other methods of preserving the data, such as archiving it to external files.

The following code depicts the steps to remove the Customer.FavoriteColor column. To preserve the data, you create a temporary table called CustomerFavoriteColor, which includes the primary key from the Customer table and the FavoriteColor column.

CREATE TABLE CustomerFavoriteColor
  AS SELECT CustomerID, FavoriteColor FROM Customer;


Access Program Update Mechanics

You must identify and then update any external programs that reference Customer.FavoriteColor. Issues to consider include the following:

1. Refactor code to use alternate data sources. Some external programs may include code that still uses the data currently contained within Customer.FavoriteColor. When this is the case, alternate data sources must be found, and the code reworked to use them; otherwise, the refactoring must be abandoned.

2. Slim down SELECT statements. Some external programs may include queries that read in the data but then ignore the retrieved value.

3. Refactor database inserts and updates. Some external programs may include code that puts a “fake value” into this column for inserts of new data, code that must be removed. Or the programs may include code to not write over FavoriteColor during an insert or update into the database. In other cases, you may have SELECT * FROM Customer where the application expects a certain number of columns and gets the columns from the result set using positional reference. This application code is likely to break now because the result set of the SELECT statement now returns one less column. Generally, it is not a good idea to use SELECT * from any table in your application. Granted, the real problem here is the fact that the application is using positional references, something you should consider refactoring, too.

The following code shows how you have to remove the reference to FavoriteColor:

//Before code
public Customer findByCustomerID(Long customerID) {
  stmt = DB.prepare("SELECT CustomerId, FirstName, " +
    "FavoriteColor FROM Customer WHERE CustomerId = ?");
  stmt.setLong(1, customerID.longValue());
  ResultSet rs = stmt.executeQuery();
  Customer customer = new Customer();
  if (rs.next()) {
    customer.setCustomerId(rs.getLong("CustomerId"));
    customer.setFirstName(rs.getString("FirstName"));
    customer.setFavoriteColor(rs.getString("FavoriteColor"));
  }
  return customer;
}

public void insert(long customerId,String firstName, String favoriteColor) {
  stmt = DB.prepare("INSERT into customer" +
    "(Customerid, FirstName, FavoriteColor)" +
    "values (?, ?, ?)");
  stmt.setLong(1, customerId);
  stmt.setString(2, firstName);
  stmt.setString(3, favoriteColor);
  stmt.execute();
}

public void update(long customerId, String firstName, String color) {
  stmt = DB.prepare("UPDATE Customer " +
    "SET FirstName = ?, FavoriteColor = ? " +
    "WHERE Customerid = ?");
  stmt.setString(1, firstName);
  stmt.setString(2, color);
  stmt.setLong(3, customerId);
  stmt.executeUpdate();
}

//After code
public Customer findByCustomerID(Long customerID) {
  stmt = DB.prepare("SELECT CustomerId, FirstName " +
    "FROM Customer WHERE CustomerId = ?");
  stmt.setLong(1, customerID.longValue());
  ResultSet rs = stmt.executeQuery();
  Customer customer = new Customer();
  if (rs.next()) {
    customer.setCustomerId(rs.getLong("CustomerId"));
    customer.setFirstName(rs.getString("FirstName"));
  }
  return customer;
}

public void insert(long customerId,String firstName) {
  stmt = DB.prepare("INSERT into customer" +
    "(Customerid, FirstName) " +
    "values (?, ?)");
  stmt.setLong(1, customerId);
  stmt.setString(2, firstName);
  stmt.execute();
}

public void update(long customerId, String firstName) {
  stmt = DB.prepare("UPDATE Customer " +
    "SET FirstName = ? " +
    "WHERE Customerid = ?");
  stmt.setString(1, firstName);
  stmt.setLong(2, customerId);
  stmt.executeUpdate();
}


Drop Table

Remove an existing table from the database.

Motivation

Apply Drop Table when a table is no longer required and/or used. This occurs when the table has been replaced by another similar data source, such as another table or a view, or simply when there is no longer need for that specific data source.

Potential Tradeoffs

Dropping a table deletes that specific data from your database, so you may need to preserve some or all of the data first. If this is the case, the required data must be stored within another data source; this is often already true when you are normalizing a database design and find that some of the data exists in other table(s). You can replace the table with a view or a query against the other data source, although in this scenario you cannot write back through that view or query.

Schema Update Mechanics

To perform Drop Table, you must resolve data-integrity issues. If TaxJurisdictions is being referenced by any other tables, you have to either remove the foreign key constraint or point the foreign key constraint to another table. Figure 6.2 depicts an example of how to go about removing the TaxJurisdictions table—you just mark the table as deprecated and then remove it after the transition date. The following code depicts the DDL to remove the table:

Figure 6.2. Dropping the TaxJurisdictions table.


-- drop date = June 14 2007
DROP TABLE TaxJurisdictions;


You can also choose to just rename the table, as shown below. When you do this, some database products automatically change all references from TaxJurisdictions to TaxJurisdictionsRemoved. You want to delete those referential integrity constraints by using Drop Foreign Key (page 213) because you may not want referential integrity constraints pointing to a table that is going to be dropped:

-- rename date = June 14 2007
ALTER TABLE TaxJurisdictions RENAME TO
TaxJurisdictionsRemoved;
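The rename-instead-of-drop approach, including the automatic rewriting of references that some database products perform, can be observed in SQLite (version 3.25 or later). This Python sketch uses an invented Invoice table with a foreign key to TaxJurisdictions purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE TaxJurisdictions (JurisdictionId INTEGER PRIMARY KEY);
CREATE TABLE Invoice (
  InvoiceId INTEGER PRIMARY KEY,
  JurisdictionId INTEGER REFERENCES TaxJurisdictions(JurisdictionId));

-- Rename rather than drop, so the data survives the transition period.
ALTER TABLE TaxJurisdictions RENAME TO TaxJurisdictionsRemoved;
""")

# The REFERENCES clause in Invoice was rewritten to the new name,
# which is exactly why you may want to drop such constraints first.
fk = conn.execute("PRAGMA foreign_key_list(Invoice)").fetchone()
print(fk[2])
```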


Data-Migration Mechanics

The only data-migration issue with this refactoring is the potential need to archive the existing data so that it can be restored if needed at a later date. You can do this by using the CREATE TABLE AS SELECT command. The following code depicts the DDL to optionally preserve data in the TaxJurisdictions table:

--copy data before drop
CREATE TABLE TaxJurisdictionsRemoved AS
  SELECT * FROM TaxJurisdictions;

-- drop date = June 14 2007
DROP TABLE TaxJurisdictions;


Access Program Update Mechanics

Any external programs referencing TaxJurisdictions must be refactored to access the alternative data source(s) that have replaced TaxJurisdictions. If there are no alternatives, and the data is still required, you must not remove the table until the alternative(s) exist.

Drop View

Remove an existing view.

Motivation

Apply Drop View when a view is no longer required and/or used. This occurs when the view has been replaced by another similar data source, such as another view or a table, or simply when there is no longer the need for that specific query.

Potential Tradeoffs

Dropping a view does not delete any data from your database; however, it does mean that the view is no longer available to the external programs that access it. Views are often used to obtain the data for reports. If the data is still required, the view should have already been replaced by another data source, either a view or a table, or a query to the source data itself. This new data access approach should ideally perform as well or better than the view that is being removed. Views are also used to implement security access control (SAC) to data values within a database. When this is the case, a new SAC strategy for the tables accessed by the view should have been previously implemented and deployed. A view-based security strategy is often a lowest-common denominator approach that can be shared across many applications but is not as flexible as a programmatic SAC strategy (Ambler 2003).

Schema Update Mechanics

To remove the view in Figure 6.3, you must apply the DROP VIEW command to AccountDetails after the transition period. The following code to drop the AccountDetails view is very straightforward—you just mark the view as deprecated and then remove it after the transition date.

Figure 6.3. Dropping the AccountDetails view.


-- drop date = June 14 2007
DROP VIEW AccountDetails;


Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

You must identify and then update any external programs that reference AccountDetails. You may need to refactor SQL code that formerly used AccountDetails to now explicitly access the data directly from the source tables. Similarly, any meta data used to generate SQL using AccountDetails would need to be updated. The following code shows how you have to change your application code to use data from the base tables:

// Before code
stmt.prepare(
  "SELECT * " +
  "FROM AccountDetails " +
  "WHERE CustomerId = ?");
stmt.setLong(1, customer.getCustomerID());
ResultSet rs = stmt.executeQuery();

// After code
stmt.prepare(
  "SELECT * " +
  "FROM Customer, Account " +
  "WHERE Customer.CustomerId = Account.CustomerId " +
  "AND Customer.CustomerId = ?");
stmt.setLong(1, customer.getCustomerID());
ResultSet rs = stmt.executeQuery();


Introduce Calculated Column

Introduce a new column based on calculations involving data in one or more tables. (Figure 6.4 depicts two tables, but it could be any number.)

Figure 6.4. Introducing the Customer.TotalAccountBalance calculated column.


Motivation

The primary reason you would apply Introduce Calculated Column is to improve application performance by providing prepopulated values for a given property derived from other data. For example, you may want to introduce a calculated column that indicates the credit risk level (for example, exemplary, good risk, bad risk) of a client based on that client’s previous payment history with your firm.

Potential Tradeoffs

The calculated column may get out of sync with the actual data values, particularly when external applications are required to update the value. We suggest that you introduce a mechanism, such as a regular batch job or triggers on the source data, that automatically updates the calculated column.

Schema Update Mechanics

Applying Introduce Calculated Column can be complicated because of data dependencies and the need to keep the calculated column synchronized with the data values it is based on. You will need to do the following:

1. Determine a synchronization strategy. Your basic choices are batch jobs, application updates, or database triggers. A batch job can be used when you do not require the value to be updated in real time; otherwise, you need to use one of the other two strategies. When the application(s) are responsible for making the appropriate updates, you run the risk of different applications doing it in different ways. The trigger approach is likely the safer of the two real-time strategies because the logic only needs to be implemented once, in the database. Figure 6.4 assumes the use of triggers.

2. Determine how to calculate the value. You have to identify the source data, and how it should be used, to determine the value of TotalAccountBalance.

3. Determine the table to contain the column. You have to determine which table should include TotalAccountBalance. To do this, ask yourself which business entity does the calculated column best describe. For example, a customer’s credit risk indicator is most applicable to the Customer entity.

4. Add the new column. Add Customer.TotalAccountBalance of Figure 6.4 via the Introduce New Column transformation (page 301).

5. Implement the update strategy. You need to implement and test the strategy chosen in Step 1.

The following code shows you how to add the Customer.TotalAccountBalance column and the UpdateCustomerTotalAccountBalance trigger, which is run any time the Account table is modified:

--Create the new column TotalAccountBalance
ALTER TABLE Customer ADD TotalAccountBalance NUMBER;

--Create trigger to keep data in sync.

CREATE OR REPLACE TRIGGER
UpdateCustomerTotalAccountBalance
BEFORE UPDATE OR INSERT OR DELETE
ON Account
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE

NewBalanceToUpdate NUMBER:=0;
CustomerIdToUpdate NUMBER;
BEGIN
CustomerIdToUpdate := :NEW.CustomerID;
IF UPDATING THEN
  NewBalanceToUpdate := :NEW.Balance-:OLD.Balance;
END IF;
IF INSERTING THEN
  NewBalanceToUpdate := :NEW.Balance;
END IF;
IF DELETING THEN
  NewBalanceToUpdate := -1*:OLD.Balance;
  CustomerIdToUpdate := :OLD.CustomerID;
END IF;
UPDATE Customer SET TotalAccountBalance =
  TotalAccountBalance + NewBalanceToUpdate
  WHERE Customerid = CustomerIdToUpdate;
END;
/
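The same trigger-based synchronization can be sketched in SQLite via Python's sqlite3. This is a minimal illustration of the technique, not the Oracle trigger above; it assumes, as the Oracle version does, that updates change only Balance and never move an account to another customer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY,
                       TotalAccountBalance NUMERIC DEFAULT 0);
CREATE TABLE Account  (AccountId INTEGER PRIMARY KEY,
                       CustomerId INTEGER, Balance NUMERIC);

-- Keep the calculated column in step with every kind of change.
CREATE TRIGGER AccountInsert AFTER INSERT ON Account
BEGIN
  UPDATE Customer SET TotalAccountBalance = TotalAccountBalance + NEW.Balance
   WHERE CustomerId = NEW.CustomerId;
END;

CREATE TRIGGER AccountUpdate AFTER UPDATE OF Balance ON Account
BEGIN
  UPDATE Customer SET TotalAccountBalance =
         TotalAccountBalance + NEW.Balance - OLD.Balance
   WHERE CustomerId = NEW.CustomerId;
END;

CREATE TRIGGER AccountDelete AFTER DELETE ON Account
BEGIN
  UPDATE Customer SET TotalAccountBalance = TotalAccountBalance - OLD.Balance
   WHERE CustomerId = OLD.CustomerId;
END;

INSERT INTO Customer (CustomerId) VALUES (1);
INSERT INTO Account VALUES (10, 1, 100), (11, 1, 50);
UPDATE Account SET Balance = 75 WHERE AccountId = 11;
DELETE FROM Account WHERE AccountId = 10;
""")

print(conn.execute(
    "SELECT TotalAccountBalance FROM Customer WHERE CustomerId = 1"
).fetchone()[0])  # 75
```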


Data-Migration Mechanics

There is no data to be migrated per se, although the value of Customer.TotalAccountBalance must be populated based on the calculation. This is typically done once using the UPDATE SQL command or can be done in batch via one or more scripts. The following code shows you how to set the initial value in Customer.TotalAccountBalance:

UPDATE Customer SET
  TotalAccountBalance = COALESCE(
    (SELECT SUM(Balance) FROM Account
      WHERE Account.CustomerId = Customer.CustomerId), 0);
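The one-time population step can be verified in miniature. This SQLite sketch (via Python's sqlite3, with invented data values) wraps the subquery in COALESCE as a safeguard so that customers with no accounts get 0 rather than NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY,
                       TotalAccountBalance NUMERIC);
CREATE TABLE Account  (AccountId INTEGER PRIMARY KEY,
                       CustomerId INTEGER, Balance NUMERIC);
INSERT INTO Customer (CustomerId) VALUES (1), (2);
INSERT INTO Account VALUES (10, 1, 100), (11, 1, 50);

-- One-time population of the calculated column; COALESCE keeps
-- customer 2, who has no accounts, at 0 instead of NULL.
UPDATE Customer SET TotalAccountBalance = COALESCE(
  (SELECT SUM(Balance) FROM Account
    WHERE Account.CustomerId = Customer.CustomerId), 0);
""")

print(conn.execute(
    "SELECT CustomerId, TotalAccountBalance FROM Customer ORDER BY CustomerId"
).fetchall())  # [(1, 150), (2, 0)]
```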


Access Program Update Mechanics

When you introduce the calculated column, you need to identify all the places in external applications where this calculation is performed and then rework that code to work with TotalAccountBalance. You need to replace existing calculation logic with access to the value of TotalAccountBalance. You may discover that the calculation is performed differently in various applications, either because of a bug or because of a different situation, and therefore you will need to negotiate the correct algorithm with your stakeholder(s). The following code shows how an application computed the total balance by summing across all the accounts of a customer. In the “after” version, it simply reads the value into memory when the customer object is retrieved from the database:

// Before code
stmt.prepare(
  "SELECT SUM(Account.Balance) Balance " +
  "FROM Customer, Account " +
  "WHERE Customer.CustomerID = Account.CustomerID " +
  "AND Customer.CustomerID = ?");
stmt.setLong(1, customer.getCustomerID());
ResultSet rs = stmt.executeQuery();
rs.next();
return rs.getBigDecimal("Balance");

//After code
return customer.getBalance();


Introduce Surrogate Key

Replace an existing natural key with a surrogate key. This refactoring is the opposite of Replace Surrogate Key With Natural Key (page 135).

Motivation

There are several reasons why you want to introduce a surrogate key to a table:

Reduce coupling. The primary reason is to reduce coupling between your table schema and the business domain. If part of a natural key is likely to change—for example, if a part number stored within an inventory table is likely to increase in size or change its type (from numeric to alphanumeric)—then having it as a primary key is a dangerous proposition.

Increase consistency. You may want to apply the Consolidate Key Strategy (page 168) refactoring, potentially improving performance and reducing code complexity.

Improve database performance. Your database performance may have degraded because of a large composite natural key. (Some databases struggle when a key is made up of several columns.) When you replace the large composite primary key with a single-column surrogate primary key, the database will be able to maintain the index more efficiently.

Potential Tradeoffs

Many data professionals prefer natural keys. The debate over surrogate and natural keys is a “religious issue” within the data community, but the reality is that both types of keys have their place. Even though a table has a surrogate primary key, you may still require natural alternate keys to support searching. Because a surrogate key has no business meaning, and because it is typically implemented as a collection of meaningless characters or numbers, your end users cannot use it for searches. As a result, they still need to identify data via natural identifiers. For example, the InventoryItem table has a surrogate primary key called InventoryItemPOID (POID is short for persistent object identifier) and a natural alternate key called InventoryID. Individual items are identified uniquely within the system by the surrogate key but identified by users via the natural key. The value in applying a surrogate key is that it simplifies your key strategy within your database and reduces the coupling between your database schema and your business domain.

Another challenge is that you may implement a surrogate key when it really is not needed. Many people become overzealous when it comes to implementing keys and often try to apply the same strategy throughout their schema. For example, in the United States, individual states are identified by a unique two-letter state code (for example, CA for California). This state code is guaranteed to be unique within the United States and Canada; the code for the province of Ontario is ON, and there will never be an American state with that code. The states and provinces are fairly stable entities, a large number of codes are still available (only 65 of 676 possible combinations have been used to date), and, because of the upheaval it would cause within their own systems, the respective governments are unlikely to change the strategy. Therefore, does it really make sense to introduce a surrogate key to a lookup table listing all the states and provinces? Likely not.

Also, when OriginalKey is being used as a foreign key in other tables, you want to apply Consolidate Key Strategy (page 168) and make similar updates to those tables. Note that this may be more work than it is worth; you might want to reconsider applying this refactoring.

Schema Update Mechanics

Applying Introduce Surrogate Key can be complicated because of the coupling that the original key—in our example, the combination of CustomerNumber, OrderDate, and StoreID—is potentially involved with. Because it is a primary key of a table, it is likely that it also forms (part of) the foreign key back to Order from other tables. You will need to do the following:

1. Introduce the new key column. Add the column to the target table via the SQL command ADD COLUMN. In Figure 6.5, this is OrderPOID. This column will need to be populated with unique values.

Figure 6.5. Introducing the Order.OrderPOID surrogate key.


2. Add a new index. A new index based on OrderPOID needs to be introduced for Order.

3. Deprecate the original columns. The original key columns must be marked for demotion to alternate key status, or nonkey status as the case may be, at the end of the transition period. In our example, the columns will not be deleted from Order at this time; they will just no longer be considered the primary key. They will, however, be deleted from OrderItem.

4. Update and possibly add referential integrity (RI) triggers. Any triggers that exist to maintain referential integrity between tables need to be updated to work with the corresponding new key values in the other tables. Triggers need to be introduced to populate the value of the foreign key columns during the transition period because the applications may not have been updated to do so.

Figure 6.5 depicts how to introduce OrderPOID, a surrogate key, to the Order table. You can optionally apply Introduce Surrogate Key recursively to the OrderItem table of Figure 6.5 to make use of the new key column. OrderItem could continue to use the existing composite key made up of the CustomerNumber, OrderDate, and StoreID columns, but for consistency, we have decided to refactor that table, too.

The following SQL code introduces and initially populates the OrderPOID column in both the Order and OrderItem tables. It obtains unique values for OrderPOID by invoking the GenerateUniqueID stored procedure as needed, which implements the HIGH-LOW algorithm (Ambler 2003). It also introduces the appropriate index required to support OrderPOID as a key of Order:

--Add new surrogate key to Order table
ALTER TABLE Order ADD OrderPOID NUMBER;

--Add new surrogate foreign key to OrderItem table
ALTER TABLE OrderItem ADD OrderPOID NUMBER;

--Assign values to surrogate key column in Order
UPDATE Order SET OrderPOID = GenerateUniqueID();

--Propagate foreign key to OrderItem
UPDATE OrderItem SET OrderPOID =
  (SELECT OrderPOID FROM Order
    WHERE OrderItem.CustomerNumber = Order.CustomerNumber
      AND OrderItem.OrderDate = Order.OrderDate
      AND OrderItem.StoreID = Order.StoreID);

CREATE INDEX OrderOrderPOIDIndex ON Order (OrderPOID);
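The HIGH-LOW algorithm mentioned above can be sketched in a few lines. This Python class is an illustrative stand-in for the GenerateUniqueID stored procedure, assuming a next_high callback that atomically advances a shared counter (in practice, a row in the database); only one database round trip is needed per block of LOW_BLOCK keys:

```python
import itertools
import threading

class HighLowGenerator:
    """Sketch of the HIGH-LOW surrogate-key scheme: reserve a block of
    LOW values for each HIGH value fetched from a shared source, so most
    key requests never touch the database."""

    LOW_BLOCK = 1000  # keys handed out per HIGH fetch

    def __init__(self, next_high):
        self._next_high = next_high  # e.g. increments a counter row in the DB
        self._lock = threading.Lock()
        self._high = None
        self._low = self.LOW_BLOCK   # force a HIGH fetch on first use

    def next_key(self):
        with self._lock:
            if self._low >= self.LOW_BLOCK:
                self._high = self._next_high()
                self._low = 0
            key = self._high * self.LOW_BLOCK + self._low
            self._low += 1
            return key

# Stand-in for the shared database counter.
counter = itertools.count(1)
gen = HighLowGenerator(lambda: next(counter))
print([gen.next_key() for _ in range(3)])  # [1000, 1001, 1002]
```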


To support this new key, we need to add the PopulateOrderPOID trigger, which is invoked whenever an insert occurs in OrderItem. This trigger obtains the value of Order.OrderPOID, as you can see in the following SQL code:

CREATE OR REPLACE TRIGGER PopulateOrderPOID
BEFORE INSERT
ON OrderItem
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF :NEW.OrderPOID IS NULL THEN
    :NEW.OrderPOID := getOrderPOIDFromOrder(
      :NEW.CustomerNumber, :NEW.OrderDate, :NEW.StoreID);
  END IF;
  IF :NEW.OrderPOID IS NOT NULL THEN
    IF :NEW.CustomerNumber IS NULL
      OR :NEW.OrderDate IS NULL
      OR :NEW.StoreID IS NULL
    THEN
      :NEW.CustomerNumber := getCustomerNumberFromOrder(:NEW.OrderPOID);
      :NEW.OrderDate := getOrderDateFromOrder(:NEW.OrderPOID);
      :NEW.StoreID := getStoreIDFromOrder(:NEW.OrderPOID);
    END IF;
  END IF;
END;
/

--June 14 2007
ALTER TABLE OrderItem DROP CONSTRAINT
OrderItemToOrderForeignKey;

ALTER TABLE Order DROP CONSTRAINT
OrderPrimaryKey;

ALTER TABLE Order MODIFY OrderPOID NOT NULL;

ALTER TABLE Order ADD CONSTRAINT OrderPrimaryKey
  PRIMARY KEY (OrderPOID);

ALTER TABLE OrderItem DROP CONSTRAINT
OrderItemPrimaryKey;

ALTER TABLE OrderItem MODIFY OrderPOID NOT NULL;
ALTER TABLE OrderItem ADD CONSTRAINT OrderItemPrimaryKey
  PRIMARY KEY (OrderPOID, OrderItemNumber);

ALTER TABLE OrderItem ADD CONSTRAINT
OrderItemToOrderForeignKey
  FOREIGN KEY (OrderPOID) REFERENCES Order;

CREATE UNIQUE INDEX OrderNaturalKey ON Order
(CustomerNumber,OrderDate,StoreID);

DROP TRIGGER PopulateOrderPOID;


Data-Migration Mechanics

We must generate data in Order.OrderPOID and assign these values to the foreign key columns in other tables.

Access Program Update Mechanics

Any external programs referencing the original key columns must be updated to work with Order.OrderPOID. You may need to rework the code to do the following:

1. Assign new types of key values. If external application code assigns new surrogate key values, instead of the database itself, all external applications need to be reworked to assign values to Order.OrderPOID. Minimally, every single program must implement the same algorithm to do so, but a better strategy is to implement a common service that every application invokes.

2. Join based on the new key. Many external access programs will define joins involving Order, implemented either via hard-coded SQL or via meta data. These joins should be refactored to work with Order.OrderPOID.

3. Retrieve based on the new key. Some external programs will traverse the database one or more rows at a time, retrieving data based on the key values. These retrievals need to be updated to work with Order.OrderPOID.

The following Hibernate mapping shows how the surrogate key is introduced:

//Before mapping
<hibernate-mapping>
  <class name="Order" table="ORDER">
    <many-to-one name="customer"
    class="Customer" column="CUSTOMERNUMBER" />
    <property name="orderDate"/>
    <property name="storeID"/>
    <property name="shipTo"/>
    <property name="billTo"/>
    <property name="total"/>
  </class>
</hibernate-mapping>

//After mapping
<hibernate-mapping>
  <class name="Order" table="ORDER">
    <id name="id" column="ORDERPOID">
      <generator class="OrderPOIDGenerator"/>
    </id>
    <many-to-one name="customer"
     class="Customer" column="CUSTOMERNUMBER" />
    <property name="orderDate"/>
    <property name="storeID"/>
    <property name="shipTo"/>
    <property name="billTo"/>
    <property name="total"/>
  </class>
</hibernate-mapping>


Merge Columns

Merge two or more columns within a single table.

Motivation

There are several reasons why you may want to apply Merge Columns:

An identical column. Two or more developers may have added the columns unbeknownst to each other, a common occurrence when the developers are on different teams or when meta data describing the table schema is not available. For example, the FeeStructure table has 37 columns, 2 of which are called CA_INIT and CheckingAccountOpeningFee, and both of which store the initial fee levied by the bank when opening a checking account. The second column was added because nobody was sure what the CA_INIT column was really being used for.

The columns are the result of overdesign. The original columns were introduced to ensure that the information was stored in its constituent forms, but actual usage shows that you do not need the fine details that you originally thought. For example, the Customer table of Figure 6.6 includes the columns PhoneCountryCode, PhoneAreaCode, and PhoneLocal to represent a single phone number.

Figure 6.6. Merging columns in the Customer table.

image

The actual usage of the columns has become the same. Several columns were originally added to a table, but over time the way that one or more of them are used has changed to the point where they are all being used for the same purpose. For example, the Customer table includes PreferredCheckStyle and SelectedCheckStyle columns (not shown in Figure 6.6). The first column was used to record which style of checks to send to the customer from next season’s selection, and the second column was used to record the style that was previously sent out to the customer. This was useful 20 years ago when it took several months to order new checks, but now that they can be printed overnight, we have started automatically storing the same value in both columns.

Potential Tradeoffs

This database refactoring can result in a loss of data precision when you merge finely detailed columns. When you merge columns that (you believe) are used for the same purpose, you run the risk that you should in fact be using them for separate things. (If so, you will discover that you need to reintroduce one or more of the original columns.) The usage of the data should determine whether the columns should be merged, something that you will need to explore with your stakeholders.

Schema Update Mechanics

To perform Merge Columns, you must do two things. First, introduce the new column. Add the column to the table via the SQL command ADD COLUMN; in Figure 6.6, this is Customer.PhoneNumber. This step is optional because you may find it possible to merge the data into one of the existing columns. Second, introduce a synchronization trigger to ensure that the columns remain synchronized with one another. The trigger must be invoked by any change to the columns.

Figure 6.6 shows an example where the Customer table initially stores the phone number of a person in three separate columns: PhoneCountryCode, PhoneAreaCode, and PhoneLocal. Over time, we have discovered that few applications are interested in the country code because they are used only within North America. We have also discovered that every application uses both the area code and the local phone number together. Therefore, we have decided to leave the PhoneCountryCode alone but to merge the PhoneAreaCode and PhoneLocal columns into PhoneNumber, reflecting the actual usage of the data by the application (because the application does not use PhoneAreaCode or PhoneLocal individually). We introduced the SynchronizePhoneNumber trigger to keep the values in the four columns synchronized.

The following SQL code depicts the DDL to introduce the PhoneNumber column and to eventually drop the two original columns:

ALTER TABLE Customer ADD PhoneNumber NUMBER(12);
COMMENT ON Customer.PhoneNumber 'Added as the
result of merging Customer.PhoneAreaCode and
Customer.PhoneLocal finaldate = December 14 2007';

--On December 14 2007
ALTER TABLE Customer DROP COLUMN PhoneAreaCode;
ALTER TABLE Customer DROP COLUMN PhoneLocal;


Data-Migration Mechanics

You must convert all the data from the original column(s) into the merge column, in this case from Customer.PhoneAreaCode and Customer.PhoneLocal into Customer.PhoneNumber. The following SQL code depicts the DML to initially combine the data from PhoneAreaCode and PhoneLocal into PhoneNumber.

/*One-time migration of data from Customer.PhoneAreaCode and Customer.PhoneLocal to
Customer.PhoneNumber. When both the columns are active, there is a need to have a trigger that
keeps both the columns in sync */
UPDATE Customer SET PhoneNumber = PhoneAreaCode*10000000 + PhoneLocal;
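The arithmetic in the UPDATE above can be illustrated in application code as follows; the class and method names are hypothetical, used only to show why the area code is scaled by 10,000,000:

```java
// Illustrative sketch of the merge arithmetic: a seven-digit local number
// occupies the low-order digits, so the area code is shifted left seven
// digits to share one numeric column.
public class PhoneNumberMerger {
    public static long merge(long areaCode, long local) {
        return areaCode * 10_000_000L + local;
    }
}
```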


Access Program Update Mechanics

You need to analyze the access programs thoroughly, and then update them appropriately, during the transition period. In addition to the obvious need to work with Customer.PhoneNumber rather than the former unmerged columns, you may need to make two kinds of changes. First, you may need to remove merging code. There may be code that combines the existing columns into a data attribute similar to the merged column; this code should be refactored and potentially removed entirely.

Second, you may also need to update data-validation code to work with merged data. Some data-validation code may exist solely because the columns have not yet been merged. For example, if a value is stored in two separate columns, you may have validation code in place that verifies that the values are the same. After the columns are merged, there may no longer be a need for this code.
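Such validation code can be as simple as the following hypothetical check, which verifies that two not-yet-merged columns agree; once the columns are merged, there is a single value and the whole method can be retired:

```java
// Hypothetical pre-merge validation: while PreferredCheckStyle and
// SelectedCheckStyle were separate columns storing the same value, code
// like this verified that they agreed. After Merge Columns, it is dead code.
public class CheckStyleValidator {
    public static boolean stylesAgree(String preferred, String selected) {
        // NULL in either column counts as a disagreement.
        return preferred != null && preferred.equals(selected);
    }
}
```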

The before and after code snippet shows how the getCustomerPhoneNumber() method changes when we merge the Customer.PhoneAreaCode and Customer.PhoneLocal columns:

//Before code
public String getCustomerPhoneNumber(Customer customer){
  String phoneNumber = customer.getCountryCode();
  phoneNumber = phoneNumber.concat(phoneNumberDelimiter());
  phoneNumber = phoneNumber.concat(customer.getPhoneAreaCode());
  phoneNumber = phoneNumber.concat(customer.getPhoneLocal());
  return phoneNumber;
}

//After code
public String getCustomerPhoneNumber(Customer customer){
  String phoneNumber = customer.getCountryCode();
  phoneNumber = phoneNumber.concat(phoneNumberDelimiter());
  phoneNumber = phoneNumber.concat(customer.getPhoneNumber());
  return phoneNumber;
}


Merge Tables

Merge two or more tables into a single table.

Motivation

There are several reasons why you may want to apply Merge Tables:

The tables are the result of overdesign. The original tables were introduced to ensure that the information was stored in its constituent forms, but actual usage shows that you do not need the fine details that you originally thought. For example, the Employee table includes columns for employee identification, as well as other data, whereas the EmployeeIdentification table specifically captures just identification information.

The actual usage of the tables has become the same. Over time, the way that one or more tables are used has changed to the point where several tables are being used for the same purpose. You may also have tables that are related to one another in a one-to-one fashion; you may want to merge them to avoid the join. A good example is the Employee table mentioned previously. It was originally used to record employee information, but the EmployeeIdentification table was introduced to store just identification information. Some people did not realize that this table existed and evolved the Employee table to capture similar data.

A table is mistakenly repeated. Two or more developers may have added the tables unbeknownst to each other, a common occurrence when the developers are on different teams or when the meta data describing the table schema is not available. For example, the FeeStructure and FeeSchedule tables both store the initial fee levied by the bank when opening a checking account. The second table was added because nobody was sure what the FeeStructure table was really being used for.

Potential Tradeoffs

Merging two or more tables can result in a loss of data precision when you merge finely detailed tables. When you merge tables that (you believe) are used for the same purpose, you run the risk that you should in fact be using them for separate things. For example, the EmployeeIdentification table may have been introduced to separate security-critical information into a single table that had limited access rights. If so, you will discover that you need to reintroduce one or more of the original tables. The usage of the data should determine whether the tables should be merged.

Schema Update Mechanics

As depicted in Figure 6.7, to update the database schema when you perform Merge Tables, you must do two things. First, introduce the merged table by adding the columns from EmployeeIdentification to Employee table via the SQL command ADD COLUMN. Note that Employee may already include some or all of the required columns, in which case inconsistencies or cluttered domain code may exist; this refactoring should allow for simplification of the application code.

Figure 6.7. Moving all the columns from EmployeeIdentification to Employee.

image

Second, introduce synchronization trigger(s) to ensure that the tables remain synchronized with one another. The trigger(s) must be invoked by any change to the columns. You need to implement the trigger so that cycles do not occur—if the value in one of the original columns changes, Employee should be updated, but that update should not trigger the same update to the original tables and so on.

Figure 6.7 depicts an example where the Employee table initially stores the employee data. Over time, we also added an EmployeeIdentification table that stores employee identification information. Therefore, we have decided to merge the Employee and EmployeeIdentification tables so that we have all the information regarding the employee in one place. We introduced the SynchronizeWithEmployeeIdentification and SynchronizeWithEmployee triggers to keep the values in the tables synchronized. The following SQL code depicts the DDL to introduce the Picture, VoicePrint, and RetinalPrint columns, and then to eventually drop the EmployeeIdentification table.

ALTER TABLE Employee ADD Picture BINARY;

COMMENT ON Employee.Picture 'Added as the result of merging Employee and EmployeeIdentification
finaldate = December 14 2007';

ALTER  TABLE Employee ADD VoicePrint BINARY;

COMMENT ON Employee.VoicePrint 'Added as the result of merging Employee and
EmployeeIdentification finaldate = December 14 2007';

ALTER TABLE Employee ADD RetinalPrint BINARY;

COMMENT ON Employee.RetinalPrint 'Added as the result of merging Employee and
EmployeeIdentification finaldate = December 14 2007';


Data-Migration Mechanics

You must copy all the data from the original table(s) into the merged table—in this case, from EmployeeIdentification to Employee. This can be done via several means—for example, with an SQL script or with an extract-transform-load (ETL) tool. (With this refactoring, there should not be a transform step.)

The following SQL code depicts the DML to initially copy the data from the EmployeeIdentification table into Employee:

/*One-time migration of data from EmployeeIdentification to Employee.
When both tables are active, a trigger is needed to keep them in sync. */

UPDATE Employee e SET e.Picture =
  (SELECT ei.Picture FROM EmployeeIdentification ei
    WHERE ei.EmployeeNumber = e.EmployeeNumber);

UPDATE Employee e SET e.VoicePrint =
  (SELECT ei.VoicePrint FROM EmployeeIdentification ei
    WHERE ei.EmployeeNumber = e.EmployeeNumber);

UPDATE Employee e SET e.RetinalPrint =
  (SELECT ei.RetinalPrint FROM EmployeeIdentification ei
    WHERE ei.EmployeeNumber = e.EmployeeNumber);

--On December 14 2007
DROP TRIGGER SynchronizeWithEmployee;
DROP TRIGGER SynchronizeWithEmployeeIdentification;
DROP TABLE EmployeeIdentification;


The following code shows how SynchronizeWithEmployeeIdentification and SynchronizeWithEmployee triggers are used to keep the values in the tables synchronized:

CREATE TRIGGER SynchronizeWithEmployeeIdentification
BEFORE INSERT OR UPDATE OR DELETE
ON Employee
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
IF updating THEN
  updateOrCreateEmployeeIdentification;
END IF;
IF inserting THEN
  createNewEmployeeIdentification;
END IF;
IF deleting THEN
  deleteEmployeeIdentification;
END IF;
END;
/


CREATE TRIGGER SynchronizeWithEmployee
BEFORE INSERT OR UPDATE OR DELETE
ON EmployeeIdentification
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
IF updating THEN
  updateOrCreateEmployee;
END IF;
IF inserting THEN
  createNewEmployee;
END IF;
IF deleting THEN
  deleteEmployee;
END IF;
END;
/


Access Program Update Mechanics

In addition to the obvious need to work with Employee rather than the former unmerged table(s), potential updates you need to make are as follows:

1. Simplify data access code. Some access code may exist that accesses two or more of the tables involved with the merge. For example, the Employee class may update its information into the two tables in which it is currently stored, tables that have now been merged into one.

2. Incomplete or contradictory updates. Now that the data is stored in one place, you may discover that individual access programs worked only with subsets of the data. For example, the Customer class currently updates its home phone number information in two tables, yet it is really stored in three tables (which have now been merged into one). Other programs may realize that the data quality in the third table was not very good and may include code that counteracts the problems. For example, a reporting class may convert NULL phone numbers to “Unknown,” but now that there are no NULL phone numbers, this code can be removed.

3. Some merged data is not required by some access programs. Some of the access programs that currently work with Employee need only the data that it contains. However, now that columns from EmployeeIdentification have been added, the potential exists that the existing access programs will not update these new columns appropriately. Existing access programs may need to be extended to accept and work with the new columns. For example, the source table for the Employee class may have had a BirthDate column merged into it. Minimally, the Employee class should not overwrite this column with invalid data, and it should insert an appropriate value when a new employee object is created. You may need to apply Introduce Default Value (page 186) to the columns that are merged into Employee.
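The counteracting code described in point 2 is often no more than a small workaround like the following hypothetical method, which can be deleted once the merge leaves no NULL phone numbers:

```java
// Workaround for poor data quality in one source table: convert NULL
// phone numbers to "Unknown" for display. After Merge Tables removes the
// bad-quality source, this method becomes unnecessary.
public class PhoneNumberReport {
    public static String displayPhone(String phoneNumber) {
        return (phoneNumber == null) ? "Unknown" : phoneNumber;
    }
}
```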

The following example shows the code changes when you apply Merge Tables to Employee and EmployeeIdentification:

//Before code
public Employee getEmployeeInformation (Long
employeeNumber) throws SQLException {
  Employee employee = new Employee();

  stmt.prepare(
  "SELECT EmployeeNumber, Name, PhoneNumber " +
  "FROM Employee " +
  "WHERE EmployeeNumber = ?");
  stmt.setLong(1,employeeNumber);
  ResultSet rs = stmt.executeQuery();
  employee.setEmployeeNumber(rs.getLong
  ("EmployeeNumber"));
  employee.setName(rs.getString("Name"));
  employee.setPhoneNumber(rs.getLong("PhoneNumber"));

  stmt.prepare(
  "SELECT Picture, VoicePrint, RetinalPrint " +
  "FROM EmployeeIdentification " +
  "WHERE EmployeeNumber = ?");
  stmt.setLong(1,employeeNumber);
  rs = stmt.executeQuery();
  employee.setPicture(rs.getBlob("Picture"));
  employee.setVoicePrint(rs.getBlob("VoicePrint"));
  employee.setRetinalPrint(rs.getBlob("RetinalPrint"));

  return employee;
}

//After code
public Employee getEmployeeInformation (Long
employeeNumber) throws SQLException {
  Employee employee = new Employee();

  stmt.prepare(
  "SELECT EmployeeNumber, Name, PhoneNumber, " +
  "Picture, VoicePrint, RetinalPrint " +
  "FROM Employee " +
  "WHERE EmployeeNumber = ?");
  stmt.setLong(1,employeeNumber);

  ResultSet rs = stmt.executeQuery();
  employee.setEmployeeNumber(rs.getLong
  ("EmployeeNumber"));
  employee.setName(rs.getString("Name"));
  employee.setPhoneNumber(rs.getLong("PhoneNumber"));
  employee.setPicture(rs.getBlob("Picture"));
  employee.setVoicePrint(rs.getBlob("VoicePrint"));
  employee.setRetinalPrint(rs.getBlob("RetinalPrint"));
  return employee;
}


Move Column

Migrate a table column, with all of its data, to another existing table.

Motivation

There are several reasons to apply Move Column. The first two reasons may appear contradictory, but remember that database refactoring is situational. Common motivations to apply Move Column include the following:

Normalization. It is common that an existing column breaks one of the rules of normalization. By moving the column to another table, you can increase the normalization of the source table and thereby reduce data redundancy within your database.

Denormalization to reduce common joins. It is quite common to discover that a table is included in a join simply to gain access to a single column. You can improve performance by removing the need to perform this join by moving the column into the other table.

Reorganization of a split table. You previously performed Split Table (page 145), or the table was effectively split in the original design, and you then realize that one more column needs to be moved. Perhaps the column exists in the commonly accessed table but is rarely needed, or perhaps it exists in a rarely accessed table but is needed quite often. In the first case, network performance would be improved by not selecting and then transmitting the column to the applications when it is not required; in the second case, database performance would be improved because few joins would be required.

Potential Tradeoffs

Moving a column to increase normalization reduces data redundancy but may decrease performance if additional joins are required by your applications to obtain the data. Conversely, if you improve performance by denormalizing your schema through moving the column, you will increase data redundancy.

Schema Update Mechanics

To update the database schema when you perform Move Column, you must do the following:

1. Identify deletion rule(s). What should happen when a row from one table is deleted? Should the corresponding row in the other table be deleted, should the column value in the corresponding row be nulled/zeroed out, should the corresponding value be set to some sort of default value, or should the corresponding value be left alone? This rule will be implemented in trigger code (as discussed later in this section). Note that zero or more deletion triggers may already exist to support referential integrity rules between the tables.

2. Identify insertion rule(s). What should happen when a row is inserted into one of the tables? Should a corresponding row in the other table be inserted or should nothing occur? This rule will be implemented in trigger code, and zero or more insertion triggers may already exist.

3. Introduce the new column. Add the column to the target table via the SQL command ADD COLUMN. In Figure 6.8, this is Account.Balance.

Figure 6.8. Moving the Balance column from Customer to Account.

image

4. Introduce triggers. You require triggers on both the original and new column to copy data from one column to the other during the transition period. These trigger(s) must be invoked by any change to a row.

Figure 6.8 depicts an example where Customer.Balance is moved into the Account table. This is a normalization issue—instead of storing a balance each time a customer’s account is updated, we can instead store it once for each individual account. During the transition period, the Balance column appears in both Customer and Account, as you would expect.

The existing triggers are interesting. The Account table already had a trigger for inserts and updates that checked to see that the corresponding row exists in the Customer table, a basic referential integrity (RI) check. This trigger is left alone. The Customer table had a delete trigger to ensure that it is not deleted if an Account row refers to it, another RI check. The advantage of this is that we do not need to implement a deletion rule for the moved column because we cannot “do the wrong thing” and delete a Customer row that has one or more Account rows referencing it.

In the following code, we introduce the Account.Balance column and the SynchronizeCustomerBalance and SynchronizeAccountBalance triggers to keep the Balance columns synchronized. The code also includes the scripts to drop the scaffolding code after the transition period ends:

ALTER TABLE Account ADD Balance NUMBER(32,7);
COMMENT ON Account.Balance 'Moved from Customer table, finaldate = June 14 2007';

COMMENT ON Customer.Balance 'Moved to Account table, dropdate = June 14 2007';

CREATE OR REPLACE TRIGGER SynchronizeCustomerBalance
  BEFORE INSERT OR UPDATE
  ON Account
  REFERENCING OLD AS OLD NEW AS NEW
  FOR EACH ROW
  DECLARE
  BEGIN
    IF :NEW.Balance IS NOT NULL THEN
      UpdateCustomerBalance;
    END IF;
  END;
/

CREATE OR REPLACE TRIGGER SynchronizeAccountBalance
  BEFORE INSERT OR UPDATE OR DELETE
  ON Customer
  REFERENCING OLD AS OLD NEW AS NEW
  FOR EACH ROW
  DECLARE
  BEGIN
    IF DELETING THEN
      DeleteCustomerIfAccountNotFound;
    END IF;
    IF (UPDATING OR INSERTING) THEN
      IF :NEW.Balance IS NOT NULL THEN
        UpdateAccountBalanceForCustomer;
      END IF;
    END IF;
  END;
/

--On June 14 2007
ALTER TABLE Customer DROP COLUMN Balance;
DROP TRIGGER SynchronizeCustomerBalance;
DROP TRIGGER SynchronizeAccountBalance;


Data-Migration Mechanics

Copy all the data from the original column into the new column—in this case, from Customer.Balance to Account.Balance. This can be done via several means—for example, with a SQL script or with an ETL tool. (With this refactoring, there should not be a transform step.) The following code depicts the DML to move the Balance column values from Customer to Account:

/*One-time migration of data from Customer.Balance to Account.Balance. When both the columns are
active, there is a need to have a trigger that keeps both the balance columns in sync
*/

UPDATE Account SET Balance =
  (SELECT Balance FROM Customer
    WHERE CustomerID = Account.CustomerID);


Access Program Update Mechanics

You need to analyze the access programs thoroughly and then update them appropriately during the transition period. Potential updates you need to make are as follows:

1. Rework joins to use the moved column. Joins, either hard-coded in SQL or defined via meta data, must be refactored to work with the moved column. For example, when you move Customer.Balance to Account.Balance, you have to change your queries to get the balance information from Account and not from Customer.

2. Add the new table to joins. The Account table must now be included in joins if it is not already included. This may degrade performance.

3. Remove the original table from joins. There may be some joins that included the Customer table for the sole purpose of joining in the data from Customer.Balance. Now that this column has been moved, the Customer table can be removed from the join, which could potentially improve performance.

The following code shows you how the Customer.Balance column is referenced in the original code and the updated code that works with Account.Balance:

//Before code
public BigDecimal getCustomerBalance(Long
customerId) throws SQLException {
  PreparedStatement stmt = null;
  BigDecimal customerBalance = null;

  stmt = DB.prepare(
    "SELECT Balance FROM Customer " +
    "WHERE CustomerId = ?");
  stmt.setLong(1, customerId.longValue());
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customerBalance = rs.getBigDecimal("Balance");
  }
  return customerBalance;
}

//After code
public BigDecimal getCustomerBalance(Long
customerId) throws SQLException {
  PreparedStatement stmt = null;
  BigDecimal customerBalance = null;
  stmt = DB.prepare(
    "SELECT SUM(Account.Balance) Balance " +
    "FROM Customer, Account " +
    "WHERE Customer.CustomerId = Account.CustomerId " +
    "AND Customer.CustomerId = ?");
  stmt.setLong(1, customerId.longValue());
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customerBalance = rs.getBigDecimal("Balance");
  }
  return customerBalance;
}


Rename Column

Rename an existing table column.

Motivation

The primary reasons to apply Rename Column are to increase the readability of your database schema, to conform to accepted database naming conventions in your enterprise, or to enable database porting. For example, when you are porting from one database product to another, you may discover that the original column name cannot be used because it is a reserved keyword in the new database.

Potential Tradeoffs

The primary tradeoff is the cost of refactoring the external applications that access the column versus the improved readability and/or consistency provided by the new name.

Schema Update Mechanics

To rename a column, you must do the following:

1. Introduce the new column. In Figure 6.9, we first add FirstName to the target table via the SQL command ADD COLUMN.

Figure 6.9. Renaming the Customer.FName column.

image

2. Introduce a synchronization trigger. As you can see in Figure 6.9, you require a trigger to copy data from one column to the other during the transition period. This trigger must be invoked by any change to the data row.

3. Rename other columns. If FName is used in other tables as (part of) a foreign key, you may want to apply Rename Column recursively to ensure naming consistency. For example, if Customer.CustomerNumber is renamed as Customer.CustomerID, you may want to go ahead and rename all instances of CustomerNumber in other tables. Therefore, Account.CustomerNumber will now be renamed to Account.CustomerID to keep the column names consistent.

The following code depicts the DDL to rename Customer.FName to Customer.FirstName, creates the SynchronizeFirstName trigger that synchronizes the data during the transition period, and removes the original column and trigger after the transition period ends:

ALTER TABLE Customer ADD FirstName VARCHAR(40);

COMMENT ON Customer.FirstName 'Renaming of FName column, finaldate = November 14 2007';

COMMENT ON Customer.FName 'Renamed to FirstName,
dropdate = November 14 2007';

UPDATE Customer SET FirstName = FName;

CREATE OR REPLACE TRIGGER SynchronizeFirstName
BEFORE INSERT OR UPDATE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF INSERTING THEN
    IF :NEW.FirstName IS NULL THEN
      :NEW.FirstName := :NEW.FName;
    END IF;
    IF :NEW.FName IS NULL THEN
      :NEW.FName := :NEW.FirstName;
    END IF;
  END IF;

  IF UPDATING THEN
    IF NOT(:NEW.FirstName=:OLD.FirstName) THEN
      :NEW.FName:=:NEW.FirstName;
    END IF;
    IF NOT(:NEW.FName=:OLD.FName) THEN
      :NEW.FirstName:=:NEW.FName;
    END IF;
  END IF;
END;
/

--On November 14 2007
DROP TRIGGER SynchronizeFirstName;
ALTER TABLE Customer DROP COLUMN FName;


Data-Migration Mechanics

You need to copy all the data from the original column into the new column, in this case from FName to FirstName. See the refactoring Move Data (page 192) for details.

Access Program Update Mechanics

External programs that reference Customer.FName must be updated to reference the column by its new name. You should simply have to update any embedded SQL and/or mapping meta data. The following Hibernate mapping files show how your mappings change when the fName column is renamed:

//Before mapping
<hibernate-mapping>
<class name="Customer" table="Customer">
  <id name="id" column="CUSTOMERID">
         <generator class="CustomerIdGenerator"/>
  </id>
  <property name="fName"/>
</class>
</hibernate-mapping>

//Transition mapping
<hibernate-mapping>
<class name="Customer" table="Customer">
  <id name="id" column="CUSTOMERID">
         <generator class="CustomerIdGenerator"/>
  </id>
  <property name="fName"/>
  <property name="firstName"/>
</class>
</hibernate-mapping>

//After mapping
<hibernate-mapping>

<class name="Customer" table="Customer">
  <id name="id" column="CUSTOMERID">
         <generator class="CustomerIdGenerator"/>
  </id>
  <property name="firstName"/>
</class>
</hibernate-mapping>


Rename Table

Rename an existing table.

Motivation

The primary reason to apply Rename Table is to clarify the table’s meaning and intent in the overall database schema or to conform to accepted database naming conventions. Ideally, these reasons are one and the same.

Potential Tradeoffs

The primary tradeoff is the cost of refactoring the external applications that access the table versus the improved readability and/or consistency provided by the new name.

Schema Update Mechanics via a New Table

To perform Rename Table, you create a new table using the SQL command CREATE TABLE, in this case Customer. If any columns of Cust_TB_Prod are used in other tables as (part of) a foreign key, you must refactor those constraints and/or indices implementing the foreign key to refer to Customer.

We want to rename Cust_TB_Prod to Customer as depicted in Figure 6.10. The SynchronizeCust_TB_Prod and SynchronizeCustomer triggers keep the two tables synchronized with each other. Each trigger is invoked by any change to a row in the Cust_TB_Prod or Customer table, respectively. The following code depicts the DDL to rename the table and introduce the triggers:

Figure 6.10. Renaming the Cust_TB_Prod table to Customer.

image

CREATE TABLE Customer
  (FirstName VARCHAR(40),
  LastName VARCHAR(40)
  );

COMMENT ON Customer 'Renaming of Cust_TB_Prod,
finaldate = September 14 2006';

COMMENT ON Cust_TB_Prod 'Renamed to Customer,
dropdate = September 14 2006';

CREATE OR REPLACE TRIGGER SynchronizeCustomer
BEFORE INSERT OR UPDATE OR DELETE
ON Cust_TB_Prod
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
IF updating THEN
  findAndUpdateIfNotFoundCreateCustomer;
END IF;
IF inserting THEN
  createNewIntoCustomer;
END IF;
IF deleting THEN
  deleteFromCustomer;
END IF;
END;
/

CREATE OR REPLACE TRIGGER SynchronizeCust_TB_Prod
BEFORE INSERT OR UPDATE OR DELETE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
IF updating THEN
        findAndUpdateIfNotFoundCreateCust_TB_Prod;
END IF;
IF inserting THEN
  createNewIntoCust_TB_Prod;
END IF;
IF deleting THEN
  deleteFromCust_TB_Prod;
END IF;
END;
/


Schema Update Mechanics via an Updateable View

The second approach is to rename the table and then introduce an updateable view with the original name of the table. Note that some database products support the RENAME option of the ALTER TABLE command. If yours does not, you must re-create the table with the new name, and then load the data into the table. You should schedule the drop of the old table so that it is not accidentally used by someone. As you see in Figure 6.11, the updateable view is needed during the transition period to support external access programs that have yet to be refactored to use the renamed table. This strategy is viable only if your database supports updateable views.

Figure 6.11. Renaming the Cust_TB_Prod table to Customer via a view.

image

Figure 6.11 shows how to rename the table using a view. You simply use the RENAME TO clause of the ALTER TABLE SQL command to rename the table and then create the view, as shown here:

ALTER TABLE Cust_TB_Prod RENAME TO Customer;

CREATE VIEW Cust_TB_Prod AS
  SELECT * FROM Customer;


As with the new table approach, if any columns of Cust_TB_Prod are used in other tables as (part of) a foreign key, you must re-create those constraints and/or indices implementing the foreign key to refer to Customer.

Data-Migration Mechanics

With the updateable view approach, you do not need to migrate data. With the new table approach, however, you must first copy all the data from the original table into the new table—in this case, from Cust_TB_Prod to Customer. Second, you must introduce triggers on both the original and new table to copy data from one table to the other during the transition period. These triggers must be invoked by any change to the tables. You need to implement the triggers so that cycles do not occur—if Cust_TB_Prod changes, Customer must also be updated, but that update should not trigger the same update to Cust_TB_Prod and so on. The following code shows how to copy the data from Cust_TB_Prod into Customer:

INSERT INTO Customer
  SELECT * FROM Cust_TB_Prod;


Access Program Update Mechanics

External access programs must be refactored to work with Customer rather than Cust_TB_Prod. The following Hibernate mapping shows the change you have to make when you rename the Cust_TB_Prod table:

//Before mapping
<hibernate-mapping>
<class name="Customer" table="Cust_TB_Prod">
.....
</class>
</hibernate-mapping>

//After mapping
<hibernate-mapping>
<class name="Customer" table="Customer">
.....
</class>
</hibernate-mapping>


Rename View

Rename an existing view.

Motivation

The primary reason to apply Rename View is to increase the readability of your database schema or to conform to accepted database naming conventions. Ideally, these reasons are one and the same.

Potential Tradeoffs

The primary tradeoff is the cost of refactoring the external applications that access the view versus the improved readability and/or consistency provided by the new name.

Schema Update Mechanics

To perform Rename View, you must do the following:

1. Introduce the new view. Create a new view using the SQL command CREATE VIEW. In Figure 6.12, this is CustomerOrders, whose definition must match that of CustOrds.

Figure 6.12. Renaming the CustOrds view to CustomerOrders.

image

2. Deprecate the original view. After you create CustomerOrders, you want to indicate that CustOrds should no longer be updated with new features or bug fixes.

3. Redefine the old view. You should redefine CustOrds to be based on CustomerOrders to avoid duplicate code streams. The benefit is that any changes to CustomerOrders, such as a new data source for a column, will propagate to CustOrds without any additional work.

The following code depicts the DDL to create CustomerOrders, which is identical to the code that was used to create CustOrds:

CREATE VIEW CustomerOrders AS
SELECT
  Customer.CustomerCode,
  Order.OrderID,
  Order.OrderDate,
  Order.ProductCode
FROM Customer,Order
WHERE
  Customer.CustomerCode = Order.CustomerCode
  AND Order.ShipDate = TOMORROW
;

COMMENT ON CustomerOrders 'Renamed from CustOrds,
CustOrds dropdate = September 15 2007';


The following code drops and then re-creates CustOrds so that it derives its results from CustomerOrders:

DROP VIEW CustOrds;

CREATE VIEW CustOrds AS
SELECT CustomerCode, OrderID, OrderDate, ProductCode
FROM CustomerOrders
;


Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

You must refactor all the external programs currently accessing CustOrds to access CustomerOrders. In the case of hard-coded SQL, this is simply an update of the FROM/INTO clauses; in the case of meta data-driven approaches, it is an update of the name within the representation for this view.

The following code shows how a reference to CustOrds should be changed to CustomerOrders:

// Before code
stmt.prepare(
  "SELECT * " +
  "FROM CustOrds " +
  "WHERE CustomerId = ?");
stmt.setLong(1, customer.getCustomerID());
ResultSet rs = stmt.executeQuery();

// After code
stmt.prepare(
  "SELECT * " +
  "FROM CustomerOrders " +
  "WHERE CustomerId = ?");
stmt.setLong(1, customer.getCustomerID());
ResultSet rs = stmt.executeQuery();


Replace LOB With Table

Replace a large object (LOB) column that contains structured data with a new table or with new columns in the same table. LOBs are typically stored as either a binary large object (BLOB), a variable character (VARCHAR), or in some cases as XML data.

Motivation

The primary reason to replace a LOB with a table is because you need to treat parts of the LOB as distinct data elements. This is quite common with XML data structures that have been stored in a single column, often to avoid “shredding” the structure into individual columns.

Potential Tradeoffs

The advantage of storing a complex data structure in a single column is that you can quickly get that specific data structure easily. This proves particularly valuable when existing code already works with the data structure in question and merely needs to use the database as a handy file-storage mechanism. By replacing the LOB with a table, or perhaps several tables if the structure of the data contained within the LOB is very complex, you can easily work with the individual data elements within your database. You also make the data more accessible to other applications that may not need the exact structure contained within the LOB. Furthermore, if the LOB contains some data that is already present within your database, you can potentially use those existing data sources to represent the appropriate portions of the LOB, reducing data redundancy (and thus integrity errors). The disadvantage of this approach is the increased time and complexity required to shred the data to store it within the database and similarly to retrieve and convert it back into the required structure.

Schema Update Mechanics

As you see in Figure 6.13, applying Replace LOB With Table is straightforward. You need to do the following:

Figure 6.13. Replacing a LOB with a table.

image

1. Determine a table schema. You need to analyze Customer.MailingAddress to determine the data that it contains, and then develop a table schema to store that data. If the structure contained within MailingAddress is complex, you either need to have smaller LOB columns within the new table or recursively apply Replace LOB With Table to deal with these smaller structures.

2. Add the table. In Figure 6.13, this is CustomerAddress. The columns of this table are the primary key of Customer, the CustomerPOID column, and the new columns containing the data from MailingAddress.

3. Deprecate the original column. MailingAddress must be marked for deletion at the end of the deprecation period.

4. Add a new index. For performance reasons, you may need to introduce a new index for CustomerAddress via the CREATE INDEX command.

5. Introduce synchronization triggers. Customer will require a trigger to populate the values in CustomerAddress appropriately. This trigger will need to shred the MailingAddress structure and store it appropriately. Similarly, a trigger on CustomerAddress is needed to update Customer during the transition period.

The code to create the CustomerAddress table, add an index, define the synchronization triggers, and eventually drop the column and triggers is shown here:

CREATE TABLE CustomerAddress (
  CustomerPOID NUMBER NOT NULL,
  Street VARCHAR(40),
  City VARCHAR(40),
  State VARCHAR(40),
  ZipCode VARCHAR(10)
);

CREATE INDEX IndexCustomerAddress ON
  CustomerAddress(CustomerPOID);

-- Triggers to keep the tables synchronized
CREATE OR REPLACE TRIGGER SynchronizeWithCustomerAddress
BEFORE INSERT OR UPDATE OR DELETE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating THEN
    FindOrCreateCustomerAddress;
  END IF;
  IF inserting THEN
    CreateCustomerAddress;
  END IF;
  IF deleting THEN
    DeleteCustomerAddress;
  END IF;
END;
/

CREATE OR REPLACE TRIGGER SynchronizeWithCustomer
BEFORE INSERT OR UPDATE OR DELETE
ON CustomerAddress
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating OR inserting THEN
    FindAndUpdateCustomer;
  END IF;
  IF deleting THEN
    UpdateCustomerNullAddress;
  END IF;
END;
/

-- On Dec 14 2007
DROP TRIGGER SynchronizeWithCustomerAddress;
DROP TRIGGER SynchronizeWithCustomer;

ALTER TABLE Customer DROP COLUMN MailingAddress;


Data-Migration Mechanics

CustomerAddress must be populated by shredding and then copying the data contained in Customer.MailingAddress. The value of Customer.CustomerPOID must also be copied to maintain the relationship. If MailingAddress has a NULL or empty value, a row in CustomerAddress does not need to be created. This can be accomplished via one or more SQL scripts, as you see in the following code:

INSERT INTO CustomerAddress
  SELECT
    CustomerPOID,
    ExtractStreet(MailingAddress),
    ExtractCity(MailingAddress),
    ExtractState(MailingAddress),
    ExtractZipCode(MailingAddress)
  FROM Customer
    WHERE MailingAddress IS NOT NULL;
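The ExtractStreet, ExtractCity, ExtractState, and ExtractZipCode stored functions used above depend entirely on how MailingAddress is encoded. As a hypothetical sketch, assuming the LOB holds a simple pipe-delimited layout ("street|city|state|zip"), the equivalent shredding logic would be:

```java
// Hypothetical Java versions of the Extract* helpers from the migration
// script. The pipe-delimited format is an assumption; a real LOB might hold
// XML or another structure requiring a proper parser.
public class MailingAddressShredder {

    private static String part(String mailingAddress, int index) {
        if (mailingAddress == null) return null;
        String[] parts = mailingAddress.split("\\|", -1);
        // Missing trailing components yield null rather than an exception.
        return index < parts.length ? parts[index].trim() : null;
    }

    public static String extractStreet(String a)  { return part(a, 0); }
    public static String extractCity(String a)    { return part(a, 1); }
    public static String extractState(String a)   { return part(a, 2); }
    public static String extractZipCode(String a) { return part(a, 3); }

    public static void main(String[] args) {
        String lob = "123 Main St | Austin | TX | 78701";
        System.out.println(extractStreet(lob)); // 123 Main St
        System.out.println(extractZipCode(lob)); // 78701
    }
}
```

Centralizing this shredding logic in one place (whether as stored functions or a shared library) is what lets multiple applications stop carrying their own copies of it.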


Access Program Update Mechanics

You must identify any external programs referencing Customer.MailingAddress so that they can be updated to work with CustomerAddress as appropriate. You will need to do the following:

1. Remove translation code. External programs could have code that shreds the data within MailingAddress to work with its subdata elements, or they could contain code that takes the source data elements and builds the format to be stored into MailingAddress. This code will no longer be needed with the new data structure.

2. Add translation code. Conversely, some external programs may require the exact data structure contained within MailingAddress. If several applications require this, you should consider introducing stored procedures or a library within the database to do this translation, enabling reuse.

3. Write code to access the new table. After you add CustomerAddress, you have to write application code that uses this new table rather than MailingAddress.

The following code shows how the code to retrieve data attributes from Customer.MailingAddress is replaced with a SELECT against the CustomerAddress table:

// Before code
public Customer findByCustomerID(Long customerPOID) {
  Customer customer = new Customer();
  stmt = DB.prepare("SELECT CustomerPOID, " +
    "MailingAddress, Name, PhoneNumber " +
    "FROM Customer " +
    "WHERE CustomerPOID = ?");
  stmt.setLong(1, customerPOID);
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerId(rs.getLong("CustomerPOID"));
    customer.setName(rs.getString("Name"));
    customer.setPhoneNumber(rs.getString("PhoneNumber"));
    String mailingAddress = rs.getString("MailingAddress");
    customer.setStreet(extractStreet(mailingAddress));
    customer.setCity(extractCity(mailingAddress));
    customer.setState(extractState(mailingAddress));
    customer.setZipCode(extractZipCode(mailingAddress));
  }
return customer;
}

// After code
public Customer findByCustomerID(Long customerPOID) {
  Customer customer = new Customer();
  stmt = DB.prepare("SELECT CustomerPOID, "+
    "Name, PhoneNumber, "+
    "Street, City, State, ZipCode " +
    "FROM Customer, CustomerAddress " +
    "WHERE Customer.CustomerPOID = ? " +
    "AND Customer.CustomerPOID = " +
    "CustomerAddress.CustomerPOID");
  stmt.setLong(1, customerPOID);
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerId(rs.getLong("CustomerPOID"));
    customer.setName(rs.getString("Name"));
    customer.setPhoneNumber(rs.getString("PhoneNumber"));
    customer.setStreet(rs.getString("Street"));
    customer.setCity(rs.getString("City"));
    customer.setState(rs.getString("State"));
    customer.setZipCode(rs.getString("ZipCode"));
  }
return customer;
}


Replace Column

Replace an existing nonkey column with a new one.

For replacing a column that is part of a key, either the primary key or an alternate key, see the Introduce Surrogate Key (page 85) and Replace Surrogate Key With Natural Key (page 135) refactorings.

Motivation

There are two reasons why you may want to apply Replace Column. First, and most common, the usage of the column has changed over time, requiring you to change its type. For example, you previously had a numeric customer identifier, but now your business stakeholders have made it alphanumeric. Second, replacing a column is often an intermediate step in implementing other refactorings. For example, it is an important step in merging two similar data sources, or in applying Consolidate Key Strategy (page 168), because you need to ensure type and format consistency with another column.

Potential Tradeoffs

A significant risk when replacing a column is information loss when transferring the data to the replacement column. This is particularly true when the types of the two columns are significantly different—converting from a CHAR to a VARCHAR is straightforward, as is NUMERIC to CHAR, but converting CHAR to NUMERIC can be problematic when the original column contains non-numeric characters.
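To make the risk concrete, a defensive CHAR-to-NUMERIC conversion can validate each value before migration rather than truncating silently. This is a hypothetical sketch in Java, not code from the refactoring itself; the helper name is invented:

```java
import java.util.Optional;

// Defensive CHAR-to-NUMERIC conversion: values that cannot be converted
// without information loss are reported as empty so a migration script can
// flag the offending rows instead of corrupting them.
public class ColumnConverter {

    // Returns the numeric value, or empty if the text is not purely numeric.
    public static Optional<Long> toNumeric(String charValue) {
        if (charValue == null) return Optional.empty();
        String trimmed = charValue.trim();
        if (!trimmed.matches("-?\\d+")) return Optional.empty();
        try {
            return Optional.of(Long.parseLong(trimmed));
        } catch (NumberFormatException overflow) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toNumeric(" 12345 ")); // converts cleanly
        System.out.println(toNumeric("A-1234")); // flagged: non-numeric
    }
}
```

Rows flagged this way are exactly the ones that call for a data quality refactoring before the column replacement proceeds.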

Schema Update Mechanics

To apply Replace Column, you must do the following:

1. Introduce the new column. Add the column to the target table via the SQL command ADD COLUMN. In Figure 6.14, this is CustomerID.

Figure 6.14. Replacing the Customer.CustomerNumber column.

image

2. Deprecate the original column. CustomerNumber must be marked for deletion at the end of your chosen transition period.

3. Introduce a synchronization trigger. As you can see in Figure 6.14, you require a trigger to copy data from one column to the other during the transition period. This trigger must be invoked by any change to a data row.

4. Update other tables. If CustomerNumber is used in other tables as part of a foreign key, you will want to replace those columns similarly, as well as update any corresponding index definitions.

The following SQL code depicts the DDL to replace the column, create the synchronization trigger, and eventually drop the column and trigger after the transition period:

ALTER TABLE Customer ADD CustomerID CHAR(12);

COMMENT ON Customer.CustomerID 'Replaces CustomerNumber column, finaldate = 2007-06-14';

COMMENT ON Customer.CustomerNumber 'Replaced with CustomerID, dropdate = 2007-06-14';

CREATE OR REPLACE TRIGGER SynchronizeCustomerIDNumber
BEFORE INSERT OR UPDATE
ON Customer
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF :NEW.CustomerID IS NULL THEN
    :NEW.CustomerID :=
      formatCustomerNumber(:NEW.CustomerNumber);
  END IF;
  IF :NEW.CustomerNumber IS NULL THEN
    :NEW.CustomerNumber := :NEW.CustomerID;
  END IF;
END;
/

-- On June 14 2007
DROP TRIGGER SynchronizeCustomerIDNumber;
ALTER TABLE Customer DROP COLUMN CustomerNumber;
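The formatCustomerNumber helper invoked by the trigger is not defined in this example. A minimal sketch of what it might do, assuming the new CHAR(12) identifiers are simply zero-padded versions of the legacy numeric customer numbers, follows (written in Java for readability; in practice it would be a stored function):

```java
// Hypothetical stand-in for the formatCustomerNumber stored function used by
// the SynchronizeCustomerIDNumber trigger. The zero-padded 12-character
// format is an assumption for illustration only.
public class CustomerNumberFormatter {

    public static String formatCustomerNumber(long customerNumber) {
        // Pad the legacy numeric value to the full CHAR(12) width.
        return String.format("%012d", customerNumber);
    }

    public static void main(String[] args) {
        System.out.println(formatCustomerNumber(764098L)); // 000000764098
    }
}
```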


Data-Migration Mechanics

The data must be initially copied from CustomerNumber to CustomerID and then kept synchronized during the transition period (for example, via stored procedures). As described earlier, this can be problematic when the data formats are significantly different from one another. Before applying Replace Column, you may discover that you need to apply one or more data quality refactorings to clean up the source data first. The code to copy the values into the new column is shown here:

UPDATE Customer SET CustomerID = CustomerNumber;


Access Program Update Mechanics

The primary issue is that external programs need to be refactored to work with the new data type and format of CustomerID. This could require writing conversion code that translates back and forth between the old data format and the new. A longer-term strategy, although potentially a more expensive one, would be to completely rework all the external program code to use the new data format. The following code snippet shows how the column name and the data type need to change in the application code:

// Before code
public Customer findByCustomerID(Long customerID) {
  Customer customer = new Customer();
  stmt = DB.prepare("SELECT CustomerPOID, " +
    "CustomerNumber, FirstName, LastName " +
    "FROM Customer " +
    "WHERE CustomerPOID = ?");
  stmt.setLong(1, customerID);
  stmt.execute();
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerPOID(rs.getLong("CustomerPOID"));
    customer.setCustomerNumber(rs.getInt("CustomerNumber"));
    customer.setFirstName(rs.getString("FirstName"));
    customer.setLastName(rs.getString("LastName"));
  }
return customer;
}

// After code
public Customer findByCustomerID(Long customerID) {
  Customer customer = new Customer();
  stmt = DB.prepare("SELECT CustomerPOID, " +
    "CustomerID, FirstName, LastName " +
    "FROM Customer " +
    "WHERE CustomerPOID = ?");
  stmt.setLong(1, customerID);
  stmt.execute();
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerPOID(rs.getLong("CustomerPOID"));
    customer.setCustomerID(rs.getString("CustomerID"));
    customer.setFirstName(rs.getString("FirstName"));
    customer.setLastName(rs.getString("LastName"));
  }
return customer;
}


Replace One-To-Many With Associative Table

Replace a one-to-many association between two tables with an associative table.

Motivation

The primary reason to introduce an associative table between two tables is to implement a many-to-many association between them later on. It is quite common for a one-to-many association to evolve into a many-to-many association. For example, any given employee currently has at most one manager. (The president of the company is the only person without a manager.) However, the company wants to move to a matrix organization structure where people can potentially report to several managers. Because a one-to-many association is a subset of a many-to-many association, the new associative table would implement the existing hierarchical organization structure yet be ready for the coming matrix structure. You may also want to add information to the relationship itself that does not belong to either of the existing tables.

Potential Tradeoffs

You are overbuilding your database schema when you use an associative table to implement a one-to-many association. If the association is not likely to evolve into a many-to-many relationship, it is not advisable to take this approach. When you add associative tables, you are increasing the number of joins you have to make to get to the relevant data, thus degrading performance and making it harder to understand the database schema.

Schema Update Mechanics

To apply Replace One-To-Many With Associative Table, you must do the following:

1. Add the associative table. In Figure 6.15, this is Holds. The columns of this table are the combination of the primary keys of Customer and Policy. Note that some tables may not necessarily have a primary key, although this is rare—when this is the case, you may decide to apply the Introduce Surrogate Key refactoring.

Figure 6.15. Replacing a one-to-many with an associative table.

image

2. Deprecate the original column. Policy.CustomerPOID, which currently maintains the one-to-many relationship with Customer, must be marked for deletion at the end of the transition period because the relationship will now be maintained via the associative table instead.

3. Add a new index. A new index for Holds should be introduced via the Introduce Index (page 248) refactoring.

4. Introduce synchronization triggers. Policy will require a trigger that will populate the key values in the Holds table, if the appropriate values do not already exist, during the transition period. Similarly, there will need to be a trigger on Holds that verifies that Policy.CustomerPOID is populated appropriately.

The code to add the Holds table, to add an index on Holds, to add the synchronization triggers, and finally to drop the old schema and triggers is shown here:

CREATE TABLE Holds (
  CustomerPOID BIGINT,
  PolicyID INT
);

CREATE INDEX HoldsIndex ON Holds
    (CustomerPOID, PolicyID);

CREATE OR REPLACE TRIGGER InsertHoldsRow
BEFORE INSERT OR UPDATE OR DELETE
ON Policy
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating THEN
    UpdateInsertHolds;
  END IF;
  IF inserting THEN
    CreateHolds;
  END IF;
  IF deleting THEN
    RemoveHolds;
  END IF;
END;
/

CREATE OR REPLACE TRIGGER UpdatePolicyCustomerPOID
BEFORE INSERT OR UPDATE OR DELETE
ON Holds
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
BEGIN
  IF updating THEN
    UpdateInsertPolicy;
  END IF;
  IF inserting THEN
    CreatePolicy;
  END IF;
  IF deleting THEN
    RemovePolicy;
  END IF;
END;
/

-- On Mar 15 2007
DROP TRIGGER InsertHoldsRow;
DROP TRIGGER UpdatePolicyCustomerPOID;
ALTER TABLE Policy
DROP COLUMN CustomerPOID;


Data-Migration Mechanics

The associative table must be populated by copying the values of Policy.CustomerPOID and Policy.PolicyID into Holds.CustomerPOID and Holds.PolicyID, respectively. This can be accomplished via a simple SQL script, as follows:

INSERT INTO Holds (CustomerPOID,PolicyID)
  SELECT CustomerPOID,PolicyID FROM Policy;


Access Program Update Mechanics

To update external programs, you must do the following:

1. Remove updates to the foreign key. Any code to assign values to Policy.CustomerPOID should be refactored to write to Holds to maintain the association properly.

2. Rework joins. Many external access programs will define joins involving Customer and Policy, implemented either via hard-coded SQL or via meta data. These joins should be refactored to work with Holds.

3. Rework retrievals. Some external programs will traverse the database one or more rows at a time, retrieving data based on the key values, traversing from Policy to Customer. These retrievals will need to be updated similarly.

The following code shows how to change your application code so that retrieval of data is now done via a join using the associative table:

//Before code
stmt.prepare(
  "SELECT Customer.CustomerPOID, Customer.Name, " +
  "Policy.PolicyID,Policy.Amount " +
  "FROM Customer, Policy " +
  "WHERE Customer.CustomerPOID = Policy.CustomerPOID " +
  "AND Customer.CustomerPOID = ? ");
stmt.setLong(1,customerPOID);
ResultSet rs = stmt.executeQuery();

//After code
stmt.prepare(
  "SELECT Customer.CustomerPOID, Customer.Name, " +
  "Policy.PolicyID,Policy.Amount " +
  "FROM Customer, Holds, Policy " +
  "WHERE Customer.CustomerPOID = Holds.CustomerPOID " +
  "AND Holds.PolicyID = Policy.PolicyID " +
  "AND Customer.CustomerPOID = ? ");
stmt.setLong(1,customerPOID);
ResultSet rs = stmt.executeQuery();


Replace Surrogate Key With Natural Key

Replace a surrogate key with an existing natural key. This refactoring is the opposite of Introduce Surrogate Key (page 85).

Motivation

There are several reasons to apply Replace Surrogate Key With Natural Key:

Reduce overhead. When you replace a surrogate key with an existing natural key, you reduce the overhead within your table structure of maintaining the additional surrogate key column(s).

Consolidate your key strategy. To support Consolidate Key Strategy (page 168), you may decide to first replace an existing surrogate primary key with the “official” natural key.

Remove nonrequired keys. You may have discovered that a surrogate key was introduced to a table when it really was not needed. Removing unused keys, and the indexes that implement them, improves performance.

Potential Tradeoffs

Although many data professionals debate the use of surrogate versus natural keys, the reality is that both types of keys have their place. When you have tables with natural keys, each external application, as well as the database itself, must access data from each table in its own unique way. Sometimes, the key will be a single numeric column, sometimes a single character column, or sometimes a combination of several columns. With a consistent surrogate key strategy, tables are accessed in the exact same manner, enabling you to simplify your code. Thus, by replacing a surrogate key with a natural key, you potentially increase the complexity of the code that accesses your database. The primary advantage is that you simplify your table schema.
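To make the tradeoff concrete: with a uniform surrogate key strategy, one generic routine can build the retrieval SQL for any table, whereas natural keys force per-table access logic. This is an illustrative sketch; the TablePOID naming convention is an assumption, not something defined by this refactoring:

```java
// Contrasts the two access styles discussed above. With surrogate keys that
// follow a uniform convention (here, assumed to be <Table>POID), one routine
// serves every table; a natural key requires table-specific knowledge.
public class KeyStrategyExample {

    // Uniform surrogate-key access: the same SQL shape for every table.
    public static String findByPOIDSql(String table) {
        return "SELECT * FROM " + table + " WHERE " + table + "POID = ?";
    }

    // Natural-key access: each table needs its own key column(s).
    public static String findStateSql() {
        return "SELECT * FROM State WHERE StateCode = ?";
    }

    public static void main(String[] args) {
        System.out.println(findByPOIDSql("State"));
        System.out.println(findStateSql());
    }
}
```

Replacing the surrogate key therefore trades away this uniformity in exchange for a simpler table schema.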

Schema Update Mechanics

Applying Replace Surrogate Key With Natural Key can be complicated because of the coupling the surrogate key is potentially involved in. Because it is the primary key of a table, it likely also forms (part of) foreign keys within other tables. You will need to do the following:

1. Identify the column(s) to form the new primary key. In Figure 6.16, this is StateCode. (This could be several columns.) StateCode must have a unique value within each row for it to qualify to be a primary key.

Figure 6.16. Replacing a surrogate key with a natural key.

image

2. Add a new index. If one does not already exist, a new index based on StateCode needs to be introduced for State.

3. Deprecate the original column. StatePOID must be marked for deletion at the end of the transition period.

4. Update coupled tables. If StatePOID is used in other tables as part of a foreign key, you will want to update those tables to use the new key. You must remove the column(s) that currently correspond to StatePOID using Drop Column (page 172). You also need to add new column(s) corresponding to StateCode if they do not already exist. The corresponding index definition(s) need to be updated to reflect this change. When StatePOID is used in many tables, you may want to consider updating the tables one at a time to simplify the effort.

5. Update and possibly add RI triggers. Any triggers that exist to maintain referential integrity between tables must be updated to work with the corresponding StateCode values in the other tables.

Figure 6.16 depicts how to remove State.StatePOID, a surrogate key, replacing it with the existing State.StateCode as key. To support this new key, we must add the PopulateStateCode trigger, which is invoked whenever an insert occurs in Address, obtaining the value of State.StateCode.

CREATE OR REPLACE TRIGGER PopulateStateCode
  BEFORE INSERT
  ON Address
  REFERENCING OLD AS OLD NEW AS NEW
  FOR EACH ROW
  DECLARE
  BEGIN
    IF :NEW.StateCode IS NULL THEN
      :NEW.StateCode := getStateCodeFromState(:NEW.StatePOID);
    END IF;
  END;
/

ALTER TABLE Address ADD CONSTRAINT
AddressToStateForeignKey FOREIGN KEY (StateCode)
REFERENCES State (StateCode);

-- June 14 2007
ALTER TABLE Address DROP CONSTRAINT
AddressToStateForeignKey;
ALTER TABLE State DROP CONSTRAINT StatePrimaryKey;
ALTER TABLE State MODIFY StateCode NOT NULL;
ALTER TABLE State ADD CONSTRAINT StatePrimaryKey
      PRIMARY KEY (StateCode);
DROP TRIGGER PopulateStateCode;


Data-Migration Mechanics

There is no data to migrate for this database refactoring.

Access Program Update Mechanics

To update external access programs, you need to do the following:

1. Remove surrogate key code. The code to assign values to the surrogate key column (which may be implemented either within external applications or the database) should no longer be invoked. It may no longer be needed at all.

2. Rework joins based on the new key. Many external access programs will define joins involving State, implemented either via hard-coded SQL or via meta data. These joins should be refactored to work with StateCode, not StatePOID.

3. Rework retrievals based on the new key. Some external programs will traverse the database one or more rows at a time, retrieving data based on the key values. These retrievals must be updated similarly.

The following Hibernate mappings show how the referenced tables must refer to the new keys, and how the POID columns are no longer generated:

//Before mapping
<hibernate-mapping>
<class name="State" table="STATE">
  <id name="id" column="STATEPOID">
    <generator class="IdGenerator"/>
  </id>
  <property name="stateCode" />
  <property name="name" />
</class>
</hibernate-mapping>

<hibernate-mapping>
<class name="Address" table="ADDRESS">
  <id name="id" column="ADDRESSID" >
    <generator class="IdGenerator"/>
  </id>
  <property name="streetLine" />
  <property name="city" />
  <property name="postalCode" />
  <many-to-one name="state" class="State"
    column="STATEPOID" not-null="true"/>
  <many-to-one name="country" class="Country"
    column="COUNTRYID" not-null="true"/>
</class>
</hibernate-mapping>

//After mapping
<hibernate-mapping>
<class name="State" table="STATE">
  <property name="stateCode" />
  <property name="name" />
</class>
</hibernate-mapping>

<hibernate-mapping>
<class name="Address" table="ADDRESS">
  <id name="id" column="ADDRESSID" >
    <generator class="IdGenerator"/>
  </id>
  <property name="streetLine" />
  <property name="city" />
  <property name="postalCode" />
  <many-to-one name="state" class="State"
    column="STATECODE" not-null="true"/>
  <many-to-one name="country" class="Country"
    column="COUNTRYID" not-null="true"/>
</class>
</hibernate-mapping>


Split Column

Split a column into one or more columns within a single table.

Note

If one or more of the new columns needs to appear in another table, first apply Split Column and then apply Move Column (page 103).

Motivation

There are two reasons why you may want to apply Split Column. First, you have a need for fine-grained data. For example, the Customer table has a Name column, which contains the full name of the person, but you want to split this column so that you can store FirstName, MiddleName, and LastName as independent columns.

Second, the column has several uses. For example, the Account.Status column was introduced to track the status of the account (such as Open, Closed, OverDrawn, and so on). Unknowingly, someone else has also started using it for account type information such as Checking, Savings, and so on. You need to split these usages into their own columns to avoid introducing bugs caused by the dual usage.

Potential Tradeoffs

This database refactoring can result in duplication of data when you split columns. When you split a column that (you believe) is used for different purposes, you run the risk that the new columns should in fact be used for the same thing. (If so, you will discover that you need to apply Merge Columns.) The actual usage of a column should determine whether it should be split.

Schema Update Mechanics

To perform Split Column, you must first introduce the new columns. Add the columns to the table via the SQL command ADD COLUMN. In Figure 6.17, these are FirstName, MiddleName, and LastName. This step is optional because you may be able to split the data into existing columns. Then you must introduce a synchronization trigger to ensure that the columns remain in sync with one another. The trigger must be invoked by any change to the columns.

Figure 6.17. Splitting the Customer.Name column.

image

Figure 6.17 depicts an example where the Customer table initially stores the name of a person in the column Name. Over time, we have discovered that few applications are interested in the full name, but instead need components of the full name, in particular the last name of the customer. We have also discovered that many applications include duplicate code to split the Name column, a source of potential bugs. Therefore, we have decided to split the Name column into FirstName, MiddleName, and LastName columns, reflecting the actual usage of the data. We introduced the SynchronizeCustomerName trigger to keep the values in the columns synchronized. The following code implements the changes to the schema:

ALTER TABLE Customer ADD FirstName VARCHAR(40);
COMMENT ON Customer.FirstName 'Added as the result of splitting Customer.Name finaldate = December 14 2007';

ALTER TABLE Customer ADD MiddleName VARCHAR(40);
COMMENT ON Customer.MiddleName 'Added as the result of splitting Customer.Name finaldate = December 14 2007';

ALTER TABLE Customer ADD LastName VARCHAR(40);
COMMENT ON Customer.LastName 'Added as the result of splitting Customer.Name finaldate = December 14 2007';

COMMENT ON Customer.Name 'Replaced with FirstName, MiddleName and LastName, will be dropped December 14 2007';

--Trigger to keep all the split columns in sync
CREATE OR REPLACE TRIGGER SynchronizeCustomerName
  BEFORE INSERT OR UPDATE
  ON Customer
  REFERENCING OLD AS OLD NEW AS NEW
  FOR EACH ROW
  DECLARE
  BEGIN
    IF :NEW.FirstName IS NULL THEN
      :NEW.FirstName := getFirstName(:NEW.Name);
    END IF;
    IF :NEW.MiddleName IS NULL THEN
      :NEW.MiddleName := getMiddleName(:NEW.Name);
    END IF;
    IF :NEW.LastName IS NULL THEN
      :NEW.LastName := getLastName(:NEW.Name);
    END IF;
  END;
/

--On December 14 2007
ALTER TABLE Customer DROP COLUMN Name;
DROP TRIGGER SynchronizeCustomerName;


Data-Migration Mechanics

You must copy all the data from the original column(s) into the split columns—in this case, from Customer.Name into FirstName, MiddleName, and LastName. The following code depicts the DML to initially split the data from Name into the three new columns. (The source code for the three stored functions that are invoked is not shown for the sake of brevity.)

/*One-time migration of data from Customer.Name to Customer.FirstName, Customer.MiddleName, and
Customer.LastName. While both sets of columns are active, a trigger is needed to keep both sets
of columns in sync.
*/

UPDATE Customer SET FirstName = getFirstName(Name);
UPDATE Customer SET MiddleName = getMiddleName(Name);
UPDATE Customer SET LastName = getLastName(Name);
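The bodies of the getFirstName, getMiddleName, and getLastName stored functions are deliberately not shown. As a rough illustration only—not the actual stored functions—the splitting logic might look like the following Java sketch, which assumes names are space-separated "First [Middle...] Last" strings:

```java
import java.util.Arrays;

// Hypothetical name-splitting helpers mirroring the getFirstName,
// getMiddleName, and getLastName functions used in the migration.
// Assumption: names are space-separated "First [Middle...] Last" strings.
public class NameSplitter {

    public static String getFirstName(String name) {
        String[] parts = name.trim().split("\\s+");
        return parts[0];
    }

    public static String getMiddleName(String name) {
        String[] parts = name.trim().split("\\s+");
        if (parts.length < 3) {
            return null; // no middle name present
        }
        // Everything between the first and last token is the middle name.
        return String.join(" ", Arrays.copyOfRange(parts, 1, parts.length - 1));
    }

    public static String getLastName(String name) {
        String[] parts = name.trim().split("\\s+");
        return parts.length > 1 ? parts[parts.length - 1] : null;
    }
}
```

Real-world name parsing is messier than this (suffixes, multi-word surnames, single-token names), which is exactly why the book recommends centralizing the logic instead of duplicating it across applications.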


Access Program Update Mechanics

You need to analyze the access programs thoroughly, and then update them appropriately, during the transition period. In addition to the obvious need to work with FirstName, MiddleName, and LastName rather than the former columns, potential updates you need to make are as follows:

1. Remove splitting code. There may be application code that splits the existing column into data attributes similar to the new split columns. This code should be refactored and potentially removed entirely.

2. Update data-validation code to work with split data. Some data-validation code may exist solely because the column has not been split. For example, if the full name is stored in the Customer.Name column, you may have validation code in place that verifies that the value contains both a first and a last name. After the column is split, there may no longer be a need for this code.

3. Refactor the user interface. After the original column is split, the presentation layer should make use of the finer-grained data, if it was not doing so already, as appropriate.

The following code shows how the application makes use of the finer-grained data available to it:

//Before code
public Customer findByCustomerID(Long customerID) {
  Customer customer = new Customer();
  stmt = DB.prepare("SELECT CustomerID, " +
    "Name, PhoneNumber " +
    "FROM Customer " +
    "WHERE CustomerID = ?");
  stmt.setLong(1, customerID);
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerId(rs.getLong("CustomerID"));
    String name = rs.getString("Name");
    customer.setFirstName(getFirstName(name));
    customer.setMiddleName(getMiddleName(name));
    customer.setLastName(getLastName(name));
    customer.setPhoneNumber(rs.getString("PhoneNumber"));
  }
  return customer;
}

//After code
public Customer findByCustomerID(Long customerID) {
  Customer customer = new Customer();
  stmt = DB.prepare("SELECT CustomerID, " +
    "FirstName, MiddleName, LastName, PhoneNumber " +
    "FROM Customer " +
    "WHERE CustomerID = ?");
  stmt.setLong(1, customerID);
  ResultSet rs = stmt.executeQuery();
  if (rs.next()) {
    customer.setCustomerId(rs.getLong("CustomerID"));
    customer.setFirstName(rs.getString("FirstName"));
    customer.setMiddleName(rs.getString("MiddleName"));
    customer.setLastName(rs.getString("LastName"));
    customer.setPhoneNumber(rs.getString("PhoneNumber"));
  }
  return customer;
}


Split Table

Vertically split (that is, by columns) an existing table into one or more tables.

Note

If the destination of the split columns happens to be an existing table, then in reality you would be applying Move Column (page 103). To split a table horizontally (that is, by rows), apply Move Data (page 192).

Motivation

There are several reasons why you may want to apply Split Table:

Performance improvement. It is very common for most applications to require a core collection of data attributes of any given entity, and then a specific subset of the noncore data attributes. For example, the core columns of the Employee table would include the columns required to store their name, address, and phone numbers; whereas noncore columns would include the Picture column as well as salary information. Because Employee.Picture is large, and required only by a few applications, you would want to consider splitting it off into its own table. This would help to improve retrieval access times for applications that select all columns from the Employee table yet do not require the picture.

Restrict data access. You may want to restrict access to some columns, perhaps the salary information within the Employee table, by splitting it off into its own table and assigning specific security access control (SAC) rules to it.

Reduce repeating data groups (apply 1NF). The original table may have been designed when requirements were not yet finalized, or by people who did not appreciate why you need to normalize data structures (Date 2001; Ambler 2003). For example, the Employee table may store descriptions of the five previous evaluation reviews for the person. This information is a repeating group that you would want to split off into an Employee-Evaluation table.

Potential Tradeoffs

When you split a table that (you believe) is used for different purposes, you run the risk that the new tables will in fact be used for the same things; if so, you will discover that you need to apply Merge Tables (page 96). The usage of a table should determine whether it should be split.

Schema Update Mechanics

To perform Split Table, you must first add the new table(s) via the SQL command CREATE TABLE. This step is optional because you may find it possible to move the columns into one or more existing tables; in that situation, you should repeatedly apply the Move Column refactoring (page 103). Second, you must introduce a trigger to ensure that the columns remain synchronized with one another. The trigger must be invoked by any change to the tables, and you need to implement it so that cycles do not occur.

Figure 6.18 depicts an example where the Address table initially stores the address information along with both the state code and the state name. To reduce data duplication, we have decided to split the Address table into Address and State tables, normalizing the design. We introduced the SynchronizeWithAddress and SynchronizeWithState triggers to keep the values in the tables synchronized:

Figure 6.18. Splitting the Address table.

image

CREATE TABLE State (
  StateCode VARCHAR(2) NOT NULL,
  Name VARCHAR(40) NOT NULL,
  CONSTRAINT PKState
    PRIMARY KEY (StateCode)
);

COMMENT ON State.Name 'Added as the result of splitting Address into Address and State drop date = December 14 2007';

--Trigger to keep all the split tables in sync
CREATE OR REPLACE TRIGGER SynchronizeWithAddress
BEFORE INSERT OR UPDATE
ON Address
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating THEN
    FindOrCreateState;
  END IF;
  IF inserting THEN
    CreateState;
  END IF;
END;
/

CREATE OR REPLACE TRIGGER SynchronizeWithState
BEFORE UPDATE OF Name
ON State
REFERENCING OLD AS OLD NEW AS NEW
FOR EACH ROW
DECLARE
BEGIN
  IF updating THEN
    FindAndUpdateAllAddressesForStateName;
  END IF;
END;
/

--On December 14 2007
ALTER TABLE Address DROP COLUMN StateName;
DROP TRIGGER SynchronizeWithAddress;
DROP TRIGGER SynchronizeWithState;


Data-Migration Mechanics

You must copy all the data from the original column(s) into the new table’s columns. In the case of Figure 6.18, this would be from Address.StateCode and Address.StateName into State.StateCode and State.Name, respectively. The following code shows how to initially migrate this data:

/*One-time migration of data from Address.StateCode and Address.StateName to State. While both
sets of columns are active, a trigger is needed to keep both sets of columns in sync.
*/

INSERT INTO State (StateCode, Name)
  SELECT StateCode,StateName FROM Address
  WHERE StateCode IS NOT NULL AND StateName IS NOT NULL
  GROUP BY StateCode, StateName;
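The GROUP BY in the migration above simply collapses duplicate (StateCode, StateName) pairs so that State receives one row per state. As an illustrative sketch only—not part of the actual migration—the equivalent deduplication in application code might look like the following, assuming the pairs have been read from the Address rows:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StateDeduplicator {

    // Collapses duplicate (stateCode, stateName) pairs, mirroring the
    // GROUP BY StateCode, StateName clause in the one-time migration SQL.
    // Assumption (hypothetical input shape): each row is {stateCode, stateName}.
    public static Map<String, String> distinctStates(String[][] addressRows) {
        Map<String, String> states = new LinkedHashMap<>();
        for (String[] row : addressRows) {
            String code = row[0];
            String name = row[1];
            // Mirrors WHERE StateCode IS NOT NULL AND StateName IS NOT NULL.
            if (code != null && name != null) {
                states.putIfAbsent(code, name);
            }
        }
        return states;
    }
}
```

In practice you would let the database do this work, as shown in the SQL above; the sketch is only meant to make the effect of the GROUP BY concrete.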


Access Program Update Mechanics

You must analyze the access programs thoroughly, and then update them appropriately, during the transition period. In addition to the obvious need to work with the new columns rather than the former columns, potential updates you will need to make are as follows:

1. Introduce new table meta data. If you are using a meta data-based persistence framework, you must introduce new meta data for State and change the meta data for Address.

2. Update SQL code. Similarly, any embedded SQL code that accesses Address must be updated to join in State where appropriate. This may slightly reduce performance of this code.

3. Refactor the user interface. After the original table is split, the presentation layer should make use of the finer-grained data, if it was not doing so already, as appropriate.

The following Hibernate mappings show how we split the Address table and created a new State table:

//Before mapping
<hibernate-mapping>
<class name="Address" table="ADDRESS">
  <id name="id" column="ADDRESSID">
    <generator class="IdGenerator"/>
  </id>
  <property name="street" />
  <property name="city" />
  <property name="stateCode" />
  <property name="stateName" />
</class>
</hibernate-mapping>

//After mapping

//Address table
<hibernate-mapping>
<class name="Address" table="ADDRESS">
  <id name="id" column="ADDRESSID">
    <generator class="IdGenerator"/>
  </id>
  <property name="street" />
  <property name="city" />
  <many-to-one name="state" class="State"
    column="STATECODE" not-null="true"/>
</class>
</hibernate-mapping>

//State table
<hibernate-mapping>
<class name="State" table="STATE">
  <id name="stateCode" column="stateCode">
    <generator class="assigned"/>
  </id>
  <property name="stateName" />
</class>
</hibernate-mapping>

