Chapter 11: Transformations

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 11
Transformations

Transformations are changes that change the semantics of your database schema by adding new features to it. The transformations described in this chapter are as follows:

• Insert Data

• Introduce New Column

• Introduce New Table

• Introduce View

• Update Data

Insert Data

Insert data into an existing table.

Motivation

You typically need to apply Insert Data as the result of structural changes within your table design. You may need to apply Insert Data as the result of the following:

• Table reorganization. When you are using Rename Table (page 113), Merge Tables (page 96), Split Table (page 145), or Drop Table (page 77) refactorings, you may have to use Insert Data to reorganize the data in the existing tables.

• Provide static lookup data. All applications need static lookup data—for example, tables listing the states/provinces (for instance, Illinois and Ontario) that you do business in, a list of address types (for instance, home, business, vacation), and a list of account types (for instance, checking, savings, and investment). If your application does not include administrative editing screens for maintaining these lists, you need to insert this data manually.

• Create test data. During development, you need known data values inserted into your development database(s) to support your testing efforts.

Potential Tradeoffs

Inserting new data into tables can be tricky, especially when you are going to insert lookup data that is referenced from one or more other tables. For example, assume the Address table references data in the State table, which you are inserting new data into. The data that you insert into State must contain valid values that will appear in Address.

Schema Update Mechanics

To update the database schema, you must do the following:

1. Identify the data to insert. This includes identifying any dependencies. If this is part of moving data from another table, should the source row be deleted? Is the inserted data new, and if so, have the values been accepted by your project stakeholder(s)?

2. Identify the data destination. Which table is the data being inserted into?

3. Identify the data source. Is the data coming from another table or is it a manual insert (for example, you are writing a script to create the data)?

4. Identify transformation needs. Does the source data need to be translated before it can be inserted into the target table? For example, you may have a list of metric measurements that need to be converted into imperial values before being inserted into a lookup table.

Data-Migration Mechanics

When small amounts of data need to be inserted, you will likely find that a simple SQL script that inserts the source data into the target location is sufficient. For large amounts of data, you need to take a more sophisticated approach, such as using database utilities such as Oracle’s SQLLDR or a bulk loader (because of the time it will take to insert the data).

Figure 11.1 shows how we insert a new record into the AccountType table representing brokerage accounts. This AccountType supports new functionality that needs to first be tested and then moved to production later on. The following code depicts the DML to insert data into the AccountType table:

Figure 11.1. Inserting a new AccountType.

INSERT INTO AccountType
(AccountTypeId,Name,EffectiveDate)
VALUES
(6,'Brokerage','Feb 1 2007'),

Access Program Update Mechanics

When Insert Data is applied as the result of another database refactoring, the applications should have been updated to reflect that refactoring. However, when Insert Data is applied to support the addition of data within an existing schema, it is possible that external application code will need to be changed. You need to update WHERE clauses. For example, in Figure 11.1, we inserted Brokerage into the AccountType table. You may need to update SELECT statements to not read this value when you only want to work with standard banking accounts (everything but brokerage), as you see in the following code. You can also use Introduce View (page 306) to create a specific view that the application uses which returns this subset of data:

// Before code
stmt.prepare(
  "SELECT * FROM AccountType " +
    "WHERE AccountTypeId NOT IN (?,?)");
stmt.setLong(1,PRIVATEACCOUNT.getId);
stmt.setLong(2,MONEYMARKETACCOUNT.getId);
stmt.execute();
ResultSet standardAccountTypes = stmt.executeQuery();

//After code
stmt.prepare(
  "SELECT * FROM AccountType " +
    "WHERE AccountTypeId NOT IN (?,?,?)");
stmt.setLong(1,PRIVATEACCOUNT.getId);
stmt.setLong(2,MONEYMARKETACCOUNT.getId);
stmt.setLong(3,BROKERAGE.getId);
stmt.execute();
ResultSet standardAccountTypes = stmt.executeQuery();

Similarly, you need to update source code that validates the values of data attributes. For example, you may have code that defines premium accounts as being either of type Private or Money Market in your application logic. You now have to add Brokerage to this list, as you see in the following code:

//Before code
public enum PremiumAccountType {
  PRIVATEACCOUNT(new Long(3)),
  MONEYMARKET(new Long(4));

private Long id;
public Long getId() {
  return id;
}

PremiumAccountType(Long value) {
  this.id = value;
}

public static Boolean
  isPremiumAccountType(Long idToFind) {
    for (PremiumAccountType premiumAccountType :
          PremiumAccountType.values()) {
      if (premiumAccountType.id.equals(idToFind))
        return Boolean.TRUE;
       }
        Return Boolean.FALSE
}

//After code
public enum PremiumAccountType {
  PRIVATEACCOUNT(new Long(3)),
  MONEYMARKET(new Long(4)),
  BROKERAGE(new Long(6));

private Long id;
public Long getId() {
  return id;
}

PremiumAccountType(Long value) {
  this.id = value;
}

public static Boolean
  isPremiumAccountType(Long idToFind) {
    for (PremiumAccountType premiumAccountType :
          PremiumAccountType.values()) {
       if (premiumAccountType.id.equals(idToFind))
          return Boolean.TRUE;
        }
         Return Boolean.FALSE
}

Introduce New Column

Introduce a new column in an existing table.

Motivation

There are several reasons why you may want to apply the Introduce New Column transformation:

• To persist a new attribute. A new requirement may necessitate the addition of a new column in your database.

• Intermediate step of a refactoring. Many database refactorings, such as Move Column (page 103) and Rename Column (page 109), include a step in which you introduce a new column into an existing table.

Potential Tradeoffs

You need to ensure that the column does not exist elsewhere; otherwise, you are at risk of increasing referential integrity problems due to greater data redundancy.

Schema Update Mechanics

As depicted in Figure 11.2, to update the database schema you must simply introduce CountryCode via the ADD clause of the ALTER TABLE SQL command. The State.CountryCode column is a foreign key reference to the Country table (not shown), enabling our application(s) to track states by country. The following code depicts the DDL to introduce the State.CountryCode column. To create the referential constraint, apply the Add Foreign Key Constraint refactoring (page 204):

Figure 11.2. Introducing the State.CountryCode column.

ALTER TABLE State ADD Country Code VARCHAR2(3) NULL;

Data-Migration Mechanics

Although not a data migration per se, a significant primary challenge with this transformation is to populate the column with data values after you have added it to the table. You need to do the following:

1. Work with your project stakeholders to identify the appropriate values.

2. Either manually input the new values into the column or write a script to automatically populate the column (or a combination of the two strategies).

3. Consider applying the refactorings Introduce Default Value (page 186) Drop Non-Nullable (page 177), or Make Column Non-Nullable (page 189), as appropriate, to this new column.

The following code depicts the DML to populate State.CountryCode with the initial value of ‘USA’:

UPDATE State SET CountryCode = 'USA'
WHERE CountryCode IS NULL;

Access Program Update Mechanics

The update mechanics are simple because all you must do is use the new column in your application(s) to work with the new column. In the following code, we show an example of how hibernate OR mapping meta data would be affected:

//Before mapping
<hibernate-mapping>
<class name="State" table="STATE">
  <id name="id" column="StateCode"></id>
  <property name="name"/>
</class>
</hibernate-mapping>

//After mapping
<hibernate-mapping>
<class name="State" table="STATE">
  <id name="id" column="StateCode"></id>
  <property name="name"/>
  <many-to-one name="country"
               class="Country"
               column="COUNTRYCODE"/>
</class>
</hibernate-mapping>

Introduce New Table

Introduce a new table in an existing database.

Motivation

There are several reasons why you may want to apply the Introduce New Table transformation:

• To persist a new attribute. A new requirement may necessitate the addition of a new table in your database.

• As an intermediate step of a refactoring. Many database refactorings, such as Split Table (page 145) and Rename Table (page 113), include a step in which you introduce a new table in place of an existing table.

• To introduce a new official data source. It is quite common to discover that similar information is stored in several tables. For example, there may be several sources of customer information, sources that are often out of synchronization with each other and sometimes even contradictory. In this scenario, you must use Use Official Data Source (page 271), and then over time apply Drop Table (page 77) to the original source tables.

• Need to back up data. While implementing some refactorings, such as Drop Table (page 77) or Merge Tables (page 96), you may need to create a table to hold table data for intermediate usages or for backup purposes.

Potential Tradeoffs

The primary tradeoff is that you need to ensure that the table you want to introduce does not exist elsewhere already. Often, the exact table that you require will not exist, but something close will—you may discover that it is easier to refactor that existing table than it is to add a new table containing redundant information.

Schema Update Mechanics

As depicted in Figure 11.3, to update the database schema you must simply introduce CustomerIdentification via the CREATE TABLE SQL command. The following code depicts the DDL to introduce the CustomerIdentification table. The CustomerIdentification.CustomerID column is a foreign key reference to the Customer table (not shown), enabling our application(s) to track the various ways to identify individual customers:

Figure 11.3. Introducing the CustomerIdentification table.

CREATE TABLE CustomerIdentification(
  CustomerID NUMBER NOT NULL,
  Photo      BLOB,
  PassportID NUMBER,
  CONSTRAINT PK_CustomerIdentification
    PRIMARY KEY (CustomerID)
  );

Data-Migration Mechanics

Although not a data migration per se, a significant primary challenge with this transformation is to populate the table with data values after you have added it to the database. You need to do the following:

1. Work with your project stakeholders to identify the appropriate values.

2. Either manually input the new values into the table or write a script to automatically populate the table (or a combination of the two strategies).

3. Consider applying the Insert Data transformation (page 296).

Access Program Update Mechanics

Ideally, the update mechanics are straightforward because all you should have to do is use the new table in your application(s) to work with the new table. However, when you are introducing a new table as a replacement for several other tables, you will often discover that your new table has a slightly different schema, and worse yet implements slightly different data semantics, than the existing tables. When this is the case, you need to refactor the access programs to work with the new version.

Introduce View

Create a view based on existing tables in the database.

Motivation

There are several reasons why you may want to apply Introduce View:

• Summarize data for reporting. Many reports require summary data, which can be generated via the view definition.

• Replace redundant reads. Several external programs, or stored procedures for that matter, often implement the same retrieval query. These queries can be replaced by a common read-only table or view.

• Data security. A view can be used to provide end users with read access to data but not update privileges.

• Encapsulate access to a table. Some organizations choose to encapsulate access to tables by defining updateable views that external programs access instead of the source tables. This enables the organization to easily perform database refactorings such as Rename Column (page 109) or Rename Table (page 113) without impacting the external applications, because views add an encapsulation layer between your tables and your application.

• Reduce SQL duplication. When you have complex SQL queries in an application, it is common to discover that parts of the SQL are duplicated in many places. When this is the case, you should introduce views to extract out the duplicate SQL, as shown in the example section.

Potential Tradeoffs

There are two primary challenges with introducing a view. First, the performance of the joins defined by the view may not be acceptable to your end users, requiring a different approach such as the refactoring Introduce Read-Only Table (page 251). Second, the addition of a new view increases the coupling within your database schema—as you can see in Figure 11.4, the view definition depends on the table schema(s) definitions.

Figure 11.4. Introducing the CustomerPortfolio view.

Schema Update Mechanics

To update the database schema when you perform Introduce View, you must simply introduce the view via the CREATE VIEW command. Figure 11.4 depicts an example where the CustomerPortfolio view is based on the Customer, Account, and Insurance tables to summarize the business that we do with each customer. This provides an easier way for end users to do ad-hoc queries. The following code depicts the DDL to introduce the CustomerPortfolio view:

CREATE VIEW CustomerPortfolio (
  CustomerID
  Name
  PhoneNumber
  AccountsTotalBalance
  InsuranceTotalPayment
  InsuranceTotalValue
) AS SELECT
  Customer.CustomerID,
  Customer.Name,
  Customer.PhoneNumber,
  SUM(Account.Balance),
  SUM(Insurance.Payment),
  SUM(Insurance.Value)
  FROM
    Customer,Account,Insurance
  WHERE
    Customer.CustomerID=Account.CustomerID
    AND Customer.CustomerID=Insurance.CustomerID
;

Data-Migration Mechanics

There is no data to migrate with this database refactoring.

Access Program Update Mechanics

The update mechanics depend on the situation. If you are introducing the view to replace common SQL retrieval code within access programs, you must refactor that code accordingly to use the new view instead. If you are introducing the view for reporting purposes, new reports should be written to take advantage of that view. For programs accessing the view, the view looks like a read-only table.

When SQL code in your application is duplicated, there is a greater potential for bugs because when you change the SQL in one place, you must make similar changes to all of the duplicates. For example, let’s consider instances of application SQL code:

SELECT Customer.CustomerID, SUM(Insurance.Payment), SUM(Insurance.Value)
FROM Customer, Insurance
WHERE
  Customer.CustomerID=Insurance.CustomerID
  AND Customer.Status = 'ACTIVE'
  AND Customer.InBusinessSince <= TodaysDateLastYear
GROUP BY Customer.CustomerID
;

SELECT Customer.CustomerID, SUM(Account.Balance)
FROM Customer, Account
WHERE
  Customer.CustomerID=Account.CustomerID
  AND Customer.Status = 'ACTIVE'
  AND Customer.InBusinessSince <= TodaysDateLastYear
GROUP BY Customer.CustomerID

As you can see in the preceding two code examples, the part to select an active customer is duplicated. We can create a view that extracts this duplicated SQL into the view and you can use the view in these SQLs, so that when the SQL to select an active customer changes, you do not have to change it multiple times:

CREATE OR REPLACE VIEW ActiveCustomer
SELECT Customer.CustomerID
FROM Customer
WHERE
Customer.Status = 'ACTIVE'
AND Customer.InBusinessSince <= TodaysDateLastYear
;

SELECT ActiveCustomer.CustomerID,
SUM(Insurance.Payment), SUM(Insurance.Value)
FROM ActiveCustomer, Insurance
WHERE ActiveCustomer.CustomerID=Insurance.CustomerID
GROUP BY ActiveCustomer.CustomerID
;

SELECT ActiveCustomer.CustomerID, SUM(Account.Balance)
FROM ActiveCustomer, Account
WHERE ActiveCustomer.CustomerID=Account.CustomerID
GROUP BY ActiveCustomer.CustomerID
;

Update Data

Update data within an existing table.

Motivation

You may need to apply Update Data as the result of the following:

• Table reorganization. When you apply Rename Table (page 113), Rename Column (page 109), Move Column (page 103), Split Table (page 145), Split Column (page 140), Merge Tables (page 96), or Merge Columns (page 92), you may have to apply Update Data to reorganize the data in the existing tables.

• Provide data where none existed. When applying transformations such as Introduce New Column (page 301), you may need to provide data for the newly added column in your existing production databases. For example, if you added a Country column to the Address table, you would need to populate it with appropriate values.

• Change reference data. When business rules change, there is a need to change some of the reference/lookup data. For example, you need to change the AccountType.Name value to ‘Private Banking’ from ‘Private’ due to a change in business terminology.

• Support a column change. Refactorings such as Apply Standard Codes (page 157) and Apply Standard Types (page 162) often require an update to the values stored in the column. The first refactoring often consolidates the values being used within a column—for example “US”, “USA”, and “U.S.” are consolidated to the single value of “USA”. The second refactoring often requires a conversion of data values, from numeric to character for example.

• Fix transactional data. Because of defects in a deployed application or database, you may get invalid results that need to be changed as part of fixing the defect(s). For example, an application may have populated incorrect interest amount in the Charges table for a customer, based on data in the InterestRate table; as part of fixing this defect, you must update the InterestRate table and the Charges table to reflect the correct value.

Potential Tradeoffs

Updating data in tables can prove tricky, especially when you are going to update reference data. For example, assume the Account table references data in the AccountType table. The data that you update in AccountType must contain values that make sense for the data contained with Account. When you are updating data, you must ensure that you are updating only, and all of, the correct rows.

Schema Update Mechanics

The database schema does not change with this transformation.

Data-Migration Mechanics

When small amounts of data need to be updated, you will likely find that a simple SQL script that updates the target table(s) is sufficient. For large amounts of data to be updated, you need to take a more sophisticated approach, such as using an extract-transform-load (ETL) tool, particularly when the data is being derived based on complex algorithms from existing table data. Important issues to consider include the following:

• Is the source data being provided from existing application tables, or is the data being provided by your business users?

• Have the values been accepted by your project stakeholder(s)?

• Which rows need to be updated?

• Which columns of those rows?

• What dependencies exist?

Figure 11.5 depicts how we update in the AccountType table representing brokerage accounts. This AccountType supports new naming convention that needs to first be tested and then deployed into production later on. The following code depicts the DML to update data in the AccountType table:

Figure 11.5. Updating the AccountType table with new names.

UPDATE AccountType SET Name = 'Chequing'
WHERE AccountTypeID=1;

UPDATE AccountType SET Name = 'Private Banking'
WHERE AccountTypeID=3;

Access Program Update Mechanics

When Update Data is applied as the result of another database refactoring, the external programs should have been updated to reflect that refactoring. When Update Data is applied to support the change of data within an existing schema, it is possible that external code will need to be changed. First, you may need to update SELECT statements to have the correct values in the WHERE clause. For example, assume you update values into the AccountType table for the row where AccountType.Name is “Private” and is now defined as “Private Banking”.

Similarly, you may need to update source code that validates the values of data attributes. For example, you may have code that tries to validate whether the AccountType is “Private”. Now that code needs to change, and it needs to validate whether the AccountType is “Private Banking” in your logic.

The following view definition shows how the affected parts of the application need to change when the “Private” account type is changed to “Private Banking”:

//Before view
CREATE OR REPLACE VIEW PrivateAccounts AS
SELECT
  Account.AccountId,
  Account.CustomerId,
  Account.StartDate,
  Account.Balance,
  Account.isPrimary
FROM
  Account, AccountType
WHERE
  Account.AccountTypeId = AccountType.AccountTypeId
  AND AccountType.Name = 'Private'
;

//After view
CREATE OR REPLACE VIEW PrivateAccounts AS
SELECT
  Account.AccountId,
  Account.CustomerId,
  Account.StartDate,
  Account.Balance,
  Account.isPrimary
FROM
  Account, AccountType
WHERE
  Account.AccountTypeId = AccountType.AccountTypeId
  AND AccountType.Name = 'Private Banking'
;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 11: Transformations

Create new playlist

Sign In

Sign Up

Chapter 11Transformations

Insert Data

Motivation

Potential Tradeoffs

Schema Update Mechanics

Data-Migration Mechanics

Access Program Update Mechanics

Introduce New Column

Motivation

Potential Tradeoffs

Schema Update Mechanics

Data-Migration Mechanics

Access Program Update Mechanics

Introduce New Table

Motivation

Potential Tradeoffs

Schema Update Mechanics

Data-Migration Mechanics

Access Program Update Mechanics

Introduce View

Motivation

Potential Tradeoffs

Schema Update Mechanics

Data-Migration Mechanics

Access Program Update Mechanics

Update Data

Motivation

Potential Tradeoffs

Schema Update Mechanics

Data-Migration Mechanics

Access Program Update Mechanics

Table of Contents for
Chapter 11: Transformations

Chapter 11
Transformations