Chapter 5
Database Refactoring Strategies

Knowing more today than yesterday is good news about today, not bad news about yesterday.

—Ron Jeffries

This chapter describes some of our experiences with database refactoring on actual projects, and suggests a few potential strategies that you may want to consider. In many ways, this chapter summarizes a collection of “lessons learned” that we hope will help your adoption efforts. These lessons include the following:

• Smaller changes are easier to apply.

• Uniquely identify individual refactorings.

• Implement a large change by many small ones.

• Have a database configuration table.

• Prefer triggers over views or batch synchronization.

• Choose a sufficient deprecation period.

• Simplify your database change control board (CCB) strategy.

• Simplify negotiations with other teams.

• Encapsulate database access.

• Be able to easily set up a database environment.

• Do not duplicate SQL.

• Put database assets under change control.

• Beware of politics.

5.1 Smaller Changes Are Easier to Apply

It is tempting to try to make several changes to your database at once. For example, what is stopping you from moving a column from one table to another, renaming it, and applying a standard type to it all at the same time? Absolutely nothing, other than the knowledge that doing this all at once is harder, and therefore riskier, than doing it one step at a time. If you make a small change and discover that something is broken, you pretty much know which change caused the problem.

5.2 Uniquely Identify Individual Refactorings

During a software development project, you are likely to apply hundreds of refactorings and/or transformations to your database schema. Because these refactorings often build upon each other—for example, you may rename a column and then a few weeks later move it to another table—you need to ensure that the refactorings are applied in the right order. To do this, you need to identify each refactoring somehow and identify any dependencies between them. Table 5.1 compares and contrasts the three basic strategies for doing so. The strategies in Table 5.1 assume that you are working in a single-application single-database environment.

Table 5.1. Version Identification Strategies

image

When you are in a multi-application environment in which several project teams may be applying refactorings to the same database schema, you also need to find a way to identify which team produced a refactoring. The easiest way to do this is to assign a unique identifier to each team and then include that value as part of the refactoring identifier. Therefore, with a build number strategy, team 1 might have refactorings with IDs 1-7, 1-12, 1-15, and so on; and team 7 could have refactorings with IDs 7-3, 7-7, 7-13, and so on.

Our experience is that when a single team is responsible for a database, the build number strategy works best. However, when several teams can evolve the same database, a date/timestamp approach works best because you can readily tell in which order the refactorings were applied from the date/timestamp. With a build number approach, you cannot—for example, determine which refactoring comes first, refactoring 1-7 or 7-7?

It is not, however, as simple as applying refactorings in order. Scott worked in an organization in which four separate teams could evolve the same database schema. There were two database administrators (DBAs)—we will call them Fred and Barney—to support the teams that worked closely together. Although we tried to coordinate their efforts, and most of the time succeeded, mistakes would happen. One time Fred applied the refactoring Apply Standard Codes (page 157) to a column, and a few days later Barney applied the same refactoring on a different team, but used a different set of “standard” values. As it turned out, Barney’s “standard” values was the right set, but we did not find that out until the two teams promoted their changes into the preproduction testing environment and effectively clobbered one another. The point to be made is that in a multiple-team environment, you need a coordination strategy. (Several are discussed later in this chapter.)

5.3 Implement a Large Change by Many Small Ones

Large changes to your database, such as implementing a common surrogate key strategy across all tables or applying a consistent naming strategy throughout your database, should be implemented as a collection of small refactorings. This strategy follows the old adage, “How do you eat an elephant? One bite at a time.”

Consider splitting an existing table in two. Although we have a single refactoring called Split Table (page 145), the reality is that in practice you need to apply many refactorings to get this done. For example, you need to apply the Introduce New Table (page 304) transformation to add the new table, the Move Column (page 103) refactoring several times (one for each column) to move, and potentially the Introduce Index (page 248) refactoring to implement the primary key of the new table. To implement each of the Move Column refactorings, you must apply the Introduce New Column (page 301) transformation and the Move Data (page 230) transformation. When doing this, you may discover that you need to apply one or more data quality refactorings (Chapter 7, “Data Quality Refactorings”) to improve the source data in the individual columns.

5.4 Have a Database Configuration Table

Chapter 3, “The Process of Database Refactoring,” discussed the need to identify the current schema version of the database to enable you to update the schema appropriately. This schema version should reflect your database refactoring strategy; for example, if you identify refactorings using a date/timestamp strategy, you should identify the current schema version with a date/timestamp, too. The easiest way to do that is to have a table that maintains this information. In the following code, we create a single-row, single-column table called DatabaseConfiguration that reflects a build number strategy:

CREATE TABLE DatabaseConfiguration
 (SchemaVersion NUMBER NOT NULL);

INSERT INTO DatabaseConfiguration
 ( SchemaVersion ) VALUES (0);


This table is updated with the identifier value of a database refactoring whenever the refactoring is applied to the database. For example, when you apply refactoring number 17 to the schema, DatabaseConfiguration.SchemaVersion, would be updated to 17, as shown in the following code:

UPDATE DatabaseConfiguration
 SET SchemaVersion = 17;


5.5 Prefer Triggers over Views or Batch Synchronization

In Chapter 2, “Database Refactoring,” you learned that when several applications access the same database schema, you often require a transition period during which both the original and new schemas exist in production. You need a way to ensure that regardless of which version of the schema an application accesses, it accesses consistent data. Table 5.2 compares and contrasts the three strategies that you may use to keep the data synchronized. Our experience is that triggers are the best approach for the vast majority of situations. We have used views a couple of times and have taken a batch approach rarely. All of the examples throughout this book assume a trigger-based synchronization approach.

Table 5.2. Schema Synchronization Strategies

image

5.6 Choose a Sufficient Transition Period

The DBA must assign a realistic transition period to the refactoring that is sufficient for all the other application teams. We have found that the easiest approach is to agree on a common transition period for the various categories of refactoring and then apply that transition period consistently. For example, structural refactorings may have a two-year transition period, whereas architecture refactorings may have a three-year transition period. The primary disadvantage is that this approach requires you to adopt the longest transition periods, even when you are refactoring schema that is accessed by a handful of applications that are deployed frequently. You can mitigate this problem by actively removing schema within your production databases that are no longer required, even though the transition period may not have expired yet, or by negotiating a shorter transition period via a database change control board or through direct negotiation with the other teams.

5.7 Simplify Your Database Change Control Board (CCB) Strategy

Scott worked with one company in which they had a database CCB comprised of the operational DBAs, people who understood the enterprise data assets well. This CCB met once a week, and at their meetings the project DBAs would bring a list of suggested changes that their teams wanted to make to existing production data sources. (The teams were free to change data sources not yet in production.) The CCB would determine whether the change would be allowed and if so what the deprecation period would be. The advantage is that there is the perception of tighter control on the part of the CCB. (Of course, this strategy completely falls apart if a development team chooses to go around them.) It has the disadvantage that it slows down the development efforts of the teams. Even when you can get a decision within a few hours or days, that is still time during which the development team has to tolerate the original schema. Our suggestion is to either give the project DBA the authority to make the changes to the database schema as needed, with the understanding that the CCB may later decide to override the change, or to have the CCB meet daily to discuss changes.

5.8 Simplify Negotiations with Other Teams

An alternative strategy for defining the transition period is to negotiate individual refactorings with the owners of any system that might be affected by the database refactoring. You could do this one time for each refactoring, or in batch during a regular database schema change negotiation meeting. This approach has the advantages that it will help to communicate the potential changes to everyone affected and will likely result in the most accurate transition period (assuming that everyone at the meeting can accurately predict when they can deploy their required changes). The primary disadvantage is that this approach might be slow and arduous. We have never seen this tried in practice. If you do try it, however, we suggest that you keep things as simple as possible.

5.9 Encapsulate Database Access

In Chapter 2, we argued that the more database access is encapsulated, the easier it is to refactor. Minimally, even if your application contains hard-coded SQL, you should at least strive to put that SQL code in one place so that you can easily find and update it when you need to. You could implement the SQL logic in a consistent manner, such as having save(), delete(), retrieve(), and find() operations for each business class. Or you could implement data access objects (DAOs), classes that implement the data access logic separately from business classes. For example, your Customer and Account business classes would have CustomerDAO and AccountDAO classes respectively. Better yet, you could forego SQL code completely by generating your database access logic from mapping meta data (Ambler 2003).

5.10 Be Able to Easily Set Up a Database Environment

People join, and eventually leave, your project throughout its life cycle. As you see in Figure 5.1, your team(s) will need to be able to create instances of your database, often with different versions of the schema on a range of machines, as you learned in Chapter 4, “Deploying into Production.” The most effective way to do this is with an installation script that applies the initial DDL to create the database schema and any applicable change scripts, and then runs your regression test suite to ensure that the installation was successful.

Figure 5.1. Sandboxes.

image

5.11 Do Not Duplicate SQL

One of the great things about SQL is that it is fairly easy to write. Unfortunately, because it is fairly easy to write, we have found that SQL code is often duplicated throughout an application, and even throughout a database within its views, stored procedures, and triggers. The more SQL code you have, the harder it is to refactor your database schema, because there will potentially be more code coupled to whatever you are refactoring. It is best if you write SQL in a single package or class, have the SQL generated from meta data, or store the SQL code in XML files that is accessed at runtime.

5.12 Put Database Assets Under Change Control

Chapter 1, “Evolutionary Database Development,” described how important it is to put all database assets, such as data models and database scripts, under change control management. Both of us have been involved with project teams where the DBAs, and sometimes even the developers, did not do this. As you would expect, these teams often struggled to identify the proper version of the data model, or of a change script, when it came time to deploy their applications into preproduction testing and sometimes even into production. Your database assets, just like the rest of your critical project assets, should be managed effectively. We have found it most helpful when database assets are co-located in the same repository as the application, enabling us to see who made changes and supporting rollback capabilities.

5.13 Beware of Politics

Introducing evolutionary database techniques, and in particular database refactoring, is likely to be a major change within your organization. As Manns and Rising (2005) point out in Fearless Change, politics is likely to rear its ugly head as you try to make these improvements to the way that you work. Although many IT professionals prefer to ignore politics, naively believing that they are above it, they do so at their peril. The techniques that we describe in this book work; we know this because we have done it. We also know that many traditional data professionals are threatened by these techniques, and rightfully so, because these techniques force them to make significant changes to the way that they will work in the future. You will likely need to “play the political game” if you want to adopt database refactoring within your organization.

5.14 What You Have Learned

Database refactoring is a relatively new development technique, but you can still learn from the experiences of other people. In this chapter, we shared some of our experiences and suggested a few new strategies that you may want to try.

Online Resources

We administer the Agile Databases mailing list at Yahoo Groups. The URL for the group is groups.yahoo.com/group/agileDatabases/. We invite you to get involved with the discussions and share your experiences.

We also maintain both www.databaserefactoring.com and www.agiledata.org where up-to-date lists of database refactorings are maintained. New refactorings will be added as they are discovered.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset