5 Data Development

Data development is the third Data Management Function in the data management framework shown in Figures 1.3 and 1.4. It is the second data management function that interacts with and is influenced by the Data Governance function. Chapter 5 defines the data development function and explains the concepts and activities involved in data development.

5.1 Introduction

Data development is the analysis, design, implementation, deployment, and maintenance of data solutions to maximize the value of the data resources to the enterprise. Data development is the subset of project activities within the system development lifecycle (SDLC) focused on defining data requirements, designing the data solution components, and implementing these components. The primary data solution components are databases and other data structures. Other data solution components include information products (screens and reports) and data access interfaces.

The context of the Data Development Function is shown in the context diagram in Figure 5.1.

Project team members must collaborate with each other for effective solution design.

  • Business data stewards and subject matter experts (SMEs) provide business requirements for data and information, including business rules and data quality expectations, and then validate that these requirements have been met.
  • Data architects, analysts, and database administrators have primary responsibility for database design. Database administrators collaborate with software developers to define data access services in layered service-oriented architecture (SOA) implementations.
  • Software architects and developers (both application and data integration specialists) take primary responsibility for data capture and usage design within programs, as well as the user interface design for information products (screens and printed reports).

5.2 Concepts and Activities

The activities necessary to carry out the data development function are described below.

5.2.1 System Development Lifecycle (SDLC)

Data development activities occur in the context of systems development and maintenance efforts, known as the system development lifecycle (SDLC). Projects manage most of these efforts. A project is an organized effort to accomplish a defined objective. A very small maintenance effort may be completed in a day, while very large, multi-phase projects can take years to complete.

Figure 5.1 Data Development Context Diagram

System development and maintenance projects perform selected activities within the systems development lifecycle. The stages of the SDLC represent very high-level steps commonly taken to implement systems, as shown in Figure 5.2. There is no standardized outline of these stages, but in general, the SDLC includes the following specification and implementation activities:

  • Project Planning, including scope definition and business case justification.
  • Requirements Analysis.
  • Solution Design.
  • Detailed Design.
  • Component Building.
  • Testing, including unit, integration, system, performance, and acceptance testing.
  • Deployment Preparation, including documentation development and training.
  • Installation and Deployment, including piloting and rollout.

Figure 5.2 The System Development Lifecycle (SDLC)

System maintenance efforts also generally follow the same high-level SDLC processes in very rapid sequence, performing some small amounts of analysis, design, coding, testing, and deployment.

Many organizations have adopted SDLC methods that integrate systems development methods and techniques into a comprehensive approach to systems development. Methods guide system development project planning and performance. Most methods recommend detailed tasks and specific techniques to perform activities within each SDLC stage. These tasks and techniques produce a series of deliverables, including data models, leading ultimately to an implemented system. The outputs from early tasks serve as the inputs guiding subsequent tasks.

Different methods portray the SDLC in different ways, each with its own distinctive use of terms. Some methods define a waterfall approach to performing SDLC stages. Others define a spiral, iterative approach that delivers complete solutions in increments, performing SDLC stages in multiple project phases guided by some high-level planning, analysis, and design.

Information systems capture and deliver information (data in context with relevance and a time frame) to support business functions. These functions range from strategic planning to operational performance. Data stores and information products are integral components of every information system. An effective systems development project will maintain a balanced emphasis on data, process, and technology.

5.2.2 Styles of Data Modeling

Several different data modeling methods are available, each using different diagramming conventions or styles. The syntax for each of these styles differs slightly. While all data models use boxes and lines, each style uses different symbols and box contents to communicate detailed specifications. The DAMA-DMBOK Guide offers only a very brief introduction to these styles.

  • IE: The most common data modeling diagramming style is the “information engineering” (IE) syntax, so named because it was popularized by James Martin in his influential books and training on Information Engineering. The IE notation uses tridents or “crow’s feet”, along with other symbols, to depict cardinality.
  • IDEF1X: This is an alternate data modeling syntax developed originally for use by the U.S. Air Force, using circles (some darkened, some empty) and lines (some solid, some dotted) instead of “crow’s feet” to communicate similar meanings. IDEF1X data models often accompany IDEF0 process diagrams.
  • ORM: Object Role Modeling is an alternate modeling style with a syntax that enables very detailed specification of business data relationships and rules. ORM diagrams present so much information that effective consumption usually requires smaller subject area views, with fewer business entities on a single diagram. ORM is not widely used, but its proponents strongly advocate its benefits. ORM is particularly useful for modeling complex business relationships.
  • UML: The Unified Modeling Language is an integrated set of diagramming conventions for several different forms of modeling. Grady Booch, Ivar Jacobsen, and James Rumbaugh developed UML to standardize object-oriented analysis and design. UML has become widely adopted, effectively achieving this purpose. UML is now widely used in many SDLC methods and has been adopted by many standards organizations.

UML defines several different types of models and diagrams. Class diagrams closely resemble other data model styles. UML class diagrams are used not only to model object-oriented software, but also commonly to express semantic models for XML-based web services. In fact, conceptual, logical, and even physical data modeling can use UML class diagrams.

Some practitioners see no need or value in modeling objects and data separately. Conceptual object class models are equivalent to conceptual data models. However, logical and physical data models usually differ substantially from logical and physical object-oriented program designs. Logical data models normalize data attributes, while object models do not. The attributes of an object represent data in program memory, while the attributes of a physical data model represent the data stored in a database, usually as columns in relational database tables. Recognizing these differences, most data professionals prefer to model data and / or databases in separate models with different diagramming styles.

When used consistently, the different diagramming conventions can quickly differentiate and communicate the purpose of each model. For example, some practitioners use IE notation for logical data modeling and use IDEF1X for physical data modeling, especially dimensional modeling. However, this is confusing for business data stewards reviewing different kinds of models. Data stewards do not need to become data modelers, but they should be fluent in reading and interpreting one primary diagramming convention.

5.2.3 Data Modeling, Analysis, and Solution Design

Data modeling is an analysis and design method used to 1) define and analyze data requirements, and 2) design data structures that support these requirements. A data model is a set of data specifications and related diagrams that reflect data requirements and designs. For the most part, conceptual data modeling and logical data modeling are requirements analysis activities, while physical data modeling is a design activity.

A model is a representation of something in our environment. It makes use of standard symbols that allow one to grasp its content quickly. Maps, organization charts, and building blueprints are examples of models in use every day. Think of a data model as a diagram that uses text and symbols to represent data elements and the relationships between them. In fact, a single diagram may be one of several views provided for a single integrated data model. More formally, a data model is the integrated collection of specifications and related diagrams that represent data requirements and designs.

Although there are well-defined techniques and processes, there is an art to making data available in usable forms to a variety of different applications, as well as visually understandable. Data modeling is a complex process involving interactions among people and with technology, and it must not compromise the integrity or security of the data. Good data models accurately express and effectively communicate data requirements and high-quality solution designs. Some model diagrams try to communicate too much detail, reducing their effectiveness.

Two formulas guide a modeling approach:

  • Purpose + audience = deliverables.
  • Deliverables + resources + time = approach.

The purpose of a data model is to facilitate:

  • Communication: A data model is a bridge to understanding data between people with different levels and types of experience. Data models help us understand a business area, an existing application, or the impact of modifying an existing structure. Data models may also facilitate training new business and / or technical staff.
  • Formalization: A data model documents a single, precise definition of data requirements and data related business rules.
  • Scope: A data model can help explain the data context and scope of purchased application packages.

Data models that include the same data may differ by:

  • Scope: Expressing a perspective about data in terms of function (business view or application view), realm (process, department, division, enterprise, or industry view), and time (current state, short-term future, long-term future).
  • Focus: Basic and critical concepts (conceptual view), detailed but independent of context (logical view), or optimized for a specific technology and use (physical view).

Use data models to specify the data required to meet information needs. Data flows through business processes packaged in information products. The data contained in these information products must meet business requirements. Data modeling is, in that sense, an analysis activity, reflecting business requirements. However, data modeling presents creative opportunities at every step, making it, at the same time, a design activity. Generally, there is more analysis involved in conceptual data modeling, and more design involved in physical data modeling, with a more balanced mixture of both in logical data modeling.

5.2.3.1 Analyze Information Requirements

Information is data in context that has relevance and is timely. To identify information requirements, first identify business information needs, often in the context of one or more business processes. Business processes consume, as input, the information products output from other business processes. The names of these information products often identify an essential business vocabulary that serves as the basis for data modeling. Regardless of whether processes or data are modeled sequentially (in either order) or concurrently, effective analysis and design should ensure a relatively balanced view of data (nouns) and processes (verbs), with equal emphasis on both process and data modeling.

Projects typically begin with a project request and the definition of a project charter that defines project objectives, deliverables, and scope boundaries. Initial project plans estimate the resources, effort, time, and cost required to accomplish project objectives. Every project charter should include data-specific objectives and identify the data within its scope. Reference to an enterprise data model provides the vocabulary to define the data scope of the project effectively.

Requirements analysis includes the elicitation, organization, documentation, review, refinement, approval, and change control of business requirements. Some of these requirements identify business needs for data and information. Express requirement specifications in both words and diagrams.

Logical data modeling is an important means of expressing business data requirements. For many people, as the old saying goes, “a picture is worth a thousand words.” However, some people do not relate easily to pictures; they relate better to the reports and tables created by data modeling tools. Many organizations have formal requirements-management disciplines to guide the drafting and refinement of formal requirement statements, such as, “The system shall …”. Written data requirement specification documents may be maintained using requirements management tools. Carefully synchronize the contents of any such documentation with the specifications captured within data models.

Some methods include enterprise planning activities that define the enterprise data model, using techniques such as business systems planning (BSP) or information systems planning. Methods may also include the definition of related enterprise-wide data delivery architecture in the planning phase. Chapter 4 on Data Architecture Management covers these activities.

5.2.3.2 Develop and Maintain Conceptual Data Models

A conceptual data model is a visual, high-level perspective on a subject area of importance to the business. It contains only the basic and critical business entities within a given realm and function, with a description of each entity and the relationships between entities. Conceptual data models define the semantics (nouns and verbs) of the essential business vocabulary. Conceptual data model subject areas may reflect the data associated with a business process or application function. A conceptual data model is independent of technology (database, files, etc.) and usage context (whether the entity is in a billing system or a data warehouse).

A conceptual data model includes a glossary that defines each object within the model. The definitions include business terms, relationship terms, entity synonyms, and security classifications. An example of a conceptual data model is shown in Figure 5.3.

Figure 5.3 Conceptual Data Model Example

To create a conceptual data model, start with one subject area from the subject area model. Determine what objects are included within that subject area, and how they relate to each other. For example, a Customer subject area may contain the following entities: Account Owner, Sub Account, Contact Preferences, and Contact Information. One Account Owner relates to one or more Sub Accounts. Each Account Owner has one set of Contact Preferences and one set of Contact Information at any time.

To maintain a conceptual data model, adopt a process to check any proposed changes to the production system against the conceptual model. If a project will involve changes, create an intermediate conceptual model and make the changes there. Copy the model changes to the production version of the conceptual model when implementing changes to the production system, as part of the release process, to ensure that the model stays in sync with current reality.

5.2.3.2.1 Entities

A business entity is an object or event of interest to the organization. A data entity is a collection of data about something that the business deems important and worthy of capture. An entity is a noun:

  • A who: Person, organization, role, employee, customer, vendor, student, party, department, regulatory body, competitor, partner, subsidiary, team, family, household.
  • A what: Product, service, resource, raw material, finished good, course, class.
  • A when: Event, fiscal period.
  • A where: Location, address, site, network node.
  • A why: Policy, rule, request, complaint, return, inquiry.
  • A how: Mechanism, tool, document, invoice, contract, agreement, standard, account.

An entity occurrence is the instantiation of a particular business entity. The entity Customer can have instances named Bob, Joe, Jane, and so forth. The entity Account can have instances of Bob’s checking account, Bob’s savings account, Joe’s brokerage account, and so on.

An entity can appear in a conceptual or logical data model. Conceptual business entities describe the things about which we collect data, such as Customer, Product, and Account. Logical data entities follow the rules of normalization and abstraction, and therefore the concept Customer becomes numerous components such as Customer, Customer Type, and Customer Preference. Physical data models define tables that may or may not relate directly to entities in a comparable logical model.

Entities are either independent or dependent entities. An independent entity (or kernel entity) does not depend on any other entity for its existence. Each occurrence of an independent entity exists without referring to any other entity in the data model. A dependent entity depends on one or more other entities for its existence. There are three main types of dependent entity:

  • Attributive / characteristic entity: An entity that depends on only one other parent entity, such as Employee Beneficiary depending on Employee.
  • Associative / mapping entity: An entity that depends on two or more entities, such as Registration depending on a particular Student and Course.
  • Category / sub-type or super-type entity: An entity that is “a kind of” another entity. Sub-types and super-types are examples of generalization and inheritance. A super-type entity is a generalization of all its sub-types, and each sub-type inherits the attributes of its super-type. For example, a Party super-type links to Person and Organization sub-types. Sub-types may be overlapping (non-exclusive) or non-overlapping (exclusive). A non-overlapping sub-type entity instance must be either one sub-type or another, but not both.

5.2.3.2.2 Relationships

Business rules define constraints on what can and cannot be done. Business rules divide into two major categories:

  • Data rules constrain how data relates to other data. For example, “Freshman students can register for at most 18 credits a semester.” Data models focus on data business rules.
  • Action rules are instructions on what to do when data elements contain certain values. Action rules are difficult to define in a data model. Business rules for data quality are action rules, and applications implement them as data entry edits and validations.

Data models express two primary types of data rules:

  • Cardinality rules define the quantity of each entity instance that can participate in a relationship between two entities. For example, “Each company can employ many persons.”
  • Referential integrity rules ensure valid values. For example, “A person can exist without working for a company, but a company cannot exist unless at least one person is employed by the company.”

Express cardinality and referential integrity business rules as relationships between entities in data models. Combine the examples above to express the relationship between Company and Person as follows:

  • Each person can work for zero to many companies.
  • Each company must employ one or many persons.

Relationship labels are verb phrases describing the business rules in each direction between two entities, along with the words that describe the “many” aspect of each relationship (cardinality) and the “zero or one” side of each relationship (referential integrity).

A relationship between two entities may be one of three relationship types:

  • A one-to-one relationship says that a parent entity may have one and only one child entity.
  • A one-to-many relationship says that a parent entity may have one or more child entities. One-to-many relationships are the most common relationships. In some one-to-many relationships, a child entity must have a parent, but in other relationships, the relationship to a parent is optional. In some one-to-many relationships, a parent entity must have at least one child entity, while in other one-to-many relationships, the relationship to any child is optional.
  • A many-to-many relationship says that an instance of each entity may be associated with zero to many instances of the other entity, and vice versa.

A recursive relationship relates instances of an entity to other instances of the same entity. Recursive relationships may be one-to-one, one-to-many, or many-to-many.

5.2.3.3 Develop and Maintain Logical Data Models

A logical data model is a detailed representation of data requirements and the business rules that govern data quality, usually in support of a specific usage context (application requirements). Logical data models are still independent of any technology or specific implementation technical constraints. A logical data model often begins as an extension of a conceptual data model, adding data attributes to each entity. Organizations should have naming standards to guide the naming of logical data objects. Logical data models transform conceptual data model structures by applying two techniques: normalization and abstraction. An example of a logical data model is shown in Figure 5.4.

Normalization is the process of applying rules to organize business complexity into stable data structures. It requires a deeper understanding of each data element, seen in relationship to every other data element. The basic goal of normalization is to keep each data element in only one place.

Normalization rules sort data elements according to primary and foreign keys. The rules are organized into levels, with each level applying more granularity and specificity in search of the correct primary and foreign keys. Each level comprises a separate normal form, and each successive level includes the previous levels. Normalization levels include:

  • First normal form (1NF): Ensures each entity has a valid primary key, that every data element depends on the primary key, that repeating groups are removed, and that each data element is atomic (not multi-valued).
  • Second normal form (2NF): Ensures each entity has the minimal primary key and that every data element depends on the complete primary key.
  • Third normal form (3NF): Ensures each entity has no hidden primary keys and that each data element depends on no data element outside the key (“the key, the whole key and nothing but the key”).

Figure 5.4 Logical Data Model Example

  • Boyce / Codd normal form (BCNF): Resolves overlapping composite candidate keys. A candidate key is either a primary or an alternate key. ‘Composite’ means more than one (e.g. two data elements in an entity’s primary key), and ‘overlapping’ means there are hidden business rules between the keys.
  • Fourth normal form (4NF): Resolves all many-to-many-to-many relationships (and beyond) in pairs until they cannot be broken down into any smaller pieces.
  • Fifth normal form (5NF): Resolves inter-entity dependencies into basic pairs, and all join dependencies use parts of primary keys.
  • Sixth normal form (6NF): Adds temporal objects to primary keys, in order to allow for historical reporting and analysis over timeframes.

The term normalized model usually means the data is in 3NF. Situations requiring BCNF, 4NF, 5NF, and 6NF occur rarely; these forms are considered advanced topics in data modeling.
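
Although DDL statements belong to physical design, a brief SQL sketch can make the normal forms concrete. The following is a minimal, hypothetical example (the Student, Course, and Registration tables and their columns are illustrative only): each data element appears in exactly one place and depends on its entity’s key.

    -- Hypothetical 3NF structures: each fact is stored once, dependent on its key.
    CREATE TABLE Student (
        Student_ID         INTEGER     NOT NULL,   -- valid primary key; all columns atomic (1NF)
        Student_Last_Name  VARCHAR(50) NOT NULL,
        Student_First_Name VARCHAR(50) NOT NULL,
        CONSTRAINT PK_Student PRIMARY KEY (Student_ID)
    );

    CREATE TABLE Course (
        Course_ID    INTEGER      NOT NULL,
        Course_Name  VARCHAR(100) NOT NULL,
        Credit_Hours SMALLINT     NOT NULL,         -- stored once on Course, not repeated per registration
        CONSTRAINT PK_Course PRIMARY KEY (Course_ID)
    );

    -- Associative entity resolving the many-to-many between Student and Course.
    CREATE TABLE Registration (
        Student_ID        INTEGER NOT NULL,
        Course_ID         INTEGER NOT NULL,
        Registration_Date DATE    NOT NULL,         -- depends on the complete composite key (2NF)
        CONSTRAINT PK_Registration PRIMARY KEY (Student_ID, Course_ID),
        CONSTRAINT FK_Registration_Student FOREIGN KEY (Student_ID) REFERENCES Student (Student_ID),
        CONSTRAINT FK_Registration_Course FOREIGN KEY (Course_ID) REFERENCES Course (Course_ID)
    );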

Abstraction is the redefinition of data entities, elements, and relationships by removing details to broaden the applicability of data structures to a wider class of situations, often by implementing super-types rather than sub-types. Using the generic Party Role super-type to represent the Customer, Employee, and Supplier sub-types is an example of applying abstraction.

Use normalization to show known details of entities. Use abstraction when some details of entities are missing or not yet discovered, or when the generic version of entities is more important or useful than the subtypes.

5.2.3.3.1 Attributes

An attribute is a property of an entity; a type of fact important to the business whose values help identify or describe an entity instance. For example, the attribute Student Last Name describes the last name of each student. Attributes translate in a physical data model to a field in a file or a column in a database table. Attributes use business names, while fields and columns use technical names that frequently include technical abbreviations. In a logical data model, business entities represent the essential nouns in the organization’s vocabulary, and attributes represent adjectives.

An attribute in a logical model should be atomic. It should contain one and only one piece of data (fact) that cannot be divided into smaller pieces. For example, a conceptual data element called phone number divides into several logical data elements for phone type code (home, office, fax, mobile, etc.), country code (1 for the US and Canada), area code, prefix, base phone number, and extension.

An instance of an attribute is the value of the attribute for a particular entity instance. An occurrence of a data value is its appearance as an attribute instance for an entity instance. The data element instance 60106, for example, belongs to the Customer Employee Zip Code data element, which exists for the Customer instance Bob.

Entity and attribute definitions are essential contributors to the business value of any data model. High-quality definitions clarify the meaning of business vocabulary and provide rigor to the business rules governing entity relationships. High-quality definitions assist business professionals in making intelligent business decisions, and they assist IT professionals in making intelligent application design decisions. High-quality data definitions exhibit three essential characteristics: clarity, accuracy, and completeness.

5.2.3.3.2 Domains

The complete set of all possible values for an attribute is a domain. An attribute can never contain values outside of its assigned domain. Some domains have a limited number of specific defined values, or minimum or maximum limits for numbers. Business rules can also restrict domains.

Attributes often share the same domain. For example, an employee hire date and a purchase order date must be:

  • A valid calendar date (for example, not February 31st).
  • A date that falls on a weekday.
  • A date that does not fall on a holiday.

A data dictionary contains a collection of domains and the attributes that relate to each domain, among other things.
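
In a physical design, domains are typically enforced through data types and constraints. The sketch below is a minimal example assuming a PostgreSQL-style DBMS that supports CREATE DOMAIN (other DBMSs express the same rule as a column-level CHECK constraint); the domain, table, and column names are hypothetical, and the holiday rule is omitted for brevity.

    -- Hypothetical shared domain: business dates must fall on a weekday.
    CREATE DOMAIN business_date AS DATE
        CHECK (EXTRACT(ISODOW FROM VALUE) BETWEEN 1 AND 5);

    -- Attributes that share the domain draw their values from the same rule.
    CREATE TABLE Employee (
        Employee_ID INTEGER       NOT NULL PRIMARY KEY,
        Hire_Date   business_date NOT NULL
    );

    CREATE TABLE Purchase_Order (
        Purchase_Order_ID INTEGER       NOT NULL PRIMARY KEY,
        Order_Date        business_date NOT NULL
    );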

5.2.3.3.3 Keys

Attributes assigned to entities are either key or non-key attributes. A key data element helps identify one unique entity instance from all others, either fully (by itself) or partially (in combination with other key elements). Non-key data elements describe the entity instance but do not help uniquely identify it.

A key (or candidate key) represents the one or more attributes whose values uniquely identify an entity instance. A composite key is a key containing two or more attributes. One of these candidate keys becomes the primary key. There should be only one primary key. All other candidate keys become alternate keys.

To avoid using composite primary keys, or key attributes whose values change over time, use a surrogate key. A surrogate key contains a randomly generated value uniquely assigned to an entity instance. ‘Surrogate’ means ‘substitute’. Use a surrogate key when no truly unique and stable data element or set of data elements exists within the entity. Other names for surrogate keys are anonymous keys, or non-intelligent keys. Note that a key generated from a sequence number actually still has some intelligence: a person can tell from the sequence the order in which the rows were inserted into the table, similar to a row number. True surrogate keys are random, not sequential.

A foreign key is an attribute that provides a link to another entity. Simply put, a foreign key is an attribute that appears in both entities in a relationship, and partially or fully identifies either one or both of the entities. When a one-to-many relationship exists between two entities, the entity on the child side of the relationship inherits the primary key attributes from the entity on the parent side of the relationship. The foreign key enables navigation between data structures.

An identifying relationship occurs when the foreign key attribute(s) of a parent entity appears as part of the composite primary key of a child entity. A non-identifying relationship occurs when the foreign key of a parent entity is a non-key attribute(s) describing the child entity.
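
These key concepts can be expressed directly in SQL DDL. The minimal sketch below uses hypothetical Customer and Account tables: a surrogate primary key, a natural candidate key retained as an alternate key, and a foreign key implementing a non-identifying one-to-many relationship (key-generation syntax varies by DBMS and is omitted here).

    CREATE TABLE Customer (
        Customer_ID   INTEGER      NOT NULL,   -- surrogate primary key (system-assigned)
        Tax_ID        VARCHAR(20)  NOT NULL,   -- natural candidate key kept as an alternate key
        Customer_Name VARCHAR(100) NOT NULL,   -- non-key attribute
        CONSTRAINT PK_Customer PRIMARY KEY (Customer_ID),
        CONSTRAINT AK_Customer_Tax_ID UNIQUE (Tax_ID)
    );

    CREATE TABLE Account (
        Account_ID   INTEGER  NOT NULL,
        Customer_ID  INTEGER  NOT NULL,        -- foreign key inherited from the parent entity
        Account_Type CHAR(10) NOT NULL,
        CONSTRAINT PK_Account PRIMARY KEY (Account_ID),   -- non-identifying: the FK is not part of the PK
        CONSTRAINT FK_Account_Customer FOREIGN KEY (Customer_ID) REFERENCES Customer (Customer_ID)
    );

In an identifying relationship, Customer_ID would instead appear as part of a composite primary key on Account.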

5.2.3.4 Develop and Maintain Physical Data Models

A physical data model optimizes the implementation of detailed data requirements and business rules in light of technology constraints, application usage, performance requirements, and modeling standards. Design relational databases with the specific capabilities of a database management system in mind (IBM DB2 or UDB, Oracle, Teradata, Sybase, or Microsoft SQL Server or Access). Organizations should have naming standards to guide the naming of physical data objects. An example of a physical data model is shown in Figure 5.5.

Figure 5.5 Physical Data Model Example

Physical data model design includes making decisions about:

  • The technical name of each table and column (relational databases), or file and field (non-relational databases), or schema and element (XML databases).
  • The logical domain, physical data type, length, and nullability of each column or field.
  • Any default values for columns or fields, especially for NOT NULL constraints.
  • Primary and alternate unique keys and indexes, including how to assign keys.
  • Implementation of small reference data value sets from the logical model: a) as separate code tables, b) as a master shared code table, or c) simply as rules or constraints.
  • Implementation of super-type / sub-type entities from the logical model, either by merging the sub-type entities’ attributes into a single table representing the super-type entity (as nullable columns), or by collapsing the super-type entity’s attributes into a separate table for each sub-type.

Going forward, we will use the term ‘tables’ to refer to tables, files, and schemas; the term ‘columns’ to refer to columns, fields, and elements; and the term ‘rows’ to refer to rows, records, or instances.

Physical data modeling transforms the logical data model using several techniques, including:

  • Denormalization: Selectively and justifiably violating normalization rules, re-introducing redundancy into the data model to reduce retrieval time, potentially at the expense of additional space, additional insert / update time, and reduced data quality.
  • Surrogate keys: Substitute keys not visible to the business.
  • Indexing: Create additional index files to optimize specific types of queries.
  • Partitioning: Break a table or file vertically (separating groups of columns) or horizontally (separating groups of rows).
  • Views: Virtual tables used to simplify queries, control data access, and rename columns, without the redundancy and loss of referential integrity due to denormalization.
  • Dimensionality: Creation of fact tables with associated dimension tables, structured as star schemas and snowflake schemas, for business intelligence (see Chapter 9).
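
As a brief illustration of the dimensionality technique, the hypothetical star schema below pairs a fact table with two dimension tables; Chapter 9 covers dimensional modeling in depth, and all names here are illustrative.

    CREATE TABLE Date_Dim (
        Date_Key      INTEGER NOT NULL PRIMARY KEY,   -- surrogate key for the calendar date
        Calendar_Date DATE    NOT NULL,
        Fiscal_Period CHAR(7) NOT NULL
    );

    CREATE TABLE Product_Dim (
        Product_Key      INTEGER      NOT NULL PRIMARY KEY,
        Product_Name     VARCHAR(100) NOT NULL,
        Product_Category VARCHAR(50)  NOT NULL        -- collapsed (denormalized) hierarchy
    );

    CREATE TABLE Sales_Fact (
        Date_Key     INTEGER       NOT NULL REFERENCES Date_Dim (Date_Key),
        Product_Key  INTEGER       NOT NULL REFERENCES Product_Dim (Product_Key),
        Sales_Amount DECIMAL(12,2) NOT NULL,
        Units_Sold   INTEGER       NOT NULL,
        CONSTRAINT PK_Sales_Fact PRIMARY KEY (Date_Key, Product_Key)
    );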

5.2.4 Detailed Data Design

Detailed data design activities include:

  • Detailed physical database design, including views, functions, triggers, and stored procedures.
  • Other supporting data structures, such as XML schemas and object classes.
  • Information products, such as the use of data in screens and reports.
  • Data access solutions, including data access objects, integration services, and reporting and analysis services.

Database administrators (DBAs) take the lead role in database design, and a collaborative role in designing information products (XML schemas, messages, screens, and reports) and related data services (data access services, data integration services, and business intelligence services). Data analysts take the lead role in designing information products and related data services, and a collaborative role in database design.

5.2.4.1 Design Physical Databases

Detailed design includes database implementation specifications. A physical database design may take advantage of the unique functions and capabilities of a specific database management system, which may or may not be included in the data model itself.

For relational databases, the primary design deliverables are the Data Definition Language (DDL) specifications. DDL is a subset of Structured Query Language (SQL) used to create tables, indexes, views, and other physical database objects. For XML databases, the primary design deliverable is the namespace.

A complete, high-quality database design document is more than just DDL statements. Section 5.2.4.1.3 describes a complete physical design document.

Whether or not the DBA collaborates in physical data modeling, the DBA has primary responsibility for detailed database design, including:

  • Ensuring the design meets data integrity requirements.
  • Determining the most appropriate physical structure to house and organize the data, such as relational or other type of DBMS, files, OLAP cubes, XML, etc.
  • Determining database resource requirements, such as server size and location, disk space requirements, CPU and memory requirements, and network requirements.
  • Creating detailed design specifications for data structures, such as relational database tables, indexes, views, OLAP data cubes, XML schemas, etc.
  • Ensuring performance requirements are met, including batch and online response time requirements for queries, inserts, updates, and deletes.
  • Designing for backup, recovery, archiving, and purge processing, ensuring availability requirements are met, and database maintenance operations can be performed within the window(s) of time available (see Chapter 6).
  • Designing data security implementation, including authentication, encryption needs, application roles, and the data access and update permissions they should be assigned. The general rule is never to grant permissions on database objects to individual users, only to roles. Users can then be moved into and out of roles as needed; this greatly reduces maintenance and enhances data security (see Chapter 7).
  • Determining partitioning and hashing schemes, where appropriate.
  • Requiring SQL code review to ensure that the code meets coding standards and will run efficiently.

5.2.4.1.1 Physical Database Design

Choose a database design based on both a choice of architecture and a choice of technology. Base the choice of architecture (for example, relational, hierarchical, network, object, star schema, snowflake, cube, etc.) on several considerations:

  • Whether (and how often) the data is updated.
  • The natural organization of the data.
  • How the data is viewed and used.

The choice of implementation technology (for example, relational, XML, OLAP, or Object technology) may be governed by many different factors, including how long the data needs to be kept, whether it must be integrated with other data or passed across system or application boundaries, and on requirements of data security, integrity, recoverability, accessibility, and reusability.

There may also be organizational or political factors, including organizational biases and developer skill sets, that lean toward a particular technology or vendor. Other factors influencing physical database design include:

  • Purchase and licensing requirements, including the DBMS, the database server, and any client-side data access and reporting tools.
  • Auditing and privacy requirements (e.g., Sarbanes-Oxley, PCI, HIPAA, etc.).
  • Application requirements; for example, whether the database must support a web application or web service, or a particular analysis or reporting tool.
  • Database service level agreements (SLAs).

Database designers must find the answers to several questions, including:

  • What are the performance requirements? What is the maximum permissible time for a query to return results, or for a critical set of updates to occur?
  • What are the availability requirements for the database? What are the window(s) of time for performing database operations? How often should database backups and transaction log backups be done (i.e., what is the longest period of time we can risk non-recoverability of the data)?
  • What is the expected size of the database? What is the expected rate of growth of the data? At what point can old or unused data be archived or deleted? How many concurrent users are anticipated?
  • What sorts of data virtualization are needed to support application requirements in a way that does not tightly couple the application to the database schema?
  • Will other applications need the data? If so, what data and how?
  • Will users expect to be able to do ad-hoc querying and reporting of the data? If so, how and with which tools?
  • What, if any, business or application processes does the database need to implement? (e.g., trigger code that does cross-database integrity checking or updating, application classes encapsulated in database procedures or functions, database views that provide table recombination for ease of use or security purposes, etc.).
  • Are there application or developer concerns regarding the database, or the database development process, that need to be addressed?
  • Is the application code efficient? Can a code change relieve a performance issue?

In designing and building the database, the DBA should keep the following design principles firmly in mind (remember the acronym PRISM):

  • Performance and Ease of Use: Ensure quick and easy access to data by approved users in a usable and business-relevant form, maximizing the business value of both applications and data.
  • Reusability: The database structure should ensure that, where appropriate, multiple applications can use the data, and that multiple business purposes (such as business analysis, quality improvement, strategic planning, customer relationship management, and process improvement) can use the data. Avoid coupling a database, data structure, or data object to a single application. Do not tightly couple an application to a database! Data should reflect the real entities and attributes of the business, not the requirements of a single application.
  • Integrity: The data should always have a valid business meaning and value, regardless of context, and should always reflect a valid state of the business. Enforce data integrity as close to the data as possible, and immediately detect and report violations of data integrity constraints.
  • Security: True and accurate data should always be immediately available to authorized users, but only to authorized users. The privacy concerns of all stakeholders, including customers, business partners, and government regulators, must be met. Enforce data security, like data integrity, as close to the data as possible, and immediately detect and report security violations.
  • Maintainability: Perform all data work at a cost that yields value by ensuring that the cost of creating, storing, maintaining, using, and disposing of data does not exceed its value to the organization. Ensure the fastest possible response to changes in business processes and new business requirements.

Here are some recommended best practices for physical database design:

  1. For relational databases supporting transaction processing (OLTP) applications, use a normalized design to promote data integrity, reusability, good update performance, and data extensibility.
  2. At the same time, use views, functions, and stored procedures to create non-normalized, application-specific, object-friendly, conceptual (virtual) views of data. Do not force developers to work at the physical database level, nor tightly couple database schemas to applications. The goal is to abstract the functionality of the data from its physical structure and make it as easy as possible to work with.
  3. Use standard naming conventions and meaningful, descriptive names across all databases and database objects for ease of maintenance, especially if abbreviations are necessary.
  4. Enforce data security and integrity at the database level, not in the application. This enables the easy reuse of data, while saving developers the work of having to write and test code-level constraints in every application that uses a given piece of data.
  5. Try to keep database processing on the database server as much as possible, for maximum performance, ease of maintenance, security, scalability, reduced network traffic, and lower cost of development. For example, implement all database updates and complex SQL queries as stored procedures in the database, instead of embedding them in the application code, and use server-side (rather than client-side) cursors. Using stored procedures makes it easy to isolate and fix errors and performance problems, enhances performance, and greatly reduces network traffic.
  6. Grant permissions on database objects (tables, views, stored procedures, functions, and so on) only to application groups or roles, not to individuals. This improves both security and ease of maintenance.
  7. Do not permit any direct, ad-hoc updating of the database; do all updates in a controlled manner, through pre-defined procedures.

5.2.4.1.2 Performance Modifications

When implementing a physical database, consider how the database will perform when applications make requests to access and modify data. There are several techniques used to optimize database performance.

Indexing can improve query performance in many cases. The database designer must select and define appropriate indexes for database tables. An index is an alternate path for accessing data in the database to optimize query (data retrieval) performance. Major RDBMS products support many types of indexes. Indexes can be unique or non-unique, clustered or non-clustered, partitioned or non-partitioned, single column or multi-column, b-tree or bitmap or hashed. Without an appropriate index, the DBMS will revert to reading every row in the table (table scan) to retrieve any data. On large tables, this is very costly. Try to build indexes on large tables to support the most frequently run queries, using the most frequently referenced columns, particularly keys (primary, alternate, and foreign).
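
For example, if a large hypothetical Account table is most often queried by Customer_ID and Account_Type, a composite non-unique index on those columns might support that query path (index options and syntax vary by DBMS):

    -- Hypothetical supporting index for the most frequently run query.
    CREATE INDEX IX_Account_Customer_Type
        ON Account (Customer_ID, Account_Type);

    -- A query such as this can then use the index instead of a full table scan:
    SELECT Account_ID
    FROM   Account
    WHERE  Customer_ID = 12345
      AND  Account_Type = 'CHECKING';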

Denormalization is the deliberate transformation of a normalized logical data model into tables with redundant data. In other words, it intentionally puts one data element in multiple places. This process introduces a risk of data errors due to duplication, so implement data quality checks to ensure that the copies of the data elements remain consistent. Denormalize only where specifically needed to improve database query performance, by segregating or combining data to reduce query set sizes, combining data to reduce joins, or performing and storing costly data calculations in advance. Denormalization techniques include, among others, the following (a brief sketch of two of these techniques follows the list):

  • Collapse hierarchies (roll-up): To reduce joins, combine direct-path parent / child relationships into one table, repeating the parent columns in each row. This is a major tool in dimensional modeling (discussed in Chapter 9 on Data Warehousing and Business Intelligence).
  • Divide hierarchies (push down): To reduce query sets, divide parent tables into multiple child tables by type. For example, create customer tables that each contain a different type of customer, such as checking, mortgage, investment, etc.
  • Vertically split: To reduce query sets, create subset tables which contain subsets of columns. For example, split a customer table into two based on whether the fields are mostly static or mostly volatile (to improve load / index performance), or based on whether the fields are commonly or uncommonly included in queries (to improve table scan performance).
  • Horizontally split: To reduce query sets, create subset tables using the value of a column as the differentiator. For example, create regional customer tables that contain only customers in a specific region.
  • Combine and pre-join tables: To reduce joins where two tables are joined in a significant number of queries, consider creating a table which already has the result set of a join of both tables.
  • Repeat columns in one row: To reduce row counts or to enable row-to-row comparisons, create a table with repeated columns. For example, rather than 12 rows for 12 months, have 12 columns, one for each month.
  • Derive data from stored data: To reduce calculation expense at query time, especially calculations that require data from multiple tables, pre-calculate columns and store the results in a table, either a new table or one of the participants in the calculation.
  • Create reporting copies: To improve report performance, create a table which contains all the elements needed for reporting, already calculated and joined, and update that periodically.
  • Create duplicates (mirrors): To improve performance where certain data sets are frequently used and are often in contention, create duplicate versions for separate user groups, or for loading vs. querying.
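
As a brief sketch of the “combine and pre-join tables” and “derive data from stored data” techniques, the hypothetical reporting copy below pre-joins customer and order data and stores a pre-calculated total; a scheduled process would refresh it periodically. The CREATE TABLE ... AS SELECT syntax shown is not universal (some DBMSs use SELECT ... INTO instead), and all table and column names are illustrative.

    -- Hypothetical pre-joined, pre-aggregated reporting table.
    CREATE TABLE Customer_Order_Summary AS
    SELECT c.Customer_ID,
           c.Customer_Name,                  -- repeated (redundant) parent column
           COUNT(o.Order_ID)   AS Order_Count,
           SUM(o.Order_Amount) AS Total_Order_Amount   -- derived, stored calculation
    FROM   Customer    AS c
    JOIN   Sales_Order AS o
           ON o.Customer_ID = c.Customer_ID
    GROUP  BY c.Customer_ID, c.Customer_Name;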

5.2.4.1.3 Physical Database Design Documentation

The physical database design document guides implementation and maintenance. It should be reviewable, so that errors in the design can be caught and corrected before the database is created or updated, and modifiable, for ease of implementing future iterations of the design. A physical database design document consists of the following components:

  • An introductory description of the business function of the database design; for example, what aspect or subset of the business data does this database design encompass?
  • A graphical model of the design, done in ER format for a relational design, or in UML for an object-oriented design.
  • Database-language specification statements. In Structured Query Language (SQL), these are the Data Definition Language (DDL) specifications for all database objects (tablespaces, tables, indexes, indexspaces, views, sequences, etc., and XML Namespaces).
  • Documentation of the technical meta-data, including data type, length, domain, source, and usage of each column, and the structure of keys and indexes related to each table.
  • Use cases or sample data, showing what the actual data will look like.
  • Short descriptions, as needed, to explain:
    • The database architecture and technology chosen, and why they were chosen.
    • Constraints that affected the selection of the DBMS, including cost constraints, policy constraints, performance constraints, reliability or scalability constraints, security constraints, application constraints, expected data volumes, etc.
    • The database design process, including the methods and tools used.
    • The differences between the physical database design and the logical data model, and the reasons for these differences.
    • The update mechanism chosen for the database, and its implementation.
    • Security requirements for the database, and their implementation.
    • The service-level agreement (SLA) for the database and its implementation.
    • User and / or application requirements for the database and their implementation.

5.2.4.2 Design Information Products

While database design is the primary focus of data development, data professionals should also participate in the design of related data deliverables.

Data analysts may assist software designers and developers in the design of information products, including screens and reports, to meet business data requirements. Data analysts should ensure consistent use of business data terminology, and should ensure that presentation formats add appropriate context to the data for data producers and information consumers.

The DBA will often assist in the development of applications that make data more readily available, in a more usable form, to business users and managers. Many exciting new technologies exist for this purpose, and the DBA should be familiar with them:

  • Reporting services: Reporting services give business users the ability to execute both canned and ad-hoc reports, and have the data made available to them in a number of different ways, such as delivered (published) via email or RSS feed, accessible via web browser or portal, extracted to an Excel spreadsheet, and so on.
  • Analysis services: Analysis services give business users the ability to “slice and dice” data across multiple business dimensions, such as analyzing sales trends for products or product categories across multiple geographic areas and / or dates / times. This also includes “predictive analytics”, which is the analysis of data to identify future trends and potential business opportunities.
  • Dashboards: A dashboard is a type of user interface designed to display a wide array of analytics indicators, such as charts and graphs, efficiently. The user can “drill down” through these indicators to view the data beneath.
  • Scorecards: A scorecard is a specialized type of analytics display that indicates scores or calculated evaluations of performance. Scorecards often have an actual value (the measure), a goal or forecast (the baseline), a score (measure compared to baseline), and an indicator (a visual representation of how favorable or unfavorable the score may be).
  • Portals: Portals are web interfaces that present links to multiple applications and sources of information on a single, well-designed, easily accessible web page. Portals provide a means of bringing together a large number of diverse users, with different information needs, and creating a “community” based on common interests. Portals provide users with the ability to share documents, search through document libraries, hold discussions, and collaborate on projects.
  • XML Delivery: Effective use of XML within databases and applications often requires creating schema definitions to validate XML documents; XML transforms (using XSLT to convert XML to HTML, or some other presentation form); and database objects, such as views, stored procedures, and functions, that can search through XML documents, convert XML data to relational form (or vice-versa), and merge relational and XML data.
  • Business Process Automation: Use data integrated from multiple databases as input to software for business process automation that coordinates multiple business processes across disparate platforms.
  • Application Integration: Similarly, data integration (along with its core components, data transformation, and cleansing) is a key component of Enterprise Application Integration (EAI) software, enabling data to be easily passed from application to application across disparate platforms.

The DBA’s involvement with the development of these products may include data analysis, the creation of data structures (such as XML schemas, OLAP cubes, or data marts) and database objects to support these products, enabling access to data, and assisting with data integration and delivery.

DBAs may assist software developers by creating and maintaining database access statements. In SQL, these statements are known as Data Manipulation Language (DML) and include SELECT, INSERT, UPDATE, and DELETE statements. DBAs often review these statements and recommend alternate approaches and performance tuning modifications.
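
A minimal sketch of the four basic DML statements, reusing the hypothetical Account table from earlier examples:

    -- Retrieve rows.
    SELECT Account_ID, Account_Type FROM Account WHERE Customer_ID = 12345;

    -- Add a row.
    INSERT INTO Account (Account_ID, Customer_ID, Account_Type)
    VALUES (98765, 12345, 'SAVINGS');

    -- Change existing rows.
    UPDATE Account SET Account_Type = 'CHECKING' WHERE Account_ID = 98765;

    -- Remove rows.
    DELETE FROM Account WHERE Account_ID = 98765;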

DBAs may collaborate with software designers and developers on designing data access-layer services in a service-oriented architecture (SOA). Data access services standardize data access and insulate programs from database changes.

5.2.4.3 Design Data Access Services

It will oftentimes be necessary (and desirable) to access data in remote databases, and to combine that data with data in the local database. Several mechanisms exist for doing this, and the DBA should be familiar with the strengths and weaknesses of each. Some of the most common methods of accessing and reusing remote data are as follows:

  • “Linked Server” type connections: Some DBMSs permit you to define remote database servers as “linked servers”, and access them over an ODBC or OLE / DB connection. This approach has the advantage of being quick, easy, and inexpensive; however, there are some caveats to keep in mind:
    • Such connections have limited functionality, generally restricted to executing a hard-coded query defined as a string literal, or a stored procedure.
    • They can present security concerns. Do not use hard-coded user identifiers and passwords in defining such connections, and restrict permissions on the target server to a read-only subset of only the required data.
    • They do not scale well; use them only for relatively small amounts of data.
    • They are synchronous, requiring the calling procedure to wait for all the data to be returned.
    • They are dependent on the quality of the vendor-supplied ODBC or OLE / DB drivers (which is sometimes abysmal).
    • Despite these caveats, this method has one major advantage: it is easily implemented in the database, allowing access to remote data from views, triggers, functions, and stored procedures (a brief sketch follows this list).
  • SOA Web Services: Encapsulate remote data access in the form of web services and call them from applications. Implement these either synchronously or asynchronously, depending on the requirements of the application. This approach greatly increases the reusability of data to applications, and generally performs and scales quite well. However, there are a couple of drawbacks:
    • Web services are harder and more costly to write, test, and deploy.
    • The organization runs the risk of creating an “SOA Nightmare” of numerous point-to-point, application-specific, non-reusable web services, all of which need to be maintained in response to changing database schemas and locations.
    • It is difficult for database objects to consume web services. They must usually be consumed by applications. Some of the newer DBMSs permit you to encapsulate application classes as stored procedures or functions; however, this method will not work for views.
  • Message Brokers: Some DBMSs (e.g., Microsoft SQL Server 2005) allow you to implement messaging services in the database. A stored procedure or function in one database can send a message resulting in the execution of a query, stored procedure, or function in another database, with the results returned asynchronously to the calling procedure. This approach is relatively easy to implement, reliable, scalable, and performs well. However, it only works with instances of the same DBMS.
  • Data Access Classes: Write application classes that use ODBC or OLE / DB connections to access data on remote, disparate servers and make it available to applications. In the .NET environment, this data can be stored internally as an ADO.NET dataset object (a sort of in-memory database) for ease of access and better performance. Similar third party and open-source technology exists for Unix / Linux and Java applications.
  • ETL: In cases where it is not technologically feasible to access data at its source, or where performance considerations make this access untenable, various DBMS and third-party ETL tools can bridge the gap. These tools extract data from the source, transform it as necessary (e.g., reformatting and cleansing it), and either load it into a read-only table in the database, or stream the result set to the calling procedure or application. Execute a DBMS ETL package from a stored procedure or function, and schedule it to execute at periodic intervals. Major drawbacks are that it may not scale or perform well for large numbers of records, and may be difficult and expensive to maintain over time.
  • Replication: Another option for getting data from one database environment to another is replication. Most DBMSs support some type of replication technology (e.g., mirroring and log shipping), although this replication requires that the source and target servers be the same DBMS. For replication across disparate platforms or DBMSs, more “home grown” solutions are possible. For example, a batch process on one platform can extract data to a flat file on disk. The file can be copied (using FTP or some similar mechanism) to the target server, and then loaded via another batch process. The challenge is to get the timing right (i.e., ensure that the data gets to the target server before it is needed), and to make sure that any failures in the replication process are promptly detected and reported. Note that if the replicated data is going to be updated on the target server (try to avoid this if possible!), a secure and reliable mechanism must be put into place to replicate those updates back to the source server, ideally through some sort of two-phase commit process.
  • Co-location: As a last resort, it may be necessary to co-locate the source and target databases (or DBMS instances) on the same database server. Obviously, this is not an ideal solution, since it tightly couples the two databases. It should be used only in situations where the data is similar in business meaning and use, and where the volumes of data required (or the frequency of access) preclude any other solution.
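
As an illustration of the “Linked Server” option above, the following sketch assumes a SQL Server environment and a hypothetical linked server named SalesArchive; the provider and connection details will vary by DBMS and driver.

    -- Register a linked server (SQL Server syntax; names are hypothetical).
    EXEC sp_addlinkedserver
        @server     = N'SalesArchive',
        @srvproduct = N'',
        @provider   = N'SQLNCLI',
        @datasrc    = N'archive-host';

    -- Pass-through query: the remote server executes the hard-coded string
    -- and returns a read-only result set to the calling procedure.
    SELECT *
    FROM OPENQUERY(SalesArchive, 'SELECT OrderID, OrderDate FROM dbo.OrderHistory');

    -- Four-part naming is also possible, but keep the result sets small.
    SELECT OrderID, OrderDate
    FROM SalesArchive.ArchiveDB.dbo.OrderHistory;

Because the linked server is defined in the database, views, functions, and stored procedures can all reference it; the caveats listed above (security, scalability, synchronous execution, driver quality) still apply.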

Remember that the end goal is to enable easy and inexpensive reuse of data across the enterprise, to avoid costly data replication schemes wherever possible, and to prevent redundant and inconsistent data.

5.2.4.4 Design Data Integration Services

A database transaction is an atomic unit of recoverable work. A transaction can include multiple database instructions. Upon completion of all the steps within the transaction, issue a database COMMIT to make all the changes permanent together; up to that point, the changes can be rolled back. A transaction is atomic, meaning “all or nothing”: it performs either all of the instructions or none of them. Application developers define database transactions by determining when to COMMIT changes.
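
As a minimal illustration, the following sketch (SQL Server syntax; the Account table and its columns are hypothetical) treats two related updates as a single atomic unit of work:

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Both statements succeed or fail together.
        UPDATE dbo.Account SET Balance = Balance - 100 WHERE AccountID = 1001;
        UPDATE dbo.Account SET Balance = Balance + 100 WHERE AccountID = 2002;

        COMMIT TRANSACTION;        -- make both changes permanent together
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION;  -- undo everything on any error
        THROW;                     -- report the error to the caller (SQL Server 2012+)
    END CATCH;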

A critical aspect of database design is determining appropriate update mechanisms. Whenever multiple users can concurrently update tables, implement a concurrency control mechanism to ensure that two users cannot update the same record at the same time. This usually involves adding a data element of type “timestamp” or “datetime” to each of these tables, checking the value of this field before the record is modified, and updating it whenever the record is changed.
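
One common sketch of this optimistic concurrency check, assuming a hypothetical Customer table with a LastUpdated column, looks like the following; the update succeeds only if the row has not changed since the caller read it:

    CREATE PROCEDURE dbo.Customer_UpdateName
        @CustomerID          INT,
        @NewName             NVARCHAR(100),
        @LastUpdatedWhenRead DATETIME2
    AS
    BEGIN
        UPDATE dbo.Customer
        SET    CustomerName = @NewName,
               LastUpdated  = SYSDATETIME()
        WHERE  CustomerID   = @CustomerID
          AND  LastUpdated  = @LastUpdatedWhenRead;  -- unchanged since it was read?

        IF @@ROWCOUNT = 0
            RAISERROR('Row was changed by another user; re-read and retry.', 16, 1);
    END;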

Use locks to ensure the integrity of data, permitting only one user to change a database row at any one time. Data can be locked at different levels, known as lock granularity. DBAs determine the appropriate level of locking for each database object, such as column, row, page, table, file, or database.

Data analysts and data integration specialists define source-to-target mappings and data transformation designs for extract-transform-load (ETL) programs and other technology for on-going data movement, cleansing, and integration. DBAs may collaborate in this design activity.

Data analysts, data integration specialists, and DBAs also design programs and utilities for data migration and conversion from old data structures to new data structures.

Several methods are available, but any method chosen must satisfy the following criteria:

  1. Do all updates in a controlled manner. Do not allow direct, ad-hoc updating of the database.
  2. Manage all updates relating to a particular business process as a single unit of work, and either commit the transaction or roll it back completely; this is known as transactional integrity. Do not allow partial updates of the database to occur.
  3. Do not allow two or more users to update the same record at the same time without each other’s knowledge; this is known as concurrency control.
  4. If an error occurs during an update, immediately abort and roll back the current transaction, and immediately report the error to the calling process or application.
  5. Restrict the ability to update a particular database table to a set of users (contained in one or more user roles) authorized to do so.
  6. Restrict updates to a small number of records at a time, to prevent excessive locking of tables and “hanging” of an application when rolling back a large update (see the sketch following this list).
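
The sketch below illustrates the last criterion, batching a large update so that locks are held briefly and any rollback stays small; the table, columns, and batch size are hypothetical, and the syntax is SQL Server’s.

    WHILE 1 = 1
    BEGIN
        -- Update at most 5,000 rows per statement to limit locking.
        UPDATE TOP (5000) dbo.OrderHistory
        SET    Status = 'ARCHIVED'
        WHERE  Status = 'CLOSED'
          AND  OrderDate < '2005-01-01';

        IF @@ROWCOUNT = 0 BREAK;   -- nothing left to update
    END;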

Consider the following possible update mechanisms:

  • Fundamental stored procedures (FSPs): Each FSP implements one operation (Insert, Update, Delete, or Select) on a limited number of records, usually designated by one or more key values, for a single database table. If FSPs are used, generate them automatically, either from the physical model or from the database schema (a sketch of one generated FSP follows this list). This greatly reduces the time required to implement a database, and makes it easier to change the schema in response to new requirements.
  • Application data layer: Write an application component that calls stored procedures in the database to perform updates across multiple tables, or that calls multiple FSPs. Stored procedures are recommended: they perform better because the SQL code is precompiled and pre-optimized; they are more secure because only designated users or roles can execute them and the tables are not exposed to SQL injection attacks; and they are easier to maintain, since errors and performance problems can be readily detected and corrected.
  • Dataset updating: Update records in an application dataset or data table through a DataAdapter object, which can, in turn, be associated with a set of stored procedures that perform Insert, Update, Delete, and Select operations.
  • Updateable views: In some relational DBMSs, views can be associated with a set of “Instead Of” triggers that can handle updates of the underlying tables in a controlled manner. As with FSPs, it is preferable to generate the code in an automated fashion to reduce or eliminate time spent in coding, testing, and maintenance.
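
If FSPs are used, the generated code for a single table follows a simple, predictable pattern. The sketch below shows one hypothetical Update FSP; companion Insert, Delete, and Select procedures would be generated the same way from the physical model or schema.

    CREATE PROCEDURE dbo.Customer_Update
        @CustomerID   INT,
        @CustomerName NVARCHAR(100),
        @CreditLimit  DECIMAL(12, 2)
    AS
    BEGIN
        SET NOCOUNT ON;

        -- One operation (Update), one table, one row designated by its key.
        UPDATE dbo.Customer
        SET    CustomerName = @CustomerName,
               CreditLimit  = @CreditLimit
        WHERE  CustomerID   = @CustomerID;
    END;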

5.2.5 Data Model and Design Quality Management

Data analysts and designers act as intermediaries between information consumers (the people with business requirements for data) and the data producers who capture the data in usable form. Data professionals must juggle the business data requirements of the information consumers, including executives, and the application requirements of data producers. System requirements capture application data requirements in the form of use cases, an application class model, and service level agreements (SLAs).

Data professionals must also balance short-term and long-term business interests. Information consumers need data in a timely fashion to meet short-term business obligations and to take advantage of current business opportunities. System-development project teams must meet time and budget constraints. However, they must also meet the long-term interests of all stakeholders by ensuring that an organization’s data resides in data structures that are secure, recoverable, sharable, and reusable, and that this data is as correct, timely, relevant, and usable as possible. Therefore, data models and database designs should strike a reasonable balance between the short-term and long-term needs of the enterprise.

5.2.5.1 Develop Data Modeling and Design Standards

Data modeling and database design standards serve as the guiding principles to effectively meet business data needs, conform to data architecture, and ensure data quality. Data architects, data analysts, and database administrators must jointly develop these standards. They must complement and not conflict with related IT standards.

Publish data model and database naming standards for each type of modeling object and database object. Naming standards are particularly important for entities, tables, attributes, keys, views, and indexes. Names should be unique and as descriptive as possible.

Logical names should be meaningful to business users, using full words as much as possible and avoiding all but the most familiar abbreviations. Physical names must conform to the maximum length allowed by the DBMS and use abbreviations where necessary. While logical names use blank spaces as separators between words, physical names typically use underscores as word separators.

Naming standards should minimize name changes across environments. Names should not reflect their specific environment, such as test, QA, or production. Class words can be useful to distinguish attributes from entities, and column names from table names. They can also show which attributes and columns are quantitative rather than qualitative, which can be important when analyzing the contents of those columns.
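
For illustration only, assuming a hypothetical abbreviation list (CUST for customer) and hypothetical class words (NM for name, AMT for amount, DT for date), the logical attribute “Customer Last Name” might become the physical column CUST_LAST_NM:

    -- Physical names use underscores and abbreviations; class-word suffixes
    -- distinguish qualitative (NM) from quantitative (AMT) columns.
    CREATE TABLE CUSTOMER_ORDER (
        CUST_ORDER_ID   INT            NOT NULL PRIMARY KEY,
        CUST_LAST_NM    VARCHAR(60)    NOT NULL,  -- "Customer Last Name"
        ORDER_TOTAL_AMT DECIMAL(12, 2) NOT NULL,  -- "Order Total Amount"
        ORDER_DT        DATE           NOT NULL   -- "Order Date"
    );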

Data modeling and database design standards should include:

  • A list and description of standard data modeling and database design deliverables.
  • A list of standard names, acceptable abbreviations, and abbreviation rules for uncommon words that apply to all data model objects.
  • A list of standard naming formats for all data model objects, including attribute and column class words.
  • A list and description of standard methods for creating and maintaining these deliverables.
  • A list and description of data modeling and database design roles and responsibilities.
  • A list and description of all meta-data properties captured in data modeling and database design, including both business meta-data and technical meta-data, with guidelines defining meta-data quality expectations and requirements.
  • Guidelines for how to use data modeling tools.
  • Guidelines for preparing for and leading design reviews.

5.2.5.2 Review Data Model and Database Design Quality

Project teams should conduct requirements reviews and design reviews as appropriate. These reviews should include a conceptual data model review, a logical data model review, and a physical database design review.

Conduct design reviews with a group of subject matter experts representing different backgrounds, skills, expectations, and opinions. Participants must be able to discuss different viewpoints and reach group consensus without personal conflict, as all participants share the common goal of promoting the most practical, best performing and most usable design. Chair each design review with one leader who facilitates the meeting. The leader creates and follows an agenda, ensures all required documentation is available and distributed, solicits input from all participants, maintains order and keeps the meeting moving, and summarizes the group’s consensus findings. Many design reviews also utilize a scribe to capture points of discussion.

5.2.5.2.1 Conceptual and Logical Data Model Reviews

Conceptual data model and logical data model design reviews should ensure that:

  1. Business data requirements are completely captured and clearly expressed in the model, including the business rules governing entity relationships.
  2. Business (logical) names and business definitions for entities and attributes (business semantics) are clear, practical, consistent, and complementary. The same term must be used in both names and descriptions.
  3. Data modeling standards, including naming standards, have been followed.
  4. The conceptual and logical data models have been validated.

5.2.5.2.2 Physical Database Design Review

Physical database design reviews should ensure that:

  1. The design meets business, technology, usage, and performance requirements.
  2. Database design standards, including naming and abbreviation standards, have been followed.
  3. Availability, recovery, archiving, and purging procedures are defined according to standards.
  4. Meta-data quality expectations and requirements are met in order to properly update any meta-data repository.
  5. The physical data model has been validated.

All concerned stakeholders, including the DBA group, the data analyst / architect, the business data owners and / or stewards, the application developers, and the project managers, should review and approve the physical database design document. The complete design document should be ready as part of the production turnover of the database.

5.2.5.2.3 Data Model Validation

Validate data models against modeling standards, business requirements, and database requirements. Here are some sample validation questions (a sketch of one automated check follows the list):

  • Does the model match applicable modeling standards? Does the model use standard data dictionary terms? Does the model use standard domains? Does the model use class word suffixes on all applicable columns? Does the model include descriptions of all objects and relationships? Does the model use abbreviation standards where applicable?
  • Does the model match the business requirements? Does the model contain all the relevant data items? Can you execute the required transactions against the database? Can you retrieve the transaction contents correctly? Can you execute any required queries against the model?
  • Does the model match the database requirements? Are there no objects named the same as database-reserved words? Do all objects have unique names? Does the model assign owners to all objects?
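
Some of these checks can be automated against the database catalog. The sketch below, which assumes the hypothetical _AMT class word for monetary columns, uses the standard INFORMATION_SCHEMA views to flag columns that appear to violate the naming standard:

    SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
    FROM   INFORMATION_SCHEMA.COLUMNS
    WHERE  DATA_TYPE IN ('decimal', 'numeric', 'money')   -- quantitative columns
      AND  COLUMN_NAME NOT LIKE '%\_AMT' ESCAPE '\';      -- missing the class word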

5.2.5.3 Manage Data Model Versioning and Integration

Data models and other design specifications require careful change control, just like requirements specifications and other SDLC deliverables. Note each change to a data model to preserve the lineage of changes over time. If a change involves the logical model, such as a new or changed business data requirement, the data analyst or architect must review and approve the change.

Each change should note:

  • Why the project or situation required the change.
  • What and How the object(s) changed, including which tables had columns added, modified, or removed, etc.
  • When the change was approved and when the change was made to the model. This is not necessarily when the change was implemented in a system.
  • Who made the change.
  • Where the change was made; in which models.

Changes may be made to multiple parts of Enterprise models simultaneously, as part of the normal process. It is important to integrate any changes to a model part back into the enterprise model, especially the enterprise logical model, to prevent errors in data and databases during future development.

Some data modeling tools include repositories that provide data model versioning and integration functionality. Otherwise, preserve data models in DDL exports or XML files, checking them in and out of a standard source code management (SCM) system just like application code.

5.2.6 Data Implementation

Data implementation consists of data management activities that support system building, testing, and deployment, including:

  • Database implementation and change management in the development and test environments.
  • Test data creation, including any security procedures, such as obfuscation.
  • Development of data migration and conversion programs, both for project development through the SDLC and for business situations like consolidations or divestitures.
  • Validation of data quality requirements.
  • Creation and delivery of user training.
  • Contribution to the development of effective documentation.

After design, the DBA is responsible for implementing the designed data structures in the development and test environments. These structures include database tables or files, views, stored procedures and functions, OLAP data cubes, XML schemas, and other similar objects. The DBA is responsible for change control of the development database environment and its configuration. Change control procedures for development and test environments should be similar or the same as those used to control production environments. The DBA should manage configuration changes to data definition language (DDL) files using the same change and configuration management tools and practices used for other information system deliverables.

5.2.6.1 Implement Development / Test Database Changes

As changes to the database are required during the course of application development, the DBA either implements or oversees them. Requests for these changes usually come from the developers. How implementation happens depends on roles and responsibilities:

  • Developers may have the ability to create and update database objects directly, such as views, functions, and stored procedures, and then update the DBAs and data modelers for review and update of the data model.
  • The development team may have their own “developer DBA” who is given permission to make schema changes, with the proviso that these changes be reviewed with the DBA and data modeler.
  • Developers may work with the data modelers, who make the change to the model in the data modeling tool, and then generate “change DDL” for the DBAs to review and implement (an example change script follows this list).
  • Developers may work with the data modelers, who interactively “push” changes to the development environment, using functionality in the data-modeling tool, after review and approval by the DBAs.
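
For example, a generated “change DDL” script delivered to the DBA for review might look like the following; the table, column, and data type are hypothetical:

    -- Change script generated from the updated physical model.
    ALTER TABLE dbo.Customer
        ADD LoyaltyTierCode CHAR(2) NULL;   -- new optional attribute

Once the DBA and data modeler have approved it, the same script would typically be checked into source control and applied to the development and test environments.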

If an iterative development method is being used (for example, Agile development), then some of the work of reviewing and approving changes, and updating the logical and physical models, may need to be done asynchronously. Consider giving approvals verbally so that development can proceed without undue interruption, and update the models as a follow-on task. However, take care to ensure that the database does not get “out of sync” with the logical model, and that the database does not become “stove-piped” by being tightly coupled to a single application. Implement application-specific database requirements, as much as possible, using views, stored procedures, functions, and other forms of data virtualization.

DBAs should carefully monitor all database code to ensure that it is written to the same standards as application code. All database code should be well documented, testable (ideally, containing built-in diagnostic code that can be triggered via a passed parameter), understandable, consistent with the agreed-upon standards, and easily maintainable. The DBA should also identify, as early as possible, poor SQL coding practices that could lead to errors or performance problems, and bring them to the attention of the developers before multiple stored procedures or functions replicate poor SQL code. A little extra attention at the beginning of a project can save everyone a great deal of grief later on.

5.2.6.2 Create and Maintain Test Data

The DBA, software developers, and testers may collaborate to populate databases in the development environment with test data. Either generate test data, or extract a representative subset of production data. Strictly observe privacy and confidentiality requirements and practices for test data. Delete test data that is obsolete, unusable, or no longer needed.
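
As one illustrative sketch, the following statement extracts a small, masked subset of production data into a test table; the databases, table, and masking rules are hypothetical, and in practice the extract would more likely run through a linked server or an ETL package than a direct cross-database insert:

    -- Copy a 1% sample of customers into the test environment,
    -- obfuscating personally identifiable data on the way.
    INSERT INTO TestDB.dbo.Customer (CustomerID, CustomerName, Email, CreditLimit)
    SELECT CustomerID,
           'Customer ' + CAST(CustomerID AS VARCHAR(10)),              -- masked name
           'user' + CAST(CustomerID AS VARCHAR(10)) + '@example.com',  -- masked e-mail
           CreditLimit
    FROM   ProdDB.dbo.Customer TABLESAMPLE (1 PERCENT);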

The DBA may also assist the developers with the creation of SQL scripts and data integration “packages”, such as DTS or SSIS packages, used to create and maintain test data. Usually, this work is the primary responsibility of the development team, but oftentimes they need and appreciate the expertise of the DBA. This is another way that DBAs can add value to the development effort.

5.2.6.3 Migrate and Convert Data

A key component of many projects is the migration of legacy data to a new database environment, including any necessary data cleansing and reformatting. This is a significant effort. The time and cost required should not be (but probably will be) under-estimated. It will require the collaborative effort of the data architect / analyst(s) familiar with the legacy data model(s) and the target data model, the DBA, business users, and developers familiar with the legacy application(s). Depending on where the legacy data is stored, this effort may involve the use of many different technologies, including SQL, COBOL, Unix scripting, DBMS integration packages such as DTS or SSIS, non-relational DBMSs, third-party ETL applications, data integration web services, FTP, RPC, ODBC, OLE / DB, and so on. Data migration efforts can easily consume thousands of hours of effort.

5.2.6.4 Build and Test Information Products

Data professionals, including the DBA, should collaborate with software developers on development and testing of information products created by the system, including:

  • Implementing mechanisms for integrating data from multiple sources, along with the appropriate meta-data to ensure meaningful integration of the data.
  • Implementing mechanisms for reporting and analyzing the data, including online and web-based reporting, ad-hoc querying, BI scorecards, OLAP, portals, and the like.
  • Implementing mechanisms for replication of the data, if network latency or other concerns make it impractical to service all users from a single data source.

Software developers are responsible for coding and testing programs, including database access calls. Software developers are also responsible for creating, testing, and maintaining information products, including screens and reports. Testing includes unit, integration, and performance testing.

5.2.6.5 Build and Test Data Access Services

DBAs are responsible for developing data access services. The DBA collaborates with software developers in developing, testing, and executing data access services, first for development and test environments, and later for production deployment.

Data requirements should include business rules for data access to guide the implementation of data access services; DBAs collaborate with software developers on this implementation. Business data stewards and other subject matter experts (SMEs) should validate the correct implementation of data access requirements and performance through user acceptance testing.

5.2.6.6 Build and Test Data Integration Services

Data integration specialists are responsible for developing ETL programs and technology for data integration, as well as data migration and conversion from old data structures into new structures. The DBA collaborates with software developers in developing, testing, and executing data migration and conversion programs and procedures, first for development and test data, and later for production deployment.

Data requirements should include business rules for data quality to guide the implementation of application edits and database referential integrity constraints. Business data stewards and other subject matter experts (SMEs) should validate the correct implementation of data requirements through user acceptance testing.

5.2.6.7 Validate Information Requirements

The responsibilities of data professionals within the SDLC do not end with design. They continue to interact as part of project teams for system development through the implementation of these designs. Database administrators are particularly active in these SDLC stages. Business data stewards may also remain involved after analysis and design, or a separate independent quality assurance team may control the test process. The primary work is in testing and validating that the solution meets the requirements, but data professionals also contribute to deployment planning, training development, and documentation.

In any application development project, especially those using iterative (“Agile”) methods, data (and database) requirements may change abruptly, in response to either new or changed business requirements, invalidated assumptions regarding the data, or re-prioritization of existing requirements. The data modeler may serve as the intermediary between the developers and the data analyst / architect, reviewing any additions or changes to business data requirements. The data modeler would also properly reflect them in the logical and physical data models. The DBA would implement any changes in the most effective manner in the database. The DBA then works with the developers to test the implementation of the data requirements, and make sure that the application requirements are satisfied.

5.2.6.8 Prepare for Data Deployment

While database administrators resolve technical implementation and testing issues, data analysts can leverage the business knowledge captured in data modeling to define clear and consistent language in user training and documentation. Business concepts, terminology, definitions, and rules depicted in data models are an important part of application user training, even if data models themselves are not useful as teaching illustrations. The data stewards that contribute business knowledge to the definition of the data models, and who are accountable for system data quality, are often also the process and application owners responsible for user acceptance of both the system and related training and documentation. Use their nomenclature consistently.

Data stewards and data analysts should participate in deployment preparation, including development and review of training materials and system documentation, especially to ensure consistent use of defined business data terminology. Help desk support staff also requires orientation and training in how system users appropriately access, manipulate, and interpret data.

The DBA is primarily responsible for implementing new and changed database objects into the production environment (see Chapter 6 on Data Operations Management). Database administrators should carefully control the installation of new databases and changes to existing databases in the production environment. Once installed, business data stewards and data analysts should monitor the early use of the system to see that business data requirements are indeed met.

5.3 Summary

The guiding principles for implementing data development in an organization, a summary table of the roles for each data development activity, and the organizational and cultural issues that may arise during data development are summarized below.

5.3.1 Guiding Principles

The implementation of the data development function into an organization follows nine guiding principles:

  1. Data development activities are an integral part of the system development lifecycle (SDLC).
  2. Data modeling is an essential technique for effective data management and system design.
  3. Conceptual and logical data modeling express business and application requirements, while physical data modeling represents solution design. Data modeling and database design define detailed solution component specifications.
  4. Data modeling and database design balance tradeoffs and needs.
  5. Data professionals should collaborate with other project team members to design information products and data access and integration interfaces.
  6. Data modeling and database design should follow documented standards.
  7. Design reviews should evaluate all data models and designs, to ensure that they meet business requirements and follow design standards.
  8. Data models represent valuable knowledge resources (meta-data). Carefully manage and control them through library, configuration, and change management to ensure data model quality and availability.
  9. Database administrators (DBAs) and other data professionals play important roles in the construction, testing, and deployment of databases and related application systems.

5.3.2 Data Development Process Summary

The process summary for the data development function is shown in Table 5.1. The deliverables, responsible roles, approving roles, and contributing roles are shown for each activity in the data development function. The Table is also shown in Appendix A9.

  • 3.1.1 Analyze Information Requirements (D)
    • Deliverables: Information Requirement Specification Statements
    • Responsible Roles: Data Architects, Data Analysts
    • Approving Roles: Data Stewards
    • Contributing Roles: Data Stewards, Other SMEs
  • 3.1.2 Develop and Maintain Conceptual Data Models (D)
    • Deliverables: Conceptual Data Model Diagrams and Reports
    • Responsible Roles: Data Architects, Data Analysts
    • Approving Roles: Data Stewards, Data Architects
    • Contributing Roles: Data Stewards, Other SMEs
  • 3.1.3 Develop and Maintain Logical Data Models (D)
    • Deliverables: Logical Data Model Diagrams and Reports
    • Responsible Roles: Data Architects, Data Analysts, Data Modelers
    • Approving Roles: Data Stewards, Data Architects
    • Contributing Roles: Data Stewards, Other SMEs
  • 3.1.4 Develop and Maintain Physical Data Models (D)
    • Deliverables: Physical Data Model Diagrams and Reports
    • Responsible Roles: Data Architects, Data Modelers, DBAs
    • Approving Roles: DBAs, Data Architects
    • Contributing Roles: Software Developers
  • 3.2.1 Design Physical Databases (D)
    • Deliverables: DDL Specifications, OLAP Cube Specs, XML Schemas
    • Responsible Roles: DBAs, Application Architects, Software Developers
    • Approving Roles: Data Architects, DBAs, Application Architects
    • Contributing Roles: Data Analysts, Data Modelers, Software Developers
  • 3.2.2 Design Information Products (D)
    • Deliverables: Application Screens, Reports
    • Responsible Roles: Software Developers
    • Approving Roles: Application Architects
    • Contributing Roles: Data Analysts, DBAs
  • 3.2.3 Design Data Access Services (D)
    • Deliverables: Data Access Service Design Specifications
    • Responsible Roles: Software Developers, DBAs
    • Approving Roles: Application Architects, Data Architects
    • Contributing Roles: Data Analysts, DBAs
  • 3.2.4 Design Data Integration Services (D)
    • Deliverables: Source-to-Target Maps, ETL Design Specs, Conversion Designs
    • Responsible Roles: Data Integration Specialists, DBAs, Data Analysts
    • Approving Roles: DBAs, Data Architects, Application Architects
    • Contributing Roles: Data Analysts, Data Stewards, DBAs
  • 3.3.1 Develop Data Modeling and Database Design Standards (P)
    • Deliverables: Data Modeling Standards Documents, Database Design Standards Documents
    • Responsible Roles: Data Architects, Data Analysts, Data Modelers, DBAs
    • Approving Roles: DM Executive, Data Governance Council
    • Contributing Roles: Data Stewards, Application Architects, Software Developers
  • 3.3.2 Review Data Model and Database Design Quality (C)
    • Deliverables: Design Review Findings
    • Responsible Roles: Data Architects, Data Analysts, Data Modelers, DBAs
    • Approving Roles: DM Executive, Project Manager
    • Contributing Roles: Application Architects, Software Developers
  • 3.3.3 Manage Data Model Versioning and Integration (C)
    • Deliverables: Model Management Libraries and Contents
    • Responsible Roles: Data Model Administrators, Data Modelers
    • Approving Roles: Data Architects, DM Executive
    • Contributing Roles: Data Analysts, DBAs
  • 3.4.1 Implement Development and Test Database Changes (D)
    • Deliverables: Dev and Test DB Environments, Database Tables, Other DB Objects
    • Responsible Roles: DBAs
    • Approving Roles: DM Executive
    • Contributing Roles: Data Architects, Data Analysts, Software Developers
  • 3.4.2 Create and Maintain Test Data (D)
    • Deliverables: Test Databases, Test Data
    • Responsible Roles: DBAs, Data Analysts, Software Developers, Test Analysts
    • Approving Roles: Data Architects, Application Architects, Data Stewards
    • Contributing Roles: Data Stewards, Software Developers, Data Analysts
  • 3.4.3 Migrate and Convert Data (D)
    • Deliverables: Migrated and Converted Data
    • Responsible Roles: DBAs, Software Developers
    • Approving Roles: Data Stewards, Data Architects
    • Contributing Roles: Data Analysts
  • 3.4.4 Build and Test Information Products (D)
    • Deliverables: Information Products: Screens, Reports
    • Responsible Roles: Software Developers
    • Approving Roles: Data Stewards, Application Architects, Data Architects
    • Contributing Roles: DBAs, Data Analysts
  • 3.4.5 Build and Test Data Access Services (D)
    • Deliverables: Data Access Services (interfaces)
    • Responsible Roles: Software Developers
    • Approving Roles: Data Architects, Application Architects
    • Contributing Roles: DBAs
  • 3.4.6 Build and Test Data Integration Services (D)
    • Deliverables: Data Integration Services (ETL, etc.)
    • Responsible Roles: Data Integration Specialists
    • Approving Roles: Data Stewards, Data Architects
    • Contributing Roles: DBAs, Data Analysts
  • 3.4.7 Validate Information Requirements (D)
    • Deliverables: Validated Requirements, User Acceptance Signoff
    • Responsible Roles: Data Stewards, Testing Specialists
    • Approving Roles: Data Stewards
    • Contributing Roles: Data Analysts, Data Architects, DBAs
  • 3.4.8 Prepare for Data Deployment (D)
    • Deliverables: User Training, User Documentation
    • Responsible Roles: Data Stewards, Business SMEs, Training Specialists, Data Analysts
    • Approving Roles: Data Stewards, Data Architects
    • Contributing Roles: Data Stewards, Data Architects, DBAs

Table 5.1 Data Development Process Summary

5.3.3 Organizational and Cultural Issues

Q1: What is the biggest issue with data delivery?

A1: The biggest organizational and cultural issue regarding data delivery is simply recognizing the need for it and taking advantage of what data development offers. Many organizations focus on application development, overlooking the importance of the data itself. Simply discovering the importance and usefulness of data analysis and data modeling can be transformational to an organization. Both the Business and IT start considering the impact on data when evaluating system changes, sometimes realizing that they already have similar data and functionality in another application, or that they don’t really need what they thought they had or wanted.

Q2: How does one begin formal data development?

A2: In order to start the transformation, it is necessary to start documenting systems from a data point of view. Data flows, data models, and data quality analyses all factor into this documentation. Start with one system, and then move to the systems that directly send data to or receive data from the first system. Network diagrams from the infrastructure group can help with this.

Next, distribute pictures of the systems’ data flows and data models to the stakeholders of each system, both business and IT. Sit down with them to verify that the pictures match their understanding of what the system does. Make sure that all stakeholders agree that the documentation reflects the current reality of the system.

Then, publicize the existence of these new documents. Create a master version of the documents and implement changes to them as part of the SDLC. When a project goes into production, part of the production release is to distribute the updated data flows and data models.

Once the word gets out, data analysts and data modelers will be very busy both documenting additional systems and helping software engineers to use these new documents during project work. Additional headcount for that team will probably become necessary.

It will be an iterative process to get access to all systems in order to analyze them. Be persistent. Reduced system redundancy, reduced redundancy of data storage, and more efficient development can save the organization millions of dollars.

The last step is to change the culture of the organization, making it standard operating procedure to refer to these documents during project requirements and design. Once data development is part of the culture, the team devoted to maintaining it will grow to fit the organization’s needs.

5.4 Recommended Reading

The references listed below provide additional reading that supports the material presented in Chapter 5. These recommended readings are also included in the Bibliography at the end of the Guide.

5.4.1 Data Modeling and Database Design

Ambler, Scott. Agile Database Techniques: Effective Strategies for the Agile Software Developer. Wiley & Sons, 2003. ISBN 0-471-20283-5.

Ambler, Scott W. and Pramodkumar J. Sadalage. Refactoring Databases: Evolutionary Database Design. Addison-Wesley, 2006. ISBN 0-321-29353-3.

Avison, David and Christine Cuthbertson. A Management Approach to Database Applications. McGraw Hill, 2002. ISBN 0-077-09782-3.

Brackett, Michael H. Practical Data Design. Prentice Hall, 1990. ISBN 0-136-90827-6.

Bruce, Thomas A. Designing Quality Databases with IDEF1X Information Models. Dorset House, 1991. ISBN 10:0932633188. 584 pages.

Carlis, John and Joseph Maguire. Mastering Data Modeling - A User-Driven Approach. Addison Wesley, 2000. ISBN 0-201-70045-X.

Date, C. J. An Introduction to Database Systems, 8th Edition. Addison-Wesley, 2003. ISBN 0-321-19784-4.

Date, C. J. and Hugh Darwen. Databases, Types and the Relational Model: The Third Manifesto, 3rd Edition. Addison Wesley, 2006. ISBN 0-321-39942-0.

DeAngelis, Carla. Data Modeling with Erwin. Indiana: Sams Publishing, 2000. ISBN 0-672-31868-7.

Dorsey, Paul. Enterprise Data Modeling Using UML. McGraw-Hill Osborne Media, 2007. ISBN 0-072-26374-1.

Fleming, Candace C. and Barbara Von Halle. The Handbook of Relational Database Design. Addison Wesley, 1989. ISBN 0-201-11434-8.

Halpin, Terry. Information Modeling and Relational Databases: From Conceptual Analysis to Logical Design. Morgan Kaufmann, 2001. ISBN 1-558-60672-6.

Halpin, Terry, Ken Evans, Pat Hallock, and Bill McLean. Database Modeling with Microsoft Visio for Enterprise Architects. Morgan Kaufmann, 2003. ISBN 1-558-60919-9.

Harrington, Jan L. Relational Database Design Clearly Explained, 2nd Edition. Morgan Kaufmann, 2002. ISBN 1-558-60820-6.

Hay, David C. Data Model Patterns: A Metadata Map. Morgan Kaufmann, 2006. ISBN 0-120-88798-3.

Hay, David C. Data Model Patterns: Conventions of Thought. Dorset House Publishing, 1996. ISBN 0-932633-29-3.

Hay, David C. Requirements Analysis From Business Views to Architecture. Prentice Hall, 2003. ISBN 0-120-28228-6.

Hernandez, Michael J. Database Design for Mere Mortals: A Hands-On Guide to Relational Database Design, 2nd Edition. Addison-Wesley, 2003. ISBN 0-201-75284-0.

Hoberman, Steve. The Data Modeler’s Workbench. Tools and Techniques for Analysis and Design. John Wiley & Sons, 2001. ISBN 0-471-11175-9.

Hoberman, Steve. Data Modeling Made Simple: A Practical Guide for Business & Information Technology Professionals. Technics Publications, LLC, 2005. ISBN 0-977-14000-8.

Hoffer, Jeffrey A., Joey F. George, and Joseph S. Valacich. Modern Systems Analysis and Design, 4th Edition. Prentice Hall, 2004. ISBN 0-131-45461-7.

Krogstie, John, Terry Halpin, and Keng Siau, editors. Information Modeling Methods and Methodologies: Advanced Topics in Database Research. Idea Group Publishing, 2005. ISBN 1-591-40375-8.

Muller, Robert. J. Database Design for Smarties: Using UML for Data Modeling. San Francisco, CA, USA, Morgan Kaufmann, 1999. ISBN 1-558-60515-0.

Newton, Judith J. and Daniel Wahl, editors. Manual For Data Administration. Washington, DC: GPO, NIST Special Publications 500-208, 1993.

Pascal, Fabian. Practical Issues In Database Management: A Reference For The Thinking Practitioner. Addison-Wesley, 2000. ISBN 0-201-48555-9.

Reingruber, Michael. C. and William W. Gregory. The Data Modeling Handbook: A Best-Practice Approach to Building Quality Data Models. John Wiley & Sons, 1994. ISBN 0-471-05290-6.

Riordan, Rebecca M. Designing Effective Database Systems. Addison-Wesley, 2005. ISBN 0-321-20903-3.

Rob, Peter and Carlos Coronel. Database Systems: Design, Implementation, and Management, 7th Edition. Course Technology, 2006. ISBN 1-418-83593-5.

Schmidt, Bob. Data Modeling for Information Professionals. Prentice Hall, 1999. ISBN 0-13-080450-9.

Silverston, Len. The Data Model Resource Book, Volume 1: A Library of Universal Data Models for All Enterprises, 2nd Edition, John Wiley & Sons, 2001. ISBN 0-471-38023-7.

Silverston, Len. The Data Model Resource Book, Volume 2: A Library of Data Models for Specific Industries, 2nd Edition. John Wiley & Sons, 2001. ISBN 0-471-35348-5.

Simsion, Graeme C. and Graham C. Witt. Data Modeling Essentials, 3rd Edition. Morgan Kaufmann, 2005. ISBN 0-126-44551-6.

Teorey, Toby, Sam Lightstone, and Tom Nadeau. Database Modeling and Design, 4th Edition. Morgan Kaufmann, 2006. ISBN 1-558-60500-2.

Thalheim, Bernhard. Entity-Relationship Modeling: Foundations of Database Technology. Springer, 2000. ISBN 3-540-65470-4.

Van der Lans, Rick F. Introduction to SQL: Mastering the Relational Database Language, 4th Edition. Addison-Wesley, 2006. ISBN 0-321-30596-5.

Watson, Richard T. Data Management: Databases And Organization, 5th Edition. John Wiley & Sons, 2005. ISBN 0-471-71536-0.

5.4.2 Business Rules

Chisholm, Malcolm. How to Build a Business Rules Engine: Extending Application Functionality Through Metadata Engineering. Morgan Kaufmann, 2003. ISBN 1-558-60918-0.

Date, C. J. What Not How: The Business Rules Approach To Application Development. Addison-Wesley, 2000. ISBN 0-201-70850-7.

Morgan, Tony. Business Rules and Information Systems: Aligning IT with Business Goals. Addison-Wesley, 2002. ISBN 0-201-74391-4.

Ross, Ronald G. Business Rules Concepts, 2nd Edition. Business Rule Solutions, 2005. ISBN 0-941-04906-X.

Ross, Ronald G. Principles of the Business Rule Approach. Addison-Wesley, 2003. ISBN 0-201-78893-4.

Von Halle, Barbara. Business Rules Applied: Building Better Systems Using the Business Rules Approach. John Wiley & Sons, 2001. ISBN 0-471-41293-7.

5.4.3 Information Engineering

Finkelstein, Clive. An Introduction to Information Engineering: From Strategic Planning to Information Systems. Addison-Wesley, 1990. ISBN 0-201-41654-9.

Finkelstein, Clive. Information Engineering: Strategic Systems Development. Addison-Wesley, 1993. ASIN B000XUA41C.

Inmon, W. H. Advanced Topics in Information Engineering. John Wiley & Sons - QED, 1989. ISBN 0-894-35269-5.

Inmon, W. H. Information Engineering For The Practitioner. Prentice-Hall (Yourdon Press), 1988. ISBN 0-13-464579-0.

Martin, James. Information Engineering Book 1: Introduction. Prentice-Hall, 1989. ISBN 0-13-464462-X. Also see Book 2: Analysis and Design and Book 3: Design and Construction.

5.4.4 Agile Development

Ambler, Scott. Agile Database Techniques: Effective Strategies for the Agile Software Developer. Wiley & Sons, 2003. ISBN 0-471-20283-5.

5.4.5 Object Orientation and Object-Oriented Design

Wirfs-Brock, Rebecca, Brian Wilkerson, and Lauren Wiener. Designing Object-Oriented Software. NJ: Prentice Hall, 1990. ISBN 0-13-629825-7.

Coad, Peter. Object Models: Strategies, Patterns And Applications, 2nd Edition. Prentice Hall PTR, 1996. ISBN 0-13-840117-9.

Entsminger, Gary. The Tao Of Objects. M & T Books, 1990. ISBN 1-55851-155-5.

Goldberg, Adele and Kenneth S. Rubin. Succeeding With Objects. Addison-Wesley, 1995. ISBN 0-201-62878-3.

Graham, Ian. Migrating To Object Technology. Addison-Wesley, 1995. ISBN 0-201-59389-0.

Jacobson, Ivar, Maria Ericsson, and Agneta Jacobson. The Object Advantage. Addison-Wesley, 1995. ISBN 0-201-42289-1.

Taylor, David. Business Engineering With Object Technology. New York: John Wiley, 1995. ISBN 0-471-04521-7

Taylor, David. Object Oriented Technology: A Manager’s Guide. Reading, MA: Addison-Wesley, 1990. ISBN 0-201-56358-4

5.4.6 Service-Oriented Architecture (SOA)

Barry, Douglas K. Web Services and Service-Oriented Architectures: The Savvy Manager’s Guide. Morgan Kaufmann, 2003. ISBN 1-55860-906-7.

Erl, Thomas. Service-Oriented Architecture: A Field Guide to Integrating XML and Web Services. Prentice Hall, 2004. ISBN 0-131-42898-5.

Erl, Thomas. Service-Oriented Architecture: Concepts, Technology and Design. Prentice Hall, 2004. ISBN 0-131-85858-0.

5.4.7 SQL

Celko, Joe. Joe Celko’s SQL for Smarties: Advanced SQL Programming, 3rd Edition. ISBN 10: 0123693799. 840 pages.

Celko, Joe. Joe Celko’s Trees and Hierarchies in SQL for Smarties. Morgan Kaufmann, 2004. ISBN 1-558-60920-2.

Date, C. J., with Hugh Darwen. A Guide to the SQL Standard, 4th Edition. Addison-Wesley, 1997. ISBN 0-201-96426-0.

Kline, Kevin, with Daniel Kline. SQL in a Nutshell. O’Reilly, 2001. ISBN 0-471-16518-2.

Van der Lans, Rick F. Introduction to SQL: Mastering the Relational Database Language, 4th Edition. Addison-Wesley, 2006. ISBN 0-321-30596-5.

5.4.8 Software Process Improvement

Humphrey, Watts S. Managing The Software Process. Addison Wesley, 1989. ISBN 0-201-18095-2.

5.4.9 XML

Aiken, Peter and M. David Allen. XML in Data Management: Understanding and Applying Them Together. Morgan Kaufmann, 2004. ISBN 0-12-45599-4.

Bean, James. XML for Data Architects: Designing for Reuse and Integration. Morgan Kaufmann, 2003. ISBN 1-558-60907-5.

Finkelstein, Clive and Peter Aiken. Building Corporate Portals with XML. McGraw-Hill, 1999. ISBN 10: 0079137059. 512 pages.

Melton, Jim and Stephen Buxton. Querying XML: XQuery, XPath and SQL/XML in Context. Morgan Kaufmann, 2006. ISBN 1-558-60711-0.
