Chapter 6
Planning and Design in Data Lifecycle Management

Data lifecycle management activities focus on planning and designing for data, enabling data use and maintenance, and actually using data to meet organizational goals (see Figure 1). Data architects and data modelers plan and design for data.

This chapter will describe:

  • The role of enterprise architecture in planning and designing for the organization
  • The critical function of data architecture within data management
  • The goals and artifacts associated with data modeling

Enterprise architecture

Architecture refers to an organized arrangement of component elements intended to optimize the function, performance, feasibility, cost, and aesthetics of an overall structure or system. The term architecture has been adopted to describe several facets of information systems design. Even in small organizations, information technology is complicated. Architectural artifacts and documentation that depict systems and data flows show people how systems, processes, and data work together. A strategic approach to architecture allows an organization to make better decisions about its systems and data.

Architecture practice is carried out at different levels within an organization (including enterprise, domain, or project) and with different areas of focus (e.g., infrastructure, application, or data). Table 1 describes and compares architectural domains. Architects from different domains must address development requirements collaboratively, as each domain influences the other domains.

A well-managed enterprise architecture practice can help an organization understand the current state of its systems, promote desirable change toward future state, enable regulatory compliance, and improve effectiveness. Effective management of data and the systems in which data is stored and used is a common goal of the breadth of architecture disciplines.

The Zachman Framework

An architecture framework is a foundational structure used to develop a broad range of related architectures. It provides a way of thinking about and understanding architecture and represents an overall ‘architecture for architecture.’ Exactly what architects do can be confusing to people who are not architects and who do not recognize the distinctions implied by these levels and focus areas. Architectural frameworks are valuable because they enable non-architects to understand the relationships (if not the detailed differences) between these concepts.

The most well-known enterprise architectural framework, the Zachman Framework, was developed by John A. Zachman in the 1980s. It has continued to evolve. Zachman recognized that in creating buildings, airplanes, enterprises, value chains, projects, or systems, there are many stakeholders, and each has a different perspective about architecture. He applied this concept to the requirements for different types and levels of architecture within an enterprise.

The Zachman Framework is represented by a 6x6 matrix that summarizes the complete set of models required to describe an enterprise and the relationships between them. It does not define how to create the models. It simply shows what models should exist (see Figure 13).

The Zachman framework summarizes the answers to a simple set of questions (i.e., what, how, where, who, when, why) that might be asked by stakeholders with different perspectives:

  • The executive perspective (business context): Lists of business elements defining scope in identification models.
  • The business management perspective (business concepts): Clarification of the relationships between business concepts defined by Executive Leaders as Owners in definition models.
  • The architect perspective (business logic): System logical models detailing system requirements and unconstrained design represented by Architects as Designers in representation models.
  • The engineer perspective (business physics): Physical models optimizing the design for implementation for specific use under the constraints of specific technology, people, costs, and timeframes specified by Engineers as Builders in specification models.
  • The technician perspective (component assemblies): A technology-specific, out-of-context view of how components are assembled and operate configured by Technicians as Implementers in configuration models.
  • The user perspective (operations classes): Actual functioning instances used by Workers as Participants. There are no models in this perspective.

The framework then identifies what kinds of architecture artifacts are required to answer these fundamental questions.
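For readers who find a concrete structure helpful, the matrix can be sketched as a simple lookup keyed by perspective and question. This is only an illustration of the 6x6 shape described above. The cell wording and the names `QUESTIONS`, `PERSPECTIVES`, and `build_matrix` are assumptions made for this sketch and are not part of the framework itself.

```python
# Illustrative sketch only: a 6x6 grid of "model slots" in the spirit of the
# Zachman Framework. The wording of each cell is an assumption for this example.

QUESTIONS = ["what", "how", "where", "who", "when", "why"]

# (perspective, kind of model it produces); None means no models are produced.
PERSPECTIVES = [
    ("executive", "identification"),
    ("business management", "definition"),
    ("architect", "representation"),
    ("engineer", "specification"),
    ("technician", "configuration"),
    ("user", None),  # actual functioning instances, no models
]


def build_matrix():
    """Return {(perspective, question): description} for every cell."""
    matrix = {}
    for perspective, model_kind in PERSPECTIVES:
        for question in QUESTIONS:
            if model_kind is None:
                cell = "functioning instances (no model)"
            else:
                cell = f"{model_kind} model answering '{question}'"
            matrix[(perspective, question)] = cell
    return matrix


matrix = build_matrix()
print(len(matrix))                     # one cell per perspective/question pair
print(matrix[("architect", "what")])   # the architect's answer to "what"
```

The sketch mirrors the point made above: the grid says which models should exist, not how to build them.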

Data architecture

Data architecture is fundamental to data management. Because most organizations have more data than individual people can comprehend, it is necessary to represent organizational data at different levels of abstraction so that management can understand it and make decisions about it.

The specialized discipline of data architecture can be understood from several perspectives:

  • Data architecture outcomes, such as models, definitions and data flows on various levels (usually referred to as data architecture artifacts)
  • Data architecture activities, to form, deploy and fulfill data architecture intentions
  • Data architecture behavior, such as collaborations, mindsets, and skills among the various roles that affect the enterprise’s data architecture

An organization’s data architecture is described by an integrated collection of master design documents at different levels of abstraction, including standards that govern how data is collected, stored, arranged, used, and removed. It is also characterized by descriptions of all the containers that hold data and the paths that data takes through an organization’s systems.

Data architecture artifacts include specifications used to describe existing state, define data requirements, guide data integration, and control data assets as put forth in a data strategy. The most detailed data architecture design document is a formal enterprise data model, containing data names, comprehensive data and Metadata definitions, conceptual and logical entities and relationships, and business rules. Physical data models are included, but as a product of data modeling and design, rather than data architecture.

Data architecture is most valuable when it fully supports the needs of the entire enterprise. Enterprise data architecture defines standard terms and designs for the elements that are important to the entire organization. The design of an enterprise data architecture includes depiction of the business data as such, including the collection, storage, integration, movement, and distribution of data. Enterprise data architecture enables consistent data standardization and integration across the enterprise.

Data architecture should serve as a bridge between business strategy and technology execution. As part of enterprise architecture, data architects:

  • Strategically prepare organizations to quickly evolve their products, services, and data to take advantage of business opportunities inherent in emerging technologies
  • Translate business needs into data and system requirements so that processes consistently have the data they require
  • Manage complex data and information delivery throughout the enterprise
  • Facilitate alignment between Business and IT
  • Act as agents for change, transformation, and agility

These business drivers should influence measures of the value of data architecture.

Data architecture artifacts

As data flows within an organization through feeds or interfaces, it is secured, integrated, stored, recorded, catalogued, shared, reported on, analyzed, and delivered to stakeholders. Along the way, the data may be verified, enhanced, linked, certified, aggregated, anonymized, and used for analytics until archived or purged. The enterprise data architecture descriptions must therefore include enterprise data models (e.g., data structures and data specifications), as well as data flow designs.

Data architects create and maintain organizational knowledge about data and the systems through which it moves. This knowledge enables an organization to manage its data as an asset and increase the value it gets from its data by identifying opportunities for data usage, cost reduction, and risk mitigation.

Architects seek to design in a way that brings value to the organization. This value comes through an optimal technical footprint, operational and project efficiencies, and the increased ability of the organization to use its data. To get there requires good design, planning, and the ability to ensure that the designs and plans are executed effectively.

Enterprise Data Model (EDM)

The EDM is a holistic, enterprise-level, implementation-independent conceptual or logical data model providing a common, consistent view of data across the enterprise. An EDM includes key enterprise data entities (i.e., business concepts), their relationships, critical guiding business rules, and some critical attributes. It sets forth the foundation for all data and data-related projects. Any project-level data model must be based on the EDM. The EDM should be reviewed by stakeholders, who should agree that it effectively represents the enterprise.

An organization that recognizes the need for an enterprise data model must decide how much time and effort it can devote to building it. EDMs can be built at different levels of detail, so resource availability will influence initial scope. Over time, as the needs of the enterprise demand, the scope and level of detail captured within an enterprise data model typically expand. Most successful enterprise data models are built incrementally and iteratively, using layers.

Figure 14 relates different types of models, and shows how conceptual models are ultimately linkable to physical application data models. It distinguishes:

  • A conceptual overview of the enterprise’s subject areas
  • Views of entities and relationships for each subject area
  • Detailed, partially attributed logical views of these same subject areas
  • Logical and physical models specific to an application or project

All levels are part of the Enterprise Data Model, and linkages create paths to trace an entity from top to bottom and between models in the same level.
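The idea of tracing an entity from top to bottom can be sketched with a small linkage table. This is a minimal sketch, assuming each model element records the element one level up that it refines. All element names (`crm_db.CUST_TBL`, `Party`, and so on) and the function `trace_up` are hypothetical and chosen only for illustration.

```python
# Hypothetical linkage table: each model element points to the element one
# level up that it refines. Names are invented for illustration.
REFINES = {
    "crm_db.CUST_TBL (application physical)": "Customer (application logical)",
    "Customer (application logical)": "Customer (subject-area logical view)",
    "Customer (subject-area logical view)": "Customer (subject-area entity view)",
    "Customer (subject-area entity view)": "Party (conceptual subject area)",
}


def trace_up(element):
    """Follow the linkages from an element up to the conceptual overview."""
    path = [element]
    while path[-1] in REFINES:
        path.append(REFINES[path[-1]])
    return path


for step in trace_up("crm_db.CUST_TBL (application physical)"):
    print(step)
```

In practice such linkages would live in a modeling tool or Metadata repository rather than in code. The sketch shows only that each layer needs an explicit pointer to the layer above for the trace to work.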

Data flow design

Data flow design defines the requirements and master blueprint for storage and processing across databases, applications, platforms, and networks (the components). These data flows map the movement of data to business processes, locations, business roles, and to technical components.

Data flows are a type of data lineage documentation that depicts how data moves through business processes and systems. End-to-end data flows illustrate where the data originated, where it is stored and used, and how it is transformed as it moves inside and between diverse processes and systems. Data lineage analysis can help explain the state of data at a given point in the data flow.

Data flows map and document relationships between data and:

  • Applications within a business process
  • Data stores or databases in an environment
  • Network segments (useful for security mapping)
  • Business roles, depicting which roles have responsibility for creating, updating, using, and deleting data (CRUD)
  • Locations where local differences occur

Data flows can be documented at different levels of detail: subject area, business entity, or even the attribute level. Systems can be represented by network segments, platforms, common application sets, or individual servers. Data flows can be represented by two-dimensional matrices (Figure 15) or in data flow diagrams (Figure 16).
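A two-dimensional matrix of this kind can be sketched in a few lines. The example below is an illustration, not a prescribed format: it assumes a list of observed (business role, business entity, operation) triples and folds them into a role-by-entity grid of CRUD letters. The roles, entities, and function names are hypothetical.

```python
from collections import defaultdict

CRUD_ORDER = "CRUD"

# Hypothetical observations of (business role, business entity, operation).
USAGE = [
    ("Sales Rep", "Customer", "C"),
    ("Sales Rep", "Customer", "R"),
    ("Sales Rep", "Order", "C"),
    ("Billing Clerk", "Order", "R"),
    ("Billing Clerk", "Invoice", "C"),
    ("Billing Clerk", "Invoice", "U"),
]


def build_crud_matrix(rows):
    """Group operations by (role, entity) cell."""
    cells = defaultdict(set)
    for role, entity, operation in rows:
        cells[(role, entity)].add(operation)
    return dict(cells)


def cell_text(cells, role, entity):
    """Render one cell as letters in C-R-U-D order, or '-' when empty."""
    ops = cells.get((role, entity), set())
    return "".join(op for op in CRUD_ORDER if op in ops) or "-"


def render(cells):
    """Return the matrix as text lines: one header row, then one row per role."""
    roles = sorted({role for role, _ in cells})
    entities = sorted({entity for _, entity in cells})
    lines = ["Role".ljust(15) + "".join(e.ljust(10) for e in entities)]
    for role in roles:
        row = "".join(cell_text(cells, role, e).ljust(10) for e in entities)
        lines.append(role.ljust(15) + row)
    return lines


cells = build_crud_matrix(USAGE)
for line in render(cells):
    print(line)
```

The same structure works at other levels of detail, for example with applications or data stores on one axis and subject areas or attributes on the other.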

The Enterprise Data Model and the Data Flow Design need to fit well together. As mentioned, both need to be reflected in the current state and the target state (architecture perspective), and also in the transition state (project perspective).

Data architecture and data management quality and innovation

Data and enterprise architecture deal with complexity from two viewpoints:

  • Quality-oriented: Focus on improving execution within business and IT development cycles. Unless architecture is managed, architecture will deteriorate. Systems will gradually become more complex and inflexible, creating risk for an organization. Uncontrolled data delivery, data copies, and interface ‘spaghetti’ relationships make organizations less efficient and reduce trust in the data.
  • Innovation-oriented: Focus on transforming business and IT to address new expectations and opportunities. Driving innovation with disruptive technologies and data uses has become a role of the modern enterprise architect.

These two drivers require separate approaches.

  • The quality-oriented approach aligns with traditional data architecture work where architectural quality improvements are accomplished incrementally through the architect’s connection with projects. Typically, the architect keeps the entirety of architecture in mind and focuses on long-term goals directly connected to governance, standardization, and structured development.
  • The innovation-oriented approach can have a shorter-term perspective and may use unproven business logic and leading-edge technologies. This orientation often requires architects to make contact with people within the organization with whom IT professionals do not usually interact; for example, product development representatives and business designers.

Working within enterprise architecture or as a data architecture team, data architects are responsible for developing a roadmap, managing enterprise data requirements within projects, and integrating with the overall enterprise architecture. Success depends on defining and adhering to standards and creating and maintaining useful and usable architectural artifacts. A disciplined architecture practice can improve efficiency and quality by creating reusable and extensible solutions.

Data Modeling

A model is a representation of something that exists or a pattern for something to be made. Maps, organization charts, and building blueprints are examples of models in use every day. Model diagrams make use of standard symbols that allow one to understand content.

Data modeling is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model. Data modeling is a critical component of data management. The modeling process requires that organizations discover and document how their data fits together.25 Data models enable an organization to understand its data assets.

Data models contain Metadata essential to data consumers. Much of the Metadata uncovered during the data modeling process is also essential to other data management functions: for example, definitions support data governance, and lineage supports data warehousing and analytics.

A data model describes an organization’s data as the organization understands it or as the organization wants it to be. A data model contains a set of symbols with text labels that visually represents data requirements as communicated to the data modeler, for a specific set of data that can range in size from small (for a project) to large (for an organization).

The model is thus a form of documentation for data requirements and data definitions resulting from the modeling process. Data models are the main medium used to communicate data requirements from business to IT, and within IT from analysts, modelers, and architects, to database designers and developers.

Data models are critical to effective management of data because they:

  • Provide a common vocabulary around data
  • Capture and document explicit knowledge (Metadata) about an organization’s data and systems
  • Serve as a primary communications tool during projects
  • Provide the starting point for customization, integration, or even replacement of an application

Data modeling goals

The goal of data modeling is to confirm and document understanding of different perspectives on data. This understanding leads to applications and data that more closely align with current and future business requirements. This understanding also creates a foundation to successfully complete broad-scoped initiatives such as Master Data Management and data governance programs. Proper data modeling leads to lower support costs and increases the reusability opportunities for future initiatives, thereby reducing the costs of building new applications. In addition, data models themselves are an important form of Metadata.

Confirming and documenting understanding of different perspectives facilitates:

  • Formalization: A data model documents a concise definition of data structures and relationships. It enables assessment of how data is affected by implemented business rules, for current (as-is) states or desired target states. Formal definition imposes a disciplined structure on data that reduces the possibility of data anomalies occurring when accessing and persisting data. By illustrating the structures and relationships in the data, a data model makes data easier to consume.
  • Scope definition: A data model can help explain the boundaries for data context and implementation of purchased application packages, projects, initiatives, or existing systems.
  • Knowledge retention/documentation: A data model can preserve corporate memory regarding a system or project by capturing knowledge in an explicit form. It serves as documentation for future projects to use as the as-is version.

Data models help us understand an organization or business area, an existing application, or the impact of modifying an existing data structure. The data model becomes a reusable map to help business professionals, project managers, analysts, modelers, and developers understand data structure within the environment. In much the same way as the mapmaker learned and documented a geographic landscape for others to use for navigation, the modeler enables others to understand an information landscape.26

Building blocks of data models

There are many different kinds of data models, including relational, dimensional, etc. Modelers will use appropriate types of models based on the organization’s needs, the data being modeled, and the system that the model is being developed for. Each type of model uses different visual conventions to capture information.

Models also differ based on the level of abstraction of the information they depict (conceptual with a high level of abstraction; logical with a medium level of abstraction; and physical which depicts a specific system or instantiation of data). But models all use the same building blocks: entities, relationships, attributes, and domains.

As a leader in your organization, it is not necessary that you be able to read data models. However, it is helpful if you understand how they describe data. The definitions and examples here give you a flavor of how data models work.

Entity

Outside of data modeling, the definition of entity is a thing that exists separate from other things. Within data modeling, an entity is a thing about which an organization collects information. Entities are sometimes referred to as ‘the nouns of an organization.’ In relational data models, entities are the boxes that identify the concept being modeled.

An entity can be thought of as the answer to a fundamental question – who, what, when, where, why, or how – or to a combination of these questions. Table 2 defines and gives examples of commonly used entity categories.27

Relationship

A relationship is an association between entities.28 A relationship captures the high-level interactions between conceptual entities, the detailed interactions between logical entities, and the constraints between physical entities. Relationships are shown as lines on the data modeling diagram.

In a relationship between two entities, cardinality captures how many instances of one entity participate in the relationship with how many instances of the other entity. For example, a Company can have one or many Employees.
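Cardinality can be made concrete with a short sketch. The example below assumes the Company and Employee rule just described ("one or many") and checks a handful of made-up instances against it. The class `Relationship`, the function `violations`, and the sample company names are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Relationship:
    """An association between two entities with a cardinality rule."""
    parent: str
    child: str
    min_children: int
    max_children: Optional[int]  # None stands for "many"


def violations(rel: Relationship, instances: Dict[str, List[str]]) -> List[str]:
    """Return the parent instances whose child count breaks the rule."""
    bad = []
    for parent_name, children in instances.items():
        count = len(children)
        too_few = count < rel.min_children
        too_many = rel.max_children is not None and count > rel.max_children
        if too_few or too_many:
            bad.append(parent_name)
    return bad


# "A Company can have one or many Employees": at least 1, no upper limit.
employs = Relationship("Company", "Employee", min_children=1, max_children=None)

# Made-up instances for illustration.
staff = {"Acme": ["Ana", "Bo"], "Globex": ["Cy"], "Initech": []}

print(violations(employs, staff))
```

Here the entity is the box (Company, Employee), the relationship is the line between the boxes, and the cardinality is the rule on how many instances may sit at each end of that line.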
