Data Lifecycle management activities focus on planning and designing for data, enabling data use and maintenance, and actually using data to meet organizational goals (see Figure 1). Data architects and data modelers plan and design for data.
This chapter will describe:
Enterprise architecture
Architecture refers to an organized arrangement of component elements intended to optimize the function, performance, feasibility, cost, and aesthetics of an overall structure or system. The term architecture has been adopted to describe several facets of information systems design. Even in small organizations, information technology is complicated. Architectural artifacts and documentation that depict systems and data flows show people how systems, processes, and data work together. A strategic approach to architecture allows an organization to make better decisions about its systems and data.
Architecture practice is carried out at different levels within an organization (including enterprise, domain, or project) and with different areas of focus (e.g., infrastructure, application, or data). Table 1 describes and compares architectural domains. Architects from different domains must address development requirements collaboratively, as each domain influences the other domains.
A well-managed enterprise architecture practice can help an organization understand the current state of its systems, promote desirable change toward a future state, enable regulatory compliance, and improve effectiveness. Effective management of data, and of the systems in which data is stored and used, is a goal common to all architecture disciplines.
The Zachman Framework
An architecture framework is a foundational structure used to develop a broad range of related architectures. It provides a way of thinking about and understanding architecture and represents an overall ‘architecture for architecture.’ Exactly what architects do can be confusing to people who are not architects and who do not recognize the distinctions implied by these levels and focus areas. Architectural frameworks are valuable because they enable non-architects to understand the relationships (if not the detailed differences) between these concepts.
The best-known enterprise architectural framework, the Zachman Framework, was developed by John A. Zachman in the 1980s and has continued to evolve. Zachman recognized that in creating buildings, airplanes, enterprises, value chains, projects, or systems, there are many stakeholders, and each has a different perspective about architecture. He applied this concept to the requirements for different types and levels of architecture within an enterprise.
The Zachman Framework is represented by a 6x6 matrix that summarizes the complete set of models required to describe an enterprise and the relationships between them. It does not define how to create the models; it simply shows which models should exist (see Figure 13).
The Zachman Framework summarizes the answers to a simple set of questions (i.e., what, how, where, who, when, why) that might be asked by stakeholders with different perspectives:
The framework then identifies what kinds of architecture artifacts are required to answer these fundamental questions.
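Because the framework says only *which* models should exist, one way to picture it is as a 6x6 checklist: one cell per combination of stakeholder perspective and question, each cell naming the artifact that answers it. The sketch below is illustrative only. The column names come from the six questions above; the row labels and the example artifact are assumptions chosen for the sketch, not part of this text.

```python
from itertools import product

# Columns: the six interrogatives named in the text.
QUESTIONS = ("What", "How", "Where", "Who", "When", "Why")

# Rows: assumed labels for the six stakeholder perspectives (illustrative only).
PERSPECTIVES = (
    "Executive",
    "Business Management",
    "Architect",
    "Engineer",
    "Technician",
    "Enterprise",
)


def empty_framework():
    """Build a 6x6 grid with one slot per (perspective, question) cell.

    Each slot holds the name of the model that answers that question from
    that perspective; None means the model has not been produced yet.
    The grid says nothing about HOW to build a model, only that one should exist.
    """
    return {cell: None for cell in product(PERSPECTIVES, QUESTIONS)}


def missing_models(framework):
    """Return the cells for which no model has been recorded."""
    return [cell for cell, model in framework.items() if model is None]


framework = empty_framework()
# Hypothetical entry: the Architect's answer to "What" might be a logical data model.
framework[("Architect", "What")] = "Logical data model"
print(len(missing_models(framework)), "of", len(framework), "cells still empty")
```

Treating the framework as a completeness check matches its intent as an "architecture for architecture": it highlights gaps without prescribing a method for filling them.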
Data architecture
Data architecture is fundamental to data management. Because most organizations have more data than individual people can comprehend, it is necessary to represent organizational data at different levels of abstraction so that management can understand it and make decisions about it.
The specialized discipline of data architecture can be understood from several perspectives:
An organization’s data architecture is described by an integrated collection of master design documents at different levels of abstraction, including standards that govern how data is collected, stored, arranged, used, and removed. It is also characterized by descriptions of all the containers and paths that data takes through an organization’s systems.
Data architecture artifacts include specifications used to describe existing state, define data requirements, guide data integration, and control data assets as put forth in a data strategy. The most detailed data architecture design document is a formal enterprise data model, containing data names, comprehensive data and Metadata definitions, conceptual and logical entities and relationships, and business rules. Physical data models are included, but as a product of data modeling and design, rather than data architecture.
Data architecture is most valuable when it fully supports the needs of the entire enterprise. Enterprise data architecture defines standard terms and designs for the elements that are important to the entire organization. The design of an enterprise data architecture includes a depiction of the business data itself, covering its collection, storage, integration, movement, and distribution. Enterprise data architecture enables consistent data standardization and integration across the enterprise.
Data architecture should serve as a bridge between business strategy and technology execution. As part of enterprise architecture, data architects:
These business drivers should influence measures of the value of data architecture.
Data architecture artifacts
As data flows within an organization through feeds or interfaces, it is secured, integrated, stored, recorded, catalogued, shared, reported on, analyzed, and delivered to stakeholders. Along the way, the data may be verified, enhanced, linked, certified, aggregated, anonymized, and used for analytics until archived or purged. The enterprise data architecture descriptions must therefore include enterprise data models (e.g., data structures and data specifications), as well as data flow designs.
Data architects create and maintain organizational knowledge about data and the systems through which it moves. This knowledge enables an organization to manage its data as an asset and increase the value it gets from its data by identifying opportunities for data usage, cost reduction, and risk mitigation.
Architects seek to design in a way that brings value to the organization. This value comes through an optimal technical footprint, operational and project efficiencies, and the increased ability of the organization to use its data. Getting there requires good design, planning, and the ability to ensure that designs and plans are executed effectively.
Enterprise Data Model (EDM)
The EDM is a holistic, enterprise-level, implementation-independent conceptual or logical data model providing a common, consistent view of data across the enterprise. An EDM includes key enterprise data entities (i.e., business concepts), their relationships, critical guiding business rules, and some critical attributes. It sets forth the foundation for all data and data-related projects. Any project-level data model must be based on the EDM. The EDM should be reviewed by stakeholders, who should agree that it effectively represents the enterprise.
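The requirement that every project-level data model be based on the EDM can be checked mechanically once both models are recorded as data. The sketch below assumes a hypothetical, very small EDM (three entities with definitions) and a project model in which each entity either reuses an EDM name or declares the EDM entity it specializes; the entity names and the `unmapped_entities` helper are illustrative, not part of any standard.

```python
# Hypothetical EDM: enterprise entity names mapped to their agreed definitions.
EDM_ENTITIES = {
    "Customer": "A party that purchases or may purchase products.",
    "Product": "An item or service offered for sale.",
    "Order": "A customer's request to purchase one or more products.",
}


def unmapped_entities(project_model: dict, edm: dict) -> list:
    """Return project entities that cannot be traced to the EDM.

    project_model maps each project entity name to the EDM entity it is
    based on (None if the modeler reused the EDM name directly or recorded
    nothing). An entity passes if its own name, or its declared basis,
    appears in the EDM.
    """
    return sorted(
        name
        for name, basis in project_model.items()
        if name not in edm and basis not in edm
    )


project_model = {"Customer": None, "Web Order": "Order", "Coupon": None}
print(unmapped_entities(project_model, EDM_ENTITIES))
```

A flagged entity such as `Coupon` is not necessarily an error; it may signal a business concept that the EDM should absorb in its next iteration, which fits the incremental way EDMs typically grow.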
An organization that recognizes the need for an enterprise data model must decide how much time and effort it can devote to building it. EDMs can be built at different levels of detail, so resource availability will influence initial scope. Over time, as the needs of the enterprise demand, the scope and level of detail captured within an enterprise data model typically expand. Most successful enterprise data models are built incrementally and iteratively, using layers.
Figure 14 relates different types of models and shows how conceptual models are ultimately linkable to physical application data models. It distinguishes:
All levels are part of the Enterprise Data Model, and linkages create paths to trace an entity from top to bottom and between models in the same level.
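The top-to-bottom linkages can be thought of as a directed graph in which each link says "this element is realized by that element one level down." The sketch below uses hypothetical element names at the conceptual, logical, and physical levels and walks the graph to find everything a conceptual entity ultimately maps to. It shows only vertical links; links between models at the same level could be added as extra edges in the same structure.

```python
from collections import defaultdict

# Hypothetical linkages between model elements at three levels of abstraction.
LINKS = [
    ("conceptual:Customer", "logical:Party"),
    ("conceptual:Customer", "logical:Customer Account"),
    ("logical:Party", "physical:crm.party"),
    ("logical:Customer Account", "physical:billing.account"),
    ("logical:Customer Account", "physical:crm.account"),
]


def build_index(pairs):
    """Index each element's direct children (the elements that realize it)."""
    index = defaultdict(list)
    for parent, child in pairs:
        index[parent].append(child)
    return index


def trace_down(element, index):
    """Return every element reachable from `element` by following links downward."""
    found, stack, seen = [], [element], {element}
    while stack:
        current = stack.pop()
        for child in index.get(current, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                stack.append(child)
    return found


index = build_index(LINKS)
for element in trace_down("conceptual:Customer", index):
    print(element)
```

The same index, inverted, supports tracing bottom to top, for example to find which business concept a physical table ultimately represents.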
Data flow design
Data flow design defines the requirements and master blueprint for storage and processing across databases, applications, platforms, and networks (the components). These data flows map the movement of data to business processes, locations, business roles, and technical components.
Data flows are a type of data lineage documentation that depicts how data moves through business processes and systems. End-to-end data flows illustrate where the data originated, where it is stored and used, and how it is transformed as it moves inside and between diverse processes and systems. Data lineage analysis can help explain the state of data at a given point in the data flow.
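End-to-end lineage can be represented as a set of hops, each recording a source, a target, and the transformation applied along the way. Walking those hops upstream from a report answers the questions above: where the data originated and how it was transformed. The system names and transformations in this sketch are hypothetical.

```python
# Hypothetical end-to-end flow: each hop is (source, target, transformation).
FLOWS = [
    ("web_orders", "staging.orders", "extract"),
    ("store_pos", "staging.orders", "extract"),
    ("staging.orders", "warehouse.fact_sales", "cleanse and conform"),
    ("crm.customers", "warehouse.dim_customer", "deduplicate"),
    ("warehouse.fact_sales", "report.monthly_revenue", "aggregate by month"),
    ("warehouse.dim_customer", "report.monthly_revenue", "join on customer"),
]


def lineage(target, flows):
    """Walk upstream from `target`.

    Returns the list of (source, target, transformation) hops that feed it
    and the set of origins, i.e. upstream nodes with no inbound flow.
    """
    inbound = {}
    for src, dst, how in flows:
        inbound.setdefault(dst, []).append((src, how))

    hops, origins, stack, seen = [], set(), [target], {target}
    while stack:
        node = stack.pop()
        parents = inbound.get(node, [])
        if not parents:
            origins.add(node)
        for src, how in parents:
            hops.append((src, node, how))
            if src not in seen:
                seen.add(src)
                stack.append(src)
    return hops, origins


hops, origins = lineage("report.monthly_revenue", FLOWS)
print(sorted(origins))
```

Keeping the transformation on each hop is what lets lineage analysis explain the state of data at a given point in the flow, not just its path.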
Data flows map and document relationships between data and
Data flows can be documented at different levels of detail: subject area, business entity, or even attribute. Systems can be represented by network segments, platforms, common application sets, or individual servers. Data flows can be represented as two-dimensional matrices (Figure 15) or as data flow diagrams (Figure 16).
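One common form of two-dimensional matrix is a CRUD matrix, which records for each business entity (rows) and each system (columns) whether the system Creates, Reads, Updates, or Deletes that data. The sketch assumes this form; the entities, systems, and cell values are hypothetical and are not taken from Figure 15.

```python
# Hypothetical CRUD matrix: rows are business entities, columns are systems.
SYSTEMS = ("Order Entry", "Billing", "Data Warehouse")
MATRIX = {
    "Customer": {"Order Entry": "CRU", "Billing": "R", "Data Warehouse": "R"},
    "Order": {"Order Entry": "CRUD", "Billing": "RU", "Data Warehouse": "R"},
    "Invoice": {"Billing": "CRU", "Data Warehouse": "R"},
}


def creators(entity):
    """Systems that create instances of `entity`, i.e. where that data originates."""
    return [s for s in SYSTEMS if "C" in MATRIX[entity].get(s, "")]


def render():
    """Format the matrix as a fixed-width table; '-' marks no interaction."""
    lines = ["Entity".ljust(10) + "".join(s.ljust(16) for s in SYSTEMS)]
    for entity, row in MATRIX.items():
        lines.append(entity.ljust(10) + "".join(row.get(s, "-").ljust(16) for s in SYSTEMS))
    return "\n".join(lines)


print(render())
print("Invoice originates in:", creators("Invoice"))
```

A matrix like this is compact and easy to query; a data flow diagram is usually better for showing sequence and transformation between systems.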
The Enterprise Data Model and the Data Flow Design need to fit well together. As mentioned, both need to be reflected in the current state and the target state (architecture perspective), and also in the transition state (project perspective).
Data architecture and data management quality and innovation
Data and enterprise architecture deal with complexity from two viewpoints:
These two drivers require separate approaches.
Working within enterprise architecture or as a data architecture team, data architects are responsible for developing a roadmap, managing enterprise data requirements within projects, and integrating with the overall enterprise architecture. Success depends on defining and adhering to standards and creating and maintaining useful and usable architectural artifacts. A disciplined architecture practice can improve efficiency and quality by creating reusable and extensible solutions.
Data Modeling
A model is a representation of something that exists or a pattern for something to be made. Maps, organization charts, and building blueprints are examples of models in everyday use. Model diagrams use standard symbols so that readers can understand their content.
Data modeling is the process of discovering, analyzing, and scoping data requirements, and then representing and communicating these data requirements in a precise form called the data model. Data modeling is a critical component of data management. The modeling process requires that organizations discover and document how their data fits together.25 Data models enable an organization to understand its data assets.
Data models comprise and contain Metadata essential to data consumers. Much of the Metadata uncovered during the data modeling process is essential to other data management functions: for example, definitions support data governance, and lineage supports data warehousing and analytics.
A data model describes an organization’s data as the organization understands it or as the organization wants it to be. A data model contains a set of symbols with text labels that attempts to represent visually the data requirements communicated to the data modeler, for a specific set of data that can range in size from small (for a project) to large (for an organization).
The model is thus a form of documentation for data requirements and data definitions resulting from the modeling process. Data models are the main medium used to communicate data requirements from business to IT, and within IT from analysts, modelers, and architects, to database designers and developers.
Data Models are critical to effective management of data because they:
Data modeling goals
The goal of data modeling is to confirm and document understanding of different perspectives on data. This understanding leads to applications and data that more closely align with current and future business requirements. This understanding also creates a foundation to successfully complete broad-scoped initiatives such as Master Data Management and data governance programs. Proper data modeling leads to lower support costs and creates opportunities for reuse in future initiatives, thereby reducing the cost of building new applications. In addition, data models themselves are an important form of Metadata.
Confirming and documenting understanding of different perspectives facilitates:
Data models help us understand an organization or business area, an existing application, or the impact of modifying an existing data structure. The data model becomes a reusable map to help business professionals, project managers, analysts, modelers, and developers understand data structure within the environment. In much the same way as the mapmaker learned and documented a geographic landscape for others to use for navigation, the modeler enables others to understand an information landscape.26
Building blocks of data models
There are many different kinds of data models, such as relational and dimensional models. Modelers choose the appropriate type of model based on the organization’s needs, the data being modeled, and the system for which the model is being developed. Each type of model uses different visual conventions to capture information.
Models also differ based on the level of abstraction of the information they depict: conceptual models have a high level of abstraction, logical models a medium level, and physical models depict a specific system or instantiation of data. All models, however, use the same building blocks: entities, relationships, attributes, and domains.
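For readers who think in code, the four building blocks map naturally onto simple types. In this sketch, which is an illustration and not a modeling notation, a domain is treated as a predicate defining the allowed values, an attribute pairs a name with a domain, an entity groups attributes, and a relationship connects two entities. The Company and Employee entities echo the example used later in this section; their attributes and domains are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Domain:
    """The set of values an attribute may take, expressed as a predicate."""
    name: str
    allows: Callable[[object], bool]


@dataclass
class Attribute:
    """A named characteristic of an entity, constrained by a domain."""
    name: str
    domain: Domain


@dataclass
class Entity:
    """A thing about which the organization collects information."""
    name: str
    attributes: list = field(default_factory=list)

    def invalid_values(self, instance: dict) -> list:
        """Names of attributes whose value in `instance` falls outside its domain."""
        return [a.name for a in self.attributes if not a.domain.allows(instance.get(a.name))]


@dataclass
class Relationship:
    """An association between two entities."""
    name: str
    parent: Entity
    child: Entity


# Hypothetical domains and entities.
non_empty_text = Domain("NonEmptyText", lambda v: isinstance(v, str) and v.strip() != "")
positive_int = Domain("PositiveInteger", lambda v: isinstance(v, int) and v > 0)

company = Entity("Company", [Attribute("company_id", positive_int), Attribute("name", non_empty_text)])
employee = Entity("Employee", [Attribute("employee_id", positive_int), Attribute("name", non_empty_text)])
employs = Relationship("employs", company, employee)

print(company.invalid_values({"company_id": 0, "name": ""}))
```

Representing domains as checks shows why they matter: they turn a definition in the model into a rule that data can be tested against.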
As a leader in your organization, it is not necessary that you be able to read data models. However, it is helpful if you understand how they describe data. The definitions and examples here give you a flavor of how data models work.
Entity
Outside of data modeling, an entity is a thing that exists separately from other things. Within data modeling, an entity is a thing about which an organization collects information. Entities are sometimes referred to as ‘the nouns of an organization.’ In relational data models, entities are the boxes that identify the concept being modeled.
An entity can be thought of as the answer to a fundamental question – who, what, when, where, why, or how – or to a combination of these questions. Table 2 defines and gives examples of commonly used entity categories.27
Relationship
A relationship is an association between entities.28 A relationship captures the high-level interactions between conceptual entities, the detailed interactions between logical entities, and the constraints between physical entities. Relationships are shown as lines on the data modeling diagram.
In a relationship between two entities, cardinality captures how many instances of one entity participate in the relationship with how many instances of the other entity. For example, a Company can have one or many Employees.
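Cardinality is a rule that actual data can be checked against. The sketch below tests the Company and Employee example under two assumptions: that each Company must have at least one Employee ("one or many"), and, going beyond the text, that each Employee belongs to exactly one Company. The single-parent rule is enforced by storing each employee's company in a dictionary, which can hold only one value per key. The function name and sample data are hypothetical.

```python
from collections import Counter


def check_one_to_many(parents, children, parent_of, min_children=1):
    """Check instance data against a one-to-(one or many) relationship.

    parents: set of parent instance ids (e.g., companies)
    children: iterable of child instance ids (e.g., employees)
    parent_of: maps each child id to the id of its single parent
    Returns human-readable violations; an empty list means the data fits.
    """
    violations = []
    for child in children:
        if parent_of.get(child) not in parents:
            violations.append(f"{child} has no valid parent")

    counts = Counter(parent_of.get(c) for c in children)
    for parent in sorted(parents):
        if counts[parent] < min_children:
            violations.append(f"{parent} has {counts[parent]} children, needs at least {min_children}")
    return violations


companies = {"Acme", "Globex"}
employees = ["e1", "e2", "e3"]
works_for = {"e1": "Acme", "e2": "Acme", "e3": "Globex"}
print(check_one_to_many(companies, employees, works_for))
```

Changing `min_children` to 0 would model "zero, one, or many," showing how a small difference in cardinality changes which data is valid.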