Throughout this book, we have referred to the use and management of Metadata. One of the principles of data management is that Metadata is integral to managing data. In other words, you need data to manage data. Metadata describes what data you have. And if you don’t know what data you have, you cannot manage it. Metadata management is a foundational activity that needs to be carried out throughout the data lifecycle. The lifecycle of Metadata also needs to be managed.
The most common definition of Metadata, “data about data,” is misleadingly simple. For some it is, unfortunately, a source of confusion rather than clarification, because many kinds of information can be classified as Metadata, and there is not a clear line between “data” and “Metadata”. Instead of trying to draw that line, we will describe how Metadata is used and why it is so important.
To understand Metadata’s vital role in data management, imagine a large library, with hundreds of thousands of books and magazines, but no card catalog. Without a card catalog, readers might not even know how to start looking for a specific book or even a specific topic. The card catalog not only provides the necessary information (which books and materials the library owns and where they are shelved) but also enables patrons to find materials using different starting points (subject area, author, or title). Without the catalog, finding a specific book would be difficult if not impossible. An organization without Metadata is like a library without a card catalog.
Like other data, Metadata requires management. As the capacity of organizations to collect and store data increases, the role of Metadata in data management grows in importance. But Metadata management is not an end in itself; it is a means by which an organization can get more value from its data. To be data-driven, an organization must be Metadata-driven.
Metadata and its benefits
In data management, Metadata includes information about technical and business processes, data rules and constraints, and logical and physical data structures. It describes the data itself (e.g., databases, data elements, data models), the concepts the data represents (e.g., business processes, application systems, software code, technology infrastructure), and the connections (relationships) between the data and concepts. Metadata helps an organization understand its data, its systems, and its workflows. It enables data quality assessment and is integral to the management of databases and other applications. It contributes to the ability to process, maintain, integrate, secure, audit, and govern other data.
Data cannot be managed without Metadata. In addition, Metadata itself must be managed. Reliable, well-managed Metadata helps:
Organizations get more value out of their data assets if their data is of high quality. Quality data depends on governance. Because it explains the data and processes which enable organizations to function, Metadata is critical to data governance. If Metadata is a guide to the data in an organization, then it must be well-managed. Poorly managed Metadata leads to:
Well-executed Metadata management enables a consistent understanding of data resources and more efficient cross-organizational development.
Types of metadata
Metadata is generally categorized into three types: business, technical, and operational.
Business Metadata focuses largely on the content and condition of the data and also includes details related to data governance. Business Metadata includes the non-technical names and definitions of concepts, subject areas, entities, and attributes; attribute data types and other attribute properties; range descriptions; calculations; algorithms and business rules; valid domain values and their definitions. Examples of Business Metadata include:
Technical Metadata provides information about the technical details of data, the systems that store data, and the processes that move it within and between systems. Examples of Technical Metadata include:
Operational Metadata describes details of the processing and accessing of data. For example:
These categories help people understand the range of information that falls under the umbrella of Metadata, as well as the functions that produce Metadata. However, the categories can also lead to confusion. People may be caught up in questions about which category a set of Metadata belongs to, or who is supposed to use it. It is best to think of these categories in relation to where Metadata originates, rather than how it is used. In relation to usage, the distinctions between Metadata types are not strict. Technical and operational staff use ‘business’ Metadata and vice versa.
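As a rough illustration of how the three categories can sit side by side for a single data element, the following Python sketch records business, technical, and operational Metadata for one column. Every name and value in it (`ColumnMetadata`, `cust_email`, the job name, and so on) is hypothetical and chosen only for illustration; it is not a prescribed Metadata model.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessMetadata:
    name: str                 # non-technical name of the attribute
    definition: str           # business definition
    valid_values: list = field(default_factory=list)  # valid domain values, if any


@dataclass
class TechnicalMetadata:
    table: str                # physical table name
    column: str               # physical column name
    data_type: str            # physical data type


@dataclass
class OperationalMetadata:
    last_load_job: str        # job that last processed the data (hypothetical)
    rows_loaded: int          # volume recorded for that run (hypothetical)


@dataclass
class ColumnMetadata:
    """One data element described from all three perspectives."""
    business: BusinessMetadata
    technical: TechnicalMetadata
    operational: OperationalMetadata

    def describe(self) -> str:
        # Combine business and technical details into one readable line.
        t = self.technical
        return f"{self.business.name} [{t.table}.{t.column}, {t.data_type}]: {self.business.definition}"


# Hypothetical example entry.
email = ColumnMetadata(
    business=BusinessMetadata("Customer Email", "Primary contact email address"),
    technical=TechnicalMetadata("customer", "cust_email", "varchar(255)"),
    operational=OperationalMetadata("nightly_customer_load", 10432),
)
print(email.describe())
```

Keeping the three groups in one record reflects the point that the categories describe where Metadata originates; any consumer, business or technical, can read the whole record.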
Metadata is data
While Metadata can be understood through its uses and its categories, it is important to remember that Metadata is data. Like other data, it has a lifecycle (see Figure 26). We must manage it in relation to its lifecycle.
An organization should plan for the Metadata it needs, design processes so that high-quality Metadata can be created and maintained, and augment its Metadata as it learns from its data.
Metadata and data management
Metadata is essential to data management as well as data usage. All large organizations produce and use a lot of data. Across an organization, different individuals will have different levels of data knowledge, but no individual will know everything about the data. This information must be documented or the organization risks losing valuable knowledge about itself. Metadata provides the primary means of capturing and managing organizational knowledge about data.
But Metadata management is not only a knowledge management challenge; it is also a risk management necessity. Metadata is necessary to ensure an organization can identify private or sensitive data and that it can manage the data lifecycle for its own benefit and in order to meet compliance requirements and minimize risk exposure.
Without reliable Metadata, an organization does not know what data it has, what the data represents, where it originates, how it moves through systems, who has access to it, or what it means for the data to be of high quality. Without Metadata, an organization cannot manage its data as an asset. Indeed, without Metadata, an organization may not be able to manage its data at all.
Metadata and interoperability
As technology has evolved, the speed at which data is generated has also increased. Technical Metadata has become absolutely integral to the way in which data is moved and integrated. ISO’s Metadata Registry Standard, ISO/IEC 11179, is intended to enable Metadata-driven exchange of data in a heterogeneous environment, based on exact definitions of data. Metadata present in XML and other formats enables use of the data. Other types of Metadata tagging allow data to be exchanged while retaining signifiers of ownership, security requirements, etc.
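To make the idea of Metadata travelling with data concrete, here is a minimal Python sketch that reads a small XML document in which attributes carry ownership, classification, type, and unit information alongside the values. The element and attribute names (`dataset`, `owner`, `classification`, `unit`) are invented for this example and do not follow ISO/IEC 11179 or any other standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the attributes are Metadata, the element text is data.
DOC = """<dataset owner="Finance" classification="confidential">
  <field name="invoice_total" type="decimal" unit="USD">1250.00</field>
  <field name="invoice_date" type="date">2024-03-01</field>
</dataset>"""


def read_with_metadata(xml_text):
    """Return (dataset-level tags, list of fields with their Metadata)."""
    root = ET.fromstring(xml_text)
    tags = dict(root.attrib)  # e.g. ownership and security classification
    fields = [{"value": f.text, **f.attrib} for f in root.findall("field")]
    return tags, fields


tags, fields = read_with_metadata(DOC)
print(tags)
for f in fields:
    print(f)
```

A receiving system could use the dataset-level tags to decide how the data must be secured, and the field-level attributes to interpret each value, without any separate documentation.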
Metadata strategy
As noted, the types of information that can be used as Metadata are wide-ranging. Metadata is created in various places throughout an enterprise. The challenges come with bringing Metadata together so that people and processes can use it.
A Metadata strategy describes how an organization intends to manage its Metadata and how it will move from current state to future state practices. A Metadata strategy should provide a framework for development teams to improve Metadata management. Developing Metadata requirements will help clarify the drivers of the strategy and identify potential obstacles to enacting it.
The strategy includes defining the organization’s future state enterprise Metadata content and architecture and the implementation phases required to meet strategic objectives. Steps include:
The strategy will evolve over time, as Metadata requirements, the architecture, and the lifecycle of Metadata are better understood.
Understand metadata requirements
Metadata requirements start with content: what Metadata is needed and at what level. For example, physical and logical names need to be captured for both columns and tables. Metadata content is wide-ranging and requirements will come from both business and technical data consumers.
There are also many functionality-focused requirements associated with a comprehensive Metadata solution:
Metadata architecture
Like other forms of data, Metadata has a lifecycle. While there are different ways to architect a Metadata solution, conceptually, all Metadata management solutions include architectural layers that correspond to points in the Metadata lifecycle.
A Metadata Management system must be capable of bringing together Metadata from many different sources. Systems will differ depending on the degree of integration and the role of the integrating system in the maintenance of the Metadata.
A managed Metadata environment should isolate the end user from the various and disparate Metadata sources. The architecture should provide a single access point for required Metadata. Design of the architecture depends on the specific requirements of the organization. Three technical architectural approaches to building a common Metadata repository mirror the approaches to designing data warehouses:
Implement a managed Metadata environment incrementally to minimize risks and facilitate acceptance. The repository contents should be generic in design. It should not merely reflect the source system database designs. Enterprise subject area experts should help create a comprehensive Metadata model for content. Planning should account for integrating Metadata so that data consumers can see across different data sources. The ability to do so will be one of the most valuable capabilities of the repository. It should house current, planned, and historical versions of the Metadata. Often, the first implementation is a pilot to prove concepts and learn about managing the Metadata environment.
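The idea of a single access point that isolates users from disparate Metadata sources can be sketched in a few lines of Python. This is only one possible design, shown under simplifying assumptions: the class `MetadataPortal`, the source names, and the dictionary-backed sources are all hypothetical, and the sketch queries sources at lookup time rather than copying their contents into a repository.

```python
class MetadataPortal:
    """Single access point in front of several Metadata sources."""

    def __init__(self):
        self._sources = {}  # source name -> lookup function

    def register(self, name, lookup):
        """Add a source; `lookup` takes an element name and returns a dict or None."""
        self._sources[name] = lookup

    def find(self, element):
        """Ask every registered source about `element` and merge the answers."""
        results = {}
        for name, lookup in self._sources.items():
            hit = lookup(element)
            if hit is not None:
                results[name] = hit
        return results


# Two hypothetical sources, each backed by a plain dictionary.
data_model = {"customer_id": {"definition": "Unique customer identifier"}}
etl_tool = {"customer_id": {"loaded_by": "nightly_customer_load"}}

portal = MetadataPortal()
portal.register("data_model", data_model.get)
portal.register("etl_tool", etl_tool.get)

print(portal.find("customer_id"))
```

The consumer calls `find` once and sees what every source knows about the element; where each answer is stored remains hidden behind the portal.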
Metadata quality
When managing the quality of Metadata, it is important to recognize that a lot of Metadata originates through existing processes. For example, the data modeling process produces table and column definitions and other Metadata essential to creating data models. To get high-quality Metadata, Metadata should be seen as a product of these processes, rather than as a byproduct of them.
Again, Metadata follows the data lifecycle (see Figure 26). Reliable Metadata starts with a plan and increases in value as it is used, maintained, and enhanced. Metadata sources, such as the data model, source-to-target mapping documentation, and ETL logs, should be treated as data sources. As such, they should have processes and controls in place to ensure they produce a reliable, usable data product.
All processes, systems, and data have a need for some level of meta-information; that is, some description of their component pieces and how they work. It is best to plan how to create or collect this information. In addition, as the process, system, or data is used, this meta-information grows and changes. It needs to be maintained and enhanced. Use of Metadata often results in recognition of requirements for additional Metadata. For example, salespeople using customer data from two different systems may need to know where the data originated in order to better understand their customers.
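The question of where data originated can be answered mechanically once source-to-target mappings are recorded as Metadata. The Python sketch below walks such mappings upstream until it reaches elements with no recorded source. The mapping table and all element names are hypothetical, and the function is an illustration rather than a full lineage tool.

```python
# Hypothetical source-to-target mappings: target element -> its source elements.
MAPPINGS = {
    "report.sales.customer_email": ["warehouse.customer.email"],
    "warehouse.customer.email": ["crm.contact.email_addr", "billing.account.email"],
}


def origins(element, mappings):
    """Return the set of upstream elements that have no recorded source.

    An element that does not appear as a target is treated as an origin,
    so an unmapped element is reported as its own origin.
    """
    found, seen, stack = set(), set(), [element]
    while stack:
        current = stack.pop()
        if current in seen:      # guard against circular mappings
            continue
        seen.add(current)
        upstream = mappings.get(current, [])
        if upstream:
            stack.extend(upstream)
        else:
            found.add(current)
    return found


print(sorted(origins("report.sales.customer_email", MAPPINGS)))
```

In this example the email shown in the sales report traces back to two systems, which is exactly the kind of information the salespeople in the scenario above would need.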
Several general principles of Metadata management describe the means to manage Metadata for quality:
Like other data, Metadata can be profiled and inspected for quality. Its maintenance should be scheduled or completed as an auditable part of project work.
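Profiling Metadata can start with something as simple as measuring how completely each required attribute is populated. The following Python sketch computes that ratio for a small set of catalog entries. The choice of required attributes (`definition`, `owner`, `data_type`) and the sample entries are assumptions made for the example, not a recommended standard.

```python
def profile(entries, required=("definition", "owner", "data_type")):
    """Return, per required attribute, the share of entries where it is populated.

    Missing keys, None, and blank or whitespace-only strings count as unpopulated.
    """
    total = len(entries)
    report = {}
    for attr in required:
        filled = sum(1 for e in entries if str(e.get(attr) or "").strip())
        report[attr] = filled / total if total else 0.0
    return report


# Hypothetical catalog entries with some gaps.
ENTRIES = [
    {"name": "cust_id", "definition": "Unique customer identifier", "owner": "Sales", "data_type": "int"},
    {"name": "cust_email", "definition": "  ", "owner": None, "data_type": "varchar"},
    {"name": "order_total", "definition": "Order value including tax", "owner": "Finance", "data_type": "decimal"},
    {"name": "order_dt", "definition": "Date the order was placed", "data_type": "date"},
]

print(profile(ENTRIES))
```

Running such a check on a schedule, or as a step in project work, gives the Metadata team a simple, auditable measure to track over time.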
Metadata governance
Moving from an unmanaged to a managed Metadata environment takes work and discipline. It is not easy to do, even if most people recognize the value of reliable Metadata. Organizational readiness is a major concern, as are methods for governance and control. A comprehensive Metadata approach requires that business and technology staff be able to work closely together in a cross-functional manner.
Metadata Management is a low priority in many organizations. Yet an essential set of Metadata requires coordination and commitment across the organization. From a data management perspective, essential business Metadata includes data definitions, models, and architecture. Essential technical Metadata includes file and data set technical descriptions, job names, processing schedules, etc.
Organizations should determine their specific requirements for the management of the lifecycle of critical Metadata and establish governance processes to enable those requirements. It is recommended that formal roles and responsibilities be assigned to dedicated resources, especially in large or business-critical areas. Because Metadata governance itself requires Metadata and controls, the team charged with managing Metadata can test principles on the Metadata it creates and uses.
What you need to know