Chapter 10
Metadata Management

Throughout this book, we have referred to the use and management of Metadata. One of the principles of data management is that Metadata is integral to managing data. In other words, you need data to manage data. Metadata describes what data you have. And if you don’t know what data you have, you cannot manage it. Metadata management is a foundational activity that needs to be carried out throughout the data lifecycle. The lifecycle of Metadata also needs to be managed.

The most common definition of Metadata, “data about data,” is misleadingly simple. For some it is, unfortunately, a source of confusion rather than clarification, because many kinds of information can be classified as Metadata, and there is not a clear line between “data” and “Metadata”. Instead of trying to draw that line, we will describe how Metadata is used and why it is so important.

To understand Metadata’s vital role in data management, imagine a large library, with hundreds of thousands of books and magazines, but no card catalog. Without a card catalog, readers might not even know how to start looking for a specific book or even a specific topic. The card catalog not only provides the necessary information (which books and materials the library owns and where they are shelved) but also enables patrons to find materials using different starting points (subject area, author, or title). Without the catalog, finding a specific book would be difficult if not impossible. An organization without Metadata is like a library without a card catalog.

Like other data, Metadata requires management. As the capacity of organizations to collect and store data increases, the role of Metadata in data management grows in importance. But Metadata management is not an end in itself; it is a means by which an organization can get more value from its data. To be data-driven, an organization must be Metadata-driven.

Metadata and its benefits

In data management, Metadata includes information about technical and business processes, data rules and constraints, and logical and physical data structures. It describes the data itself (e.g., databases, data elements, data models), the concepts the data represents (e.g., business processes, application systems, software code, technology infrastructure), and the connections (relationships) between the data and concepts. Metadata helps an organization understand its data, its systems, and its workflows. It enables data quality assessment and is integral to the management of databases and other applications. It contributes to the ability to process, maintain, integrate, secure, audit, and govern other data.

Data cannot be managed without Metadata. In addition, Metadata itself must be managed. Reliable, well-managed Metadata helps:

  • Increase confidence in data by providing context, enabling consistent representation of the same concepts, and enabling the measurement of data quality
  • Increase the value of strategic information (e.g., Master Data) by enabling multiple uses
  • Improve operational efficiency by identifying redundant data and processes
  • Prevent the use of out-of-date or incorrect data
  • Protect sensitive information
  • Reduce data-oriented research time
  • Improve communication between data consumers and IT professionals
  • Create accurate impact analyses, thus reducing the risk of project failure
  • Improve time-to-market by reducing system development lifecycle time
  • Reduce training costs and lower the impact of staff turnover through thorough documentation of data context, history, and origin
  • Support regulatory compliance

Organizations get more value out of their data assets if their data is of high quality. Quality data depends on governance. Because it explains the data and processes which enable organizations to function, Metadata is critical to data governance. If Metadata is a guide to the data in an organization, then it must be well-managed. Poorly managed Metadata leads to:

  • Redundant data and data management processes
  • Replicated and redundant dictionaries, repositories, and other Metadata storage
  • Inconsistent definitions of data elements and risks associated with data misuse
  • Competing and conflicting sources and versions of Metadata which reduce the confidence of data consumers
  • Doubt about the reliability of Metadata and data

Well-executed Metadata management enables a consistent understanding of data resources and more efficient cross-organizational development.

Types of metadata

Metadata is generally categorized into three types: business, technical, and operational.

Business Metadata focuses largely on the content and condition of the data and also includes details related to data governance. Business Metadata includes the non-technical names and definitions of concepts, subject areas, entities, and attributes; attribute data types and other attribute properties; range descriptions; calculations; algorithms and business rules; valid domain values and their definitions. Examples of Business Metadata include:

  • Data models, definitions and descriptions of data sets, tables, and columns
  • Business rules, data quality rules, and transformation rules, calculations, and derivations
  • Data provenance and data lineage
  • Data standards and constraints
  • Security/privacy level of data
  • Known issues with data
  • Data usage notes

Technical Metadata provides information about the technical details of data, the systems that store data, and the processes that move it within and between systems. Examples of Technical Metadata include:

  • Physical database table and column names and properties
  • Data access rights, groups, roles
  • Data CRUD (create, read, update, and delete) rules
  • ETL job details
  • Data lineage documentation, including upstream and downstream change impact information
  • Content update cycle job schedules and dependencies
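
Lineage documentation of the kind listed above can be treated as a directed graph, which makes downstream change impact a simple traversal. The following is a minimal sketch under stated assumptions: the data set names and the LINEAGE mapping are hypothetical and are not drawn from any particular tool.

```python
from collections import deque

# Hypothetical lineage: each key feeds the data sets listed as its value.
LINEAGE = {
    "crm.customer": ["staging.customer"],
    "staging.customer": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.churn", "report.revenue"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["report.revenue"],
}


def downstream_impact(lineage, start):
    """Return every data set reachable downstream of `start` (breadth-first)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Which data sets might be affected by a change to the CRM customer table?
impacted = downstream_impact(LINEAGE, "crm.customer")
print(impacted)
```

Reversing the direction of the mapping would answer the upstream question (where did this report's data originate?) with the same traversal.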

Operational Metadata describes details of the processing and accessing of data. For example:

  • Logs of job execution for batch programs
  • Results of audit, balance, control measurements and error logs
  • Reports and query access patterns, frequency, and execution time
  • Patches and version maintenance plan and execution, current patching level
  • Backup, retention, date created, disaster recovery provisions

These categories help people understand the range of information that falls under the umbrella of Metadata, as well as the functions that produce Metadata. However, the categories can also lead to confusion. People may be caught up in questions about which category a set of Metadata belongs to, or who is supposed to use it. It is best to think of these categories in relation to where Metadata originates, rather than how it is used. In relation to usage, the distinctions between Metadata types are not strict. Technical and operational staff use ‘business’ Metadata and vice versa.
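
To make the three categories concrete, the sketch below describes a single data element from all three perspectives. It is an illustration only: the class names, field names, and sample values are assumptions, and a real Metadata model would carry many more attributes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class BusinessMetadata:
    name: str            # non-technical name
    definition: str      # business definition
    privacy_level: str   # e.g. "public" or "confidential"


@dataclass
class TechnicalMetadata:
    table: str           # physical table name
    column: str          # physical column name
    data_type: str


@dataclass
class OperationalMetadata:
    last_loaded: datetime   # when the last load job finished
    last_job_status: str    # outcome of that job


@dataclass
class DataElementMetadata:
    business: BusinessMetadata
    technical: TechnicalMetadata
    operational: OperationalMetadata


# Hypothetical example: one customer attribute seen from all three angles.
customer_email = DataElementMetadata(
    business=BusinessMetadata(
        name="Customer Email",
        definition="Primary email address used to contact the customer",
        privacy_level="confidential",
    ),
    technical=TechnicalMetadata(
        table="crm.customer", column="email_addr", data_type="VARCHAR(254)"
    ),
    operational=OperationalMetadata(
        last_loaded=datetime(2024, 1, 15, 2, 0), last_job_status="succeeded"
    ),
)

print(customer_email.business.name, "->", customer_email.technical.column)
```

Note that a single consumer, such as an analyst, is likely to read all three parts of this record, which is why the categories say more about where Metadata originates than about who uses it.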

Metadata is data

While Metadata can be understood through its uses and its categories, it is important to remember that Metadata is data. Like other data, it has a lifecycle (see Figure 26). We must manage it in relation to its lifecycle.

An organization should plan for the Metadata it needs, design processes so that high-quality Metadata can be created and maintained, and augment its Metadata as it learns from its data.

Metadata and data management

Metadata is essential to data management as well as data usage. All large organizations produce and use a lot of data. Across an organization, different individuals will have different levels of data knowledge, but no individual will know everything about the data. This information must be documented or the organization risks losing valuable knowledge about itself. Metadata provides the primary means of capturing and managing organizational knowledge about data.

But Metadata management is not only a knowledge management challenge; it is also a risk management necessity. Metadata is necessary to ensure an organization can identify private or sensitive data and that it can manage the data lifecycle for its own benefit and in order to meet compliance requirements and minimize risk exposure.

Without reliable Metadata, an organization does not know what data it has, what the data represents, where it originates, how it moves through systems, who has access to it, or what it means for the data to be of high quality. Without Metadata, an organization cannot manage its data as an asset. Indeed, without Metadata, an organization may not be able to manage its data at all.

Metadata and interoperability

As technology has evolved, the speed at which data is generated has also increased. Technical Metadata has become absolutely integral to the way in which data is moved and integrated. ISO’s Metadata Registry Standard, ISO/IEC 11179, is intended to enable Metadata-driven exchange of data in a heterogeneous environment, based on exact definitions of data. Metadata present in XML and other formats enables use of the data. Other types of Metadata tagging allow data to be exchanged while retaining signifiers of ownership, security requirements, etc.

Metadata strategy

As noted, the types of information that can be used as Metadata are wide-ranging. Metadata is created in various places throughout an enterprise. The challenges come with bringing Metadata together so that people and processes can use it.

A Metadata strategy describes how an organization intends to manage its Metadata and how it will move from current state to future state practices. A Metadata strategy should provide a framework for development teams to improve Metadata management. Developing Metadata requirements will help clarify the drivers of the strategy and identify potential obstacles to enacting it.

The strategy includes defining the organization’s future state enterprise Metadata content and architecture and the implementation phases required to meet strategic objectives. Steps include:

  • Initiate Metadata strategy planning: The goal of initiation and planning is to enable the Metadata strategy team to define its short- and long-term goals. Planning includes drafting a charter, scope, and objectives aligned with overall governance efforts and establishing a communications plan to support the effort. Key stakeholders should be involved in planning.
  • Conduct key stakeholder interviews: Interviews with business and technical stakeholders provide a foundation of knowledge for the Metadata strategy.
  • Assess existing Metadata sources and information architecture: Assessment determines the relative degree of difficulty in solving the Metadata and systems issues identified in the interviews and documentation review. During this stage, conduct detailed interviews of key IT staff and review documentation of the system architectures, data models, etc.
  • Develop future Metadata architecture: Refine and confirm the future vision, and develop the long-term target architecture for the managed Metadata environment in this stage. This phase must account for strategic components, such as organization structure, alignment with data governance and stewardship, managed Metadata architecture, Metadata delivery architecture, technical architecture, and security architecture.
  • Develop a phased implementation plan: Validate, integrate, and prioritize findings from the interviews and data analyses. Document the Metadata strategy and define a phased implementation approach to move from the existing to the future managed Metadata environment.

The strategy will evolve over time, as Metadata requirements, the architecture, and the lifecycle of Metadata are better understood.

Understand metadata requirements

Metadata requirements start with content: what Metadata is needed and at what level. For example, physical and logical names need to be captured for both columns and tables. Metadata content is wide-ranging and requirements will come from both business and technical data consumers.

There are also many functionality-focused requirements associated with a comprehensive Metadata solution:

  • How frequently Metadata attributes and sets will be updated
  • Timing of updates in relation to source changes
  • Whether historical versions of Metadata need to be retained
  • Who can access Metadata
  • How users access Metadata (specific user interface functionality for access)
  • How Metadata will be modeled for storage
  • The degree of integration of Metadata from different sources; rules for integration
  • Processes and rules for updating Metadata (logging and referring for approval)
  • Roles and responsibilities for managing Metadata
  • Metadata quality requirements
  • Security for Metadata (some Metadata cannot be exposed because it would reveal the existence of highly protected data)
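
Several of these requirements, such as retaining historical versions, logging who made each update, and restricting access to sensitive Metadata, can be illustrated together. The following is a minimal sketch, not a recommended design: the VersionedMetadataStore class, its method names, and the attribute names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class MetadataVersion:
    value: str
    updated_by: str
    updated_at: datetime


class VersionedMetadataStore:
    """Keeps every version of each Metadata attribute and guards restricted ones."""

    def __init__(self):
        self._history = {}        # attribute name -> list of MetadataVersion
        self._restricted = set()  # attributes only cleared users may see

    def update(self, attribute, value, updated_by, restricted=False):
        # Append rather than overwrite, so earlier versions are retained.
        version = MetadataVersion(value, updated_by, datetime.now(timezone.utc))
        self._history.setdefault(attribute, []).append(version)
        if restricted:
            self._restricted.add(attribute)

    def _check_access(self, attribute, user_is_cleared):
        if attribute in self._restricted and not user_is_cleared:
            raise PermissionError(f"{attribute} is restricted")

    def current(self, attribute, user_is_cleared=False):
        self._check_access(attribute, user_is_cleared)
        return self._history[attribute][-1].value

    def history(self, attribute, user_is_cleared=False):
        self._check_access(attribute, user_is_cleared)
        return [version.value for version in self._history[attribute]]


store = VersionedMetadataStore()
store.update("customer.email.definition", "Email of customer", "steward_a")
store.update(
    "customer.email.definition",
    "Primary contact email of the customer",
    "steward_b",
)
store.update("hr.salary.location", "Schema hr_secure", "dba_1", restricted=True)

print(store.current("customer.email.definition"))
print(store.history("customer.email.definition"))
```

An approval workflow, as mentioned in the list, would sit in front of update; it is omitted here to keep the sketch short.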

Metadata architecture

Like other forms of data, Metadata has a lifecycle. While there are different ways to architect a Metadata solution, conceptually, all Metadata management solutions include architectural layers that correspond to points in the Metadata lifecycle:

  • Metadata creation and sourcing
  • Metadata storage in one or more repositories
  • Metadata integration
  • Metadata delivery
  • Metadata access and usage
  • Metadata control and management

A Metadata Management system must be capable of bringing together Metadata from many different sources. Systems will differ depending on the degree of integration and the role of the integrating system in the maintenance of the Metadata.

A managed Metadata environment should isolate the end user from the various and disparate Metadata sources. The architecture should provide a single access point for required Metadata. Design of the architecture depends on the specific requirements of the organization. Three technical architectural approaches to building a common Metadata repository mirror the approaches to designing data warehouses:

  • Centralized: A centralized architecture consists of a single Metadata repository that contains copies of Metadata from the various sources. Organizations with limited IT resources, or those seeking to automate as much as possible, may choose to avoid this architecture option. Organizations seeking a high degree of consistency within the common Metadata repository can benefit from a centralized architecture.
  • Distributed: A completely distributed architecture maintains a single access point. The Metadata retrieval engine responds to user requests by retrieving data from source systems in real time; there is no persistent repository. In this architecture, the Metadata management environment maintains the necessary source system catalogs and lookup information needed to process user queries and searches effectively. A common object request broker or similar middleware protocol accesses these source systems.
  • Hybrid: A hybrid architecture combines characteristics of centralized and distributed architectures. Metadata still moves directly from the source systems into a centralized repository. However, the repository design only accounts for the user-added Metadata, the critical standardized items, and the additions from manual sources.
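
The practical difference between the centralized and distributed approaches can be sketched in a few lines. This is a simplified illustration under stated assumptions: the class names and the sample catalog entry are hypothetical, and a real implementation would use connectors or middleware rather than in-memory dictionaries.

```python
class SourceCatalog:
    """Stands in for one source system's own Metadata catalog (hypothetical)."""

    def __init__(self, name, entries):
        self.name = name
        self.entries = dict(entries)

    def lookup(self, key):
        return self.entries.get(key)


class CentralizedRepository:
    """Holds copies of Metadata extracted from the source systems."""

    def __init__(self, sources):
        self._sources = sources
        self._store = {}

    def refresh(self):
        # Copy Metadata from each source into the single repository.
        for source in self._sources:
            self._store.update(source.entries)

    def lookup(self, key):
        return self._store.get(key)


class DistributedRetriever:
    """Single access point that queries the sources at request time."""

    def __init__(self, sources):
        self._sources = sources

    def lookup(self, key):
        # No persistent repository: ask each source in turn.
        for source in self._sources:
            value = source.lookup(key)
            if value is not None:
                return value
        return None


crm = SourceCatalog("crm", {"crm.customer.email": "Customer email address"})
central = CentralizedRepository([crm])
central.refresh()
distributed = DistributedRetriever([crm])

# The source system changes after the central copy was taken.
crm.entries["crm.customer.email"] = "Primary contact email"

print(central.lookup("crm.customer.email"))      # copy taken at last refresh
print(distributed.lookup("crm.customer.email"))  # value read from the source
```

In this sketch the centralized repository keeps answering from its copy until the next refresh, while the distributed retriever always reflects the source. That is the trade-off between consistency within the repository and currency of the Metadata.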

Implement a managed Metadata environment incrementally to minimize risks and facilitate acceptance. The repository contents should be generic in design. It should not merely reflect the source system database designs. Enterprise subject area experts should help create a comprehensive Metadata model for content. Planning should account for integrating Metadata so that data consumers can see across different data sources. The ability to do so will be one of the most valuable capabilities of the repository. It should house current, planned, and historical versions of the Metadata. Often, the first implementation is a pilot to prove concepts and learn about managing the Metadata environment.

Metadata quality

When managing the quality of Metadata, it is important to recognize that a lot of Metadata originates through existing processes. For example, the data modeling process produces table and column definitions and other Metadata essential to creating data models. To get high-quality Metadata, organizations should treat Metadata as a product of these processes, rather than as a byproduct of them.

Again, Metadata follows the data lifecycle (see Figure 26). Reliable Metadata starts with a plan and increases in value as it is used, maintained, and enhanced. Metadata sources, like the data model, source-to-target mapping documentation, ETL logs, and the like should be treated as data sources. Their owners should put in place processes and controls to ensure these sources produce a reliable, usable data product.

All processes, systems, and data have a need for some level of meta-information; that is, some description of their component pieces and how they work. It is best to plan how to create or collect this information. In addition, as the process, system or data is used, this meta-information grows and changes. It needs to be maintained and enhanced. Use of Metadata often results in recognition of requirements for additional Metadata. For example, sales people using customer data from two different systems may need to know where the data originated in order to better understand their customers.

Several general principles of Metadata management describe the means to manage Metadata for quality:

  • Accountability: Recognize that Metadata is often produced through existing processes (data modeling, SDLC, business process definition) and hold process owners accountable for the quality of Metadata (both in its initial creation and its maintenance).
  • Standards: Set, enforce, and audit standards for Metadata to simplify integration and enable use.
  • Improvement: Create a feedback mechanism so that consumers can inform the Metadata Management team of Metadata that is incorrect or out-of-date.

Like other data, Metadata can be profiled and inspected for quality. Its maintenance should be scheduled or completed as an auditable part of project work.
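
As a rough illustration of profiling Metadata, the sketch below checks a handful of glossary entries for missing definitions, missing owners, and overdue reviews. The entries, the field names, and the one-year review threshold are assumptions chosen for the example, not rules from any standard.

```python
from datetime import date

# Hypothetical glossary entries to be profiled.
ENTRIES = [
    {"name": "customer_id", "definition": "Unique customer identifier",
     "owner": "CRM team", "reviewed": date(2024, 3, 1)},
    {"name": "cust_seg", "definition": "",
     "owner": "Marketing", "reviewed": date(2021, 6, 15)},
    {"name": "order_total", "definition": "Order value including tax",
     "owner": None, "reviewed": date(2023, 11, 20)},
]


def profile_metadata(entries, as_of, max_age_days=365):
    """Return a mapping of entry name -> list of quality problems found."""
    issues = {}
    for entry in entries:
        problems = []
        if not entry["definition"].strip():
            problems.append("missing definition")
        if not entry["owner"]:
            problems.append("missing owner")
        if (as_of - entry["reviewed"]).days > max_age_days:
            problems.append("review overdue")
        if problems:
            issues[entry["name"]] = problems
    return issues


report = profile_metadata(ENTRIES, as_of=date(2024, 6, 1))
for name, problems in report.items():
    print(name, problems)
```

A report like this could feed the feedback mechanism described under Improvement, giving process owners a concrete list of Metadata to correct.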

Metadata governance

Moving from an unmanaged to a managed Metadata environment takes work and discipline. It is not easy to do, even if most people recognize the value of reliable Metadata. Organizational readiness is a major concern, as are methods for governance and control. A comprehensive Metadata approach requires that business and technology staff be able to work closely together in a cross-functional manner.

Metadata Management is a low priority in many organizations. Yet managing an essential set of Metadata requires coordination and commitment across the organization. From a data management perspective, essential business Metadata includes data definitions, models, and architecture. Essential technical Metadata includes file and data set technical descriptions, job names, processing schedules, etc.

Organizations should determine their specific requirements for the management of the lifecycle of critical Metadata and establish governance processes to enable those requirements. It is recommended that formal roles and responsibilities be assigned to dedicated resources, especially in large or business critical areas. Metadata governance requires Metadata and controls, so the team charged with managing Metadata can test principles on the Metadata they create and use.

What you need to know

  • Metadata management is foundational to data management. You cannot manage data without Metadata.
  • Metadata is not an end in itself. It is a means by which an organization captures explicit knowledge about its data in order to minimize risk and enable value.
  • Most organizations do not manage their Metadata well and they pay the price in hidden costs; they increase the long-term cost of managing data by creating unnecessary rework (and with it, the risk of inconsistency) with each new project, as well as the operational costs of trying to locate and use data.
  • Metadata is data. It has a lifecycle and should be managed based on that lifecycle. Different types of Metadata will have different specific lifecycle requirements.
  • As the volume and velocity of data increase, the benefits of having reliable Metadata also increase.