Data is both an operational necessity and an asset. Effective data management can enable an organization to get more value from its data. Managing any asset requires working to get value from it, managing its lifecycle, and managing it across an enterprise. But the unique characteristics of data put a different spin on these functions. This chapter will cover the following concepts related to these challenges:
Data differs from other assets
Data has unique characteristics that make it different from other assets.3 Physical assets can be pointed to, touched, and moved around. Financial assets are accounted for on a balance sheet. But data is different. Data is not tangible. Yet it is durable; it does not wear out. Data is easy to copy and transport. But it is not easy to reproduce if it is lost or destroyed. Because it is not consumed when used, it can even be stolen without being gone. Data is dynamic and can be used for multiple purposes. The same data can even be used by multiple people at the same time – something that is impossible with physical or financial assets. Many uses of data beget more data.
These differences make it challenging simply to keep track of data, much less put a monetary value on data. Without this monetary value, it can be difficult to measure how data contributes to organizational success. These differences also raise other issues that affect data management, such as:
Data represents risk
Data not only represents value and opportunity; it also presents risks. Inaccurate, incomplete, or out-of-date data obviously represents risk because the information it conveys is wrong. But data presents other risks, including:
The fact that data can be easily copied and replicated means it can be breached without being ‘gone’ from its rightful owners. Moreover, because data represents people, products, and money, legislators and regulators have recognized the potential uses and abuses of information and have put in place laws intended to mitigate obvious risks. For example:
Consumers are also more aware of how their data might be used. For example, when making purchases on a website, they expect not only smoother and more efficient operation of processes, but also protection of their information and respect for their privacy. Organizations that do not protect their customers’ data may not have those customers for long.
Poor quality data costs money
Ensuring that data is of high quality is central to data management. If data does not meet the needs of its consumers – if it is not ‘fit for purpose’ – then the effort to collect, store, secure, and enable access to it is wasted. To ensure data meets business needs, data management teams must work with data consumers to define the characteristics that make data of high quality.
Most uses of data involve learning from it in order to apply that learning and create value. Examples include understanding customer habits in order to improve a product or service, and assessing organizational performance or market trends in order to develop a better business strategy. Poor quality data will have a negative impact on these decisions.
Just as importantly, poor quality data is simply costly to any organization. Estimates differ, but experts think organizations spend between 10% and 30% of revenue handling data quality issues. IBM estimated that the cost of poor quality data in the US in 2016 was $3.1 trillion.4
Many of the costs of poor quality data are hidden and indirect and therefore hard to measure. Others, like fines, are direct and easy to calculate. Costs come from:
The corresponding benefits of high quality data include:
As these costs and benefits imply, managing data quality is not a one-time effort. Producing high-quality data requires planning, commitment, and a mindset that builds quality into processes and systems. All data management functions can influence the quality of data, for good or bad, so they all must account for data quality as they execute their work.
Data valuation is not standardized
Since each organization’s data is unique to itself, it can be difficult to put a monetary value on data. How much does it cost to collect and manage the history of a customer’s purchases? How much would it cost to reconstruct that history if the data were lost?
Still, putting a monetary value on data is useful because it informs decisions about data and becomes the basis for understanding the value of data management activities.5 One approach to data valuation is to define general cost and benefit categories that can be applied consistently within an organization. Sample categories include:
Data asset valuation must also recognize that the value of data is contextual (i.e., what is of value to one organization may not be of value to another) and often temporal (i.e., what was valuable yesterday may not be valuable today). Despite this, within an organization, certain types of data, such as customer data, are likely to be consistently valuable over time, so most organizations focus first on ensuring the quality of this highly critical data.
Data management means managing data’s lifecycle
One reason people conflate data management with technology management is that they often see data only in one place: the application from which they access it. They do not recognize that data can be separate from the applications where it is created or stored and that data has a lifecycle. The data lifecycle is based on the product lifecycle. It focuses on ensuring that data is created, moved, and maintained in ways that make it usable by the people and processes that require it. Even though data and technology are intertwined, the data lifecycle should not be confused with the systems development lifecycle (SDLC), which focuses on completing projects on time and within budget.
Conceptually, the data lifecycle is easy to describe (see Figure 4). It includes processes that create or obtain data, those that move, transform, and store it and enable it to be maintained and shared, and those that use or apply it, as well as those that dispose of it.6 Data is rarely static. Throughout its lifecycle, data may be cleansed, transformed, merged, enhanced, or aggregated. Data often moves horizontally within an organization. As data is used or enhanced, new data is created, so the lifecycle has internal iterations and the ‘same’ data may have different lifecycle requirements in different parts of an organization.
Complexity is added to the concept of the data lifecycle by the fact that different kinds of data have different lifecycle requirements. For example, transactional data can be controlled largely through enforcement of basic rules, while Master Data requires curation. Still, some principles apply to the lifecycle of any data:
Different kinds of data have different lifecycle requirements
Managing data is made more complicated by the fact that different types of data have different lifecycle management requirements. Data can be classified by the function it serves (e.g., transactional data, Reference Data, Master Data, Metadata; alternatively, category data, resource data, event data, detailed transaction data) or by content (e.g., data domains, subject areas) or by format or by the level of protection the data requires. Data can also be classified by how and where it is stored or accessed.
Because different types of data have different requirements, are associated with different risks, and play different roles within an organization, many of the tools of data management are focused on aspects of classification and control.10 For example, Master Data has different uses and consequently different management requirements than does transactional data.
Metadata must be managed as part of the data lifecycle
Data management professionals are passionate about Metadata because they realize how important it is. Yet it is a truism among them that one should never use the word Metadata when speaking with executives. “Their eyes will glaze over!” We’ll take that chance here because certain forms of Metadata are not simply critical to data management—they are essential to it. You cannot manage data without Metadata.
Metadata includes a range of information that allows people to understand data and the systems that contain data. Metadata describes what data an organization has, what it represents, how it is classified, where it came from, how it moves within the organization, how it evolves through use, who can and cannot use it, and whether it is of high quality.
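To make the description above concrete, the following is a minimal sketch of how one Metadata entry might be represented. It is an illustration only, not a prescribed model: the class name MetadataRecord, its field names, and the sample values (customer_purchases, order_entry_application, and so on) are all hypothetical and chosen simply to mirror the questions that Metadata answers.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class MetadataRecord:
    """Hypothetical Metadata entry describing one data set."""
    name: str                 # what data the organization has
    meaning: str              # what it represents
    classification: str       # how it is classified
    source: str               # where it came from
    movements: List[str] = field(default_factory=list)      # how it moves within the organization
    permitted_roles: Set[str] = field(default_factory=set)  # who can use it
    meets_quality_rules: bool = False                        # whether it is of high quality

    def record_movement(self, target_system: str) -> None:
        """Note that the data set was copied or moved to another system."""
        self.movements.append(target_system)

    def may_be_used_by(self, role: str) -> bool:
        """Answer 'who can and cannot use it' for a given role."""
        return role in self.permitted_roles


# Illustrative values only; none of these names come from a real system.
customer_purchases = MetadataRecord(
    name="customer_purchases",
    meaning="History of purchases made by each customer",
    classification="Confidential",
    source="order_entry_application",
    permitted_roles={"sales_analyst"},
)
customer_purchases.record_movement("data_warehouse")
```

Even in this small form, the record is itself data that has to be created, kept current, and protected.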
The challenge is not only that you need Metadata to manage data, but that Metadata is a form of data and needs to be managed as such. Organizations that do not manage their data well generally do not manage their Metadata at all. The answer to this challenge is that Metadata management often provides a starting point for improvements in data management overall.
Data management is often confused with information technology management
Because almost all of today’s data is stored electronically, data management is closely linked with technology management. They need to be seen in relation to one another, because decisions about technology impact many facets of how data is managed. But data management, which focuses on ensuring that the data itself is usable and trustworthy, differs from technology management, which focuses on building and maintaining infrastructure, systems, and applications.
The two are fundamentally connected by the fact that these systems and applications often automate business processes that collect or create data and different technological choices will put different constraints on the data itself. Both data management and technology management requirements should be rooted in business processes that create or use data and the needs of the people and processes that consume data.
In many organizations there is ongoing tension between the drive to build new technology and the desire to have more reliable data – as if the two were opposed to each other instead of necessary to each other. Successful data management requires sound decisions about technology, but managing technology is not the same as managing data. Organizations need to understand the impact of technology on data, in order to prevent technological temptation from driving their decisions about data. Instead, data requirements aligned with business strategy should drive decisions about technology.
Data management requires a range of skills
Managing data involves a set of interconnected processes aligned with the data lifecycle. Though many organizations see data management as an information technology function, it actually requires a wide range of people with a diverse set of skills working in different parts of an organization. Data management is a complex process because it is executed throughout an organization. Data is managed in different places within an organization by teams that have responsibility for different phases of the data lifecycle. Data management requires:
The challenge is getting people with this range of skills and perspectives to recognize how the pieces fit together and how their work intersects with the work of other parts of the organization so that they successfully collaborate and achieve common goals.
Data management requires an enterprise perspective
The footprint of data management is as large as the organization that creates and uses data. Data is one of the ‘horizontals’ of an organization. It moves across verticals, such as sales, marketing, and operations. Or at least it should. Ideally, data should be managed from an enterprise perspective. However, getting to an enterprise perspective is challenging.
Most organizations break work down by business units or functions, each of which may develop its own applications to perform its work. Because data is often viewed simply as a by-product of operational processes (for example, sales transaction records are the by-product of the selling process, not an end in themselves), it is not always planned for beyond the immediate need. It may not even be recognized as something that other people and processes use.
Unless enterprise data standards are established and enforced, there will be differences in how data is defined and created in different areas. For example, take something as seemingly simple as a Social Security Number (SSN), a US identifier for individuals. If one application captures SSN as a numeric value and another captures it in a text field, SSN data will be formatted differently. This can result in problems like dropping leading zeros on SSNs. Formatting differences, differences in the granularity of data, and differences about which attributes are mandatory to capture – all of these differences present obstacles to integrating data from diverse applications. Obstacles to integration limit the value an organization can get from its data.
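The SSN example can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, the sample value is made up and is not a real SSN, and the normalization rule (strip non-digits, pad to nine digits) is one plausible enterprise standard rather than a required one.

```python
# Made-up sample value with a leading zero; not a real SSN.
raw_ssn = "012-34-5678"


def capture_as_number(ssn: str) -> int:
    """Mimic an application that stores SSN in a numeric field."""
    return int(ssn.replace("-", ""))


def capture_as_text(ssn: str) -> str:
    """Mimic an application that stores SSN in a text field."""
    return ssn.replace("-", "")


def normalize_ssn(value) -> str:
    """Hypothetical shared standard: nine digits, stored as text."""
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    if len(digits) > 9:
        raise ValueError("an SSN has at most nine digits")
    return digits.zfill(9)  # restore any dropped leading zeros


numeric_value = capture_as_number(raw_ssn)  # the leading zero is lost here
text_value = capture_as_text(raw_ssn)       # the leading zero is preserved

# Comparing the two systems' values directly fails to match the same person.
naive_match = str(numeric_value) == text_value

# Applying one agreed format to both values lets them be integrated.
normalized_match = normalize_ssn(numeric_value) == normalize_ssn(text_value)
```

The normalization step is only needed because the two applications made different capture decisions. Agreeing on the format up front removes the need to repair the data later.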
Organizations that view data as a product that they create or purchase will make better decisions about how to manage it throughout its lifecycle. These decisions require recognizing:
Planning for better data requires a strategic approach to architecture, modeling, and other design functions. It also depends on strategic collaboration between business and IT leadership. And, of course, it demands the ability to execute effectively on individual projects. The challenge is that there are usually organizational pressures, as well as the perennial pressures of time and money, that get in the way of better planning. Organizations must balance long- and short-term goals as they execute their strategy. Having clarity about the trade-offs leads to better decisions.
What you need to know