Chapter 2
Data Management Challenges

Data is both an operational necessity and an asset. Effective data management can enable an organization to get more value from its data. Managing any asset requires working to get value from it, managing its lifecycle, and managing it across an enterprise. But the unique characteristics of data put a different spin on these functions. This chapter will cover the following concepts related to these challenges:

  • Managing data as an asset
    • Data differs from other assets
    • Data represents risk
    • Poor quality data costs time and money
    • Data valuation is not standardized
  • Managing the data lifecycle
    • Data management includes managing data’s lifecycle
    • Different kinds of data have different lifecycle requirements
    • Metadata must be managed as part of the data lifecycle
  • Managing data across an enterprise
    • Data management is often confused with information technology management
    • Data management is cross-functional and requires a range of skills
    • Data management requires an enterprise perspective and leadership commitment

Data differs from other assets

Data has unique characteristics that make it different from other assets.3 Physical assets can be pointed to, touched, and moved around. Financial assets are accounted for on a balance sheet. But data is different. Data is not tangible. Yet it is durable; it does not wear out. Data is easy to copy and transport. But it is not easy to reproduce if it is lost or destroyed. Because it is not consumed when used, it can even be stolen without being gone. Data is dynamic and can be used for multiple purposes. The same data can even be used by multiple people at the same time – something that is impossible with physical or financial assets. Many uses of data beget more data.

These differences make it challenging simply to keep track of data, much less put a monetary value on data. Without this monetary value, it can be difficult to measure how data contributes to organizational success. These differences also raise other issues that affect data management, such as:

  • Inventorying how much data an organization has
  • Defining data ownership and accountability
  • Protecting against the misuse of data
  • Managing risks associated with data
  • Defining and enforcing quality standards for data

Data represents risk

Data not only represents value and opportunity; it also presents risks. Inaccurate, incomplete, or out-of-date data is an obvious risk because the information it conveys is wrong. But data presents other risks, including:

  • Misuse: If data consumers do not have sufficient and correct information (Metadata) about the data they use, then there is a risk of data being misused or misunderstood.
  • Unreliability: If data quality and reliability have not been established through standards and measurements, then there is a risk that unreliable data will be used to make decisions.
  • Inappropriate use: If data is not protected and secured, then there is a risk that data will be used by unauthorized people for unauthorized purposes.

The fact that data can be easily copied and replicated means it can be breached without being ‘gone’ from its rightful owners. Moreover, because data represents people, products, and money, legislators and regulators have recognized the potential uses and abuses of information and have put in place laws intended to mitigate obvious risks. For example:

  • Sarbanes-Oxley in the US focuses on controls over accuracy and validity of financial transaction data from transaction to balance sheet
  • Solvency II in the EU focuses on data lineage and the quality of data underpinning risk models and capital adequacy in the insurance sector
  • Throughout the world, data privacy regulations describe obligations toward the handling of personal identifying data (e.g., name, addresses, religious affiliation, or sexual orientation) and privacy (access or restriction to this information). Examples include:
    • Health Insurance Portability and Accountability Act (HIPAA) in the US
    • Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada
    • The General Data Protection Regulation (GDPR) in the EU

Consumers are also more aware of how their data might be used. For example, when making purchases on a website, they expect not only smoother and more efficient operation of processes, but also protection of their information and respect for their privacy. Organizations that do not protect their customers’ data may not have those customers for long.

Poor quality data costs money

Ensuring that data is of high quality is central to data management. If data does not meet the needs of its consumers – if it is not ‘fit for purpose’ – then the effort to collect, store, secure, and enable access to it is wasted. To ensure data meets business needs, data management teams must work with data consumers to define the characteristics that make data of high quality.

Most uses of data involve learning from it in order to apply that learning and create value. Examples include understanding customer habits in order to improve a product or service, and assessing organizational performance or market trends in order to develop a better business strategy. Poor quality data will have a negative impact on these decisions.

Just as importantly, poor quality data is simply costly to any organization. Estimates differ, but experts think organizations spend between 10% and 30% of revenue handling data quality issues. IBM estimated the cost of poor quality data in the US in 2016 was $3.1 trillion.4

Many of the costs of poor quality data are hidden and indirect and therefore hard to measure. Others, like fines, are direct and easy to calculate. Costs come from:

  • Scrap and rework
  • Work-arounds and hidden correction processes
  • Organizational inefficiencies or low productivity
  • Organizational conflict
  • Low job satisfaction
  • Customer dissatisfaction
  • Opportunity costs, including the inability to innovate
  • Compliance costs or fines
  • Reputational and public relations costs

The corresponding benefits of high quality data include:

  • Improved customer experience
  • Higher productivity
  • Reduced risk
  • Ability to act on opportunities
  • Increased revenue
  • Competitive advantage gained from insights on customers, products, processes, and opportunities
  • Competitive advantage gained from demonstrable data security and data quality

As these costs and benefits imply, managing data quality is not a one-time thing. Producing high-quality data requires planning, commitment, and a mindset that builds quality into processes and systems. All data management functions can influence the quality of data, for good or bad, so they all must account for data quality as they execute their work.

Data valuation is not standardized

Since each organization’s data is unique to itself, it can be difficult to put a monetary value on data. How much does it cost to collect and manage the history of a customer’s purchases? How much would it cost to reconstruct that history if the data were lost?

Still, putting monetary value on data is useful because it informs decisions about data and becomes the basis of understanding the value of data management activities.5 One approach to data valuation is to define general cost and benefit categories that can be applied consistently within an organization. Sample categories include:

  • Cost of obtaining and storing data
  • Cost of replacing data if it were lost
  • Impact to the organization if data were missing
  • Potential costs of risks associated with data
  • Cost of risk mitigation
  • Cost of improving data
  • Benefits of higher quality data
  • What competitors would pay for data
  • What the data could be sold for
  • Expected revenue from innovative uses of data
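
One way to apply such categories consistently is to record estimates against a fixed list, so that every data set in the organization is valued on the same terms. The following is a minimal sketch, not a standard valuation method: the category names are shorthand for the sample categories above, and the class name and all figures are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

# Shorthand labels for the sample categories listed above (names are assumptions).
COST_CATEGORIES = (
    "obtaining_and_storing", "replacing_if_lost", "impact_if_missing",
    "risk_exposure", "risk_mitigation", "improving_data",
)
BENEFIT_CATEGORIES = (
    "higher_quality", "competitor_price", "sale_price", "innovative_use_revenue",
)


@dataclass
class DataValuation:
    """Estimates for one data set, restricted to the agreed categories."""
    data_set: str
    costs: Dict[str, float] = field(default_factory=dict)
    benefits: Dict[str, float] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject ad hoc categories so that valuations stay comparable.
        for name in self.costs:
            if name not in COST_CATEGORIES:
                raise ValueError(f"unknown cost category: {name}")
        for name in self.benefits:
            if name not in BENEFIT_CATEGORIES:
                raise ValueError(f"unknown benefit category: {name}")

    def total_cost(self) -> float:
        return sum(self.costs.values())

    def total_benefit(self) -> float:
        return sum(self.benefits.values())

    def net_estimate(self) -> float:
        # A crude comparison figure, only as good as the estimates behind it.
        return self.total_benefit() - self.total_cost()


# Hypothetical figures for illustration only.
purchase_history = DataValuation(
    data_set="customer purchase history",
    costs={"obtaining_and_storing": 120_000.0,
           "risk_mitigation": 30_000.0,
           "improving_data": 50_000.0},
    benefits={"higher_quality": 180_000.0,
              "innovative_use_revenue": 90_000.0},
)
print(purchase_history.net_estimate())
```

The point of the fixed category lists is consistency rather than precision: two teams estimating different data sets are at least answering the same questions.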

Data asset valuation must also recognize that the value of data is contextual (i.e., what is of value to one organization may not be of value to another) and often temporal (i.e., what was valuable yesterday may not be valuable today). Despite this, within an organization, certain types of data, such as customer data, are likely to be consistently valuable over time, so most organizations focus first on ensuring the quality of this highly critical data.

Data management means managing data’s lifecycle

One reason people conflate data management with technology management is that they often see data only in one place: the application from which they access it. They do not recognize that data can be separate from the applications where it is created or stored and that data has a lifecycle. The data lifecycle is based on the product lifecycle. It focuses on ensuring that data is created, moved, and maintained in ways that make it usable by the people and processes that require it. Even though data and technology are intertwined, the data lifecycle should not be confused with the systems development lifecycle (SDLC), which focuses on completing projects on time and within budget.

Conceptually, the data lifecycle is easy to describe (see Figure 4). It includes processes that create or obtain data, those that move, transform, and store it and enable it to be maintained and shared, and those that use or apply it, as well as those that dispose of it.6 Data is rarely static. Throughout its lifecycle, data may be cleansed, transformed, merged, enhanced, or aggregated. Data often moves horizontally within an organization. As data is used or enhanced, new data is created, so the lifecycle has internal iterations and the ‘same’ data may have different lifecycle requirements in different parts of an organization.

Complexity is added to the concept of the data lifecycle by the fact that different kinds of data have different lifecycle requirements. For example, transactional data can be controlled largely through enforcement of basic rules, while Master Data requires curation. Still, some principles apply to the lifecycle of any data:

  • Creation and usage are the most critical points in the data lifecycle7: Data management must be executed with an understanding of how data is produced, or obtained, as well as how data is used. It costs money to produce data. Data is valuable only when it is consumed or applied.
  • Data quality must be managed throughout the data lifecycle: Because the quality of data can be impacted by a range of lifecycle events, quality must be planned for as part of the data lifecycle. It is not an add-on, or something to be ‘done later.’
  • Metadata quality must be managed throughout the data lifecycle: Metadata is a type of data that is used to describe other data. As such, it is critical to all data management functions. Metadata is often created as part of the lifecycle of other data and should be seen as a product (rather than a by-product) of that lifecycle. Metadata quality must be managed in the same way as the quality of other data.
  • Data Security must be managed throughout the data lifecycle: Data management includes ensuring that data is secure and that risks associated with data are mitigated. Data that requires protection must be protected throughout its lifecycle, from creation to disposal.
  • Data management efforts should focus on the most critical data: Organizations produce a lot of data, much of which is never actually used.8 Trying to manage every piece of data is neither possible nor desirable. Lifecycle management requires focusing on an organization’s most critical data and minimizing data ROT (i.e., data that is redundant, obsolete, or trivial).9
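
To make the idea of data ROT concrete, the sketch below flags records as redundant, obsolete, or trivial. It is an illustration under stated assumptions, not a prescribed method: the record structure, the three-year idle threshold, and the 16-byte size threshold are all hypothetical, and a real organization would set such rules according to its own retention policies.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass
class FileRecord:
    """A hypothetical stand-in for any stored data object."""
    path: str
    content: bytes
    last_accessed: date


def classify_rot(records: List[FileRecord], today: date,
                 max_idle_days: int = 3 * 365,
                 min_bytes: int = 16) -> Dict[str, List[str]]:
    """Return the ROT reasons (possibly none) found for each record."""
    seen_digests = set()
    flags: Dict[str, List[str]] = {}
    for rec in records:
        reasons = []
        # Redundant: identical content has already been seen.
        digest = hashlib.sha256(rec.content).hexdigest()
        if digest in seen_digests:
            reasons.append("redundant")
        seen_digests.add(digest)
        # Obsolete: not accessed within the assumed idle threshold.
        if (today - rec.last_accessed).days > max_idle_days:
            reasons.append("obsolete")
        # Trivial: too small to carry meaningful content (assumed threshold).
        if len(rec.content) < min_bytes:
            reasons.append("trivial")
        flags[rec.path] = reasons
    return flags


customers = b"id,name\n1,Ada\n2,Grace\n"
records = [
    FileRecord("customers.csv", customers, date(2024, 1, 10)),
    FileRecord("customers_copy.csv", customers, date(2024, 1, 11)),
    FileRecord("orders_2015.csv", b"id,amount\n1,100\n2,250\n", date(2015, 6, 1)),
    FileRecord("tmp.txt", b"x", date(2024, 1, 12)),
]
for path, reasons in classify_rot(records, today=date(2024, 2, 1)).items():
    print(path, reasons)
```

Flagging is deliberately separate from deletion here: whether flagged data can actually be disposed of depends on its lifecycle requirements and on any retention obligations that apply to it.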

Different kinds of data have different lifecycle requirements

Managing data is made more complicated by the fact that different types of data have different lifecycle management requirements. Data can be classified by the function it serves (e.g., transactional data, Reference Data, Master Data, Metadata; alternatively, category data, resource data, event data, detailed transaction data) or by content (e.g., data domains, subject areas) or by format or by the level of protection the data requires. Data can also be classified by how and where it is stored or accessed.

Because different types of data have different requirements, are associated with different risks, and play different roles within an organization, many of the tools of data management are focused on aspects of classification and control.10 For example, Master Data has different uses and consequently different management requirements than does transactional data.

Metadata must be managed as part of the data lifecycle

Data management professionals are passionate about Metadata because they realize how important it is. Yet it is a truism among them that one should never use the word Metadata when speaking with executives. “Their eyes will glaze over!” We’ll take that chance here because certain forms of Metadata are not simply critical to data management—they are essential to it. You cannot manage data without Metadata.

Metadata includes a range of information that allows people to understand data and the systems that contain data. Metadata describes what data an organization has, what it represents, how it is classified, where it came from, how it moves within the organization, how it evolves through use, who can and cannot use it, and whether it is of high quality.
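
The kinds of information listed above can be pictured as a simple record kept for each data element. The sketch below is only one possible shape, assuming hypothetical field names and a made-up example. It also includes a small completeness check, since Metadata is itself data whose quality needs attention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MetadataRecord:
    """Hypothetical Metadata for one data element."""
    name: str                 # what data the organization has
    definition: str           # what it represents
    classification: str       # how it is classified
    source_system: str        # where it came from
    lineage: List[str] = field(default_factory=list)        # how it moves
    allowed_roles: List[str] = field(default_factory=list)  # who can use it
    quality_score: Optional[float] = None  # None means not yet measured

    def can_be_used_by(self, role: str) -> bool:
        return role in self.allowed_roles

    def missing_fields(self) -> List[str]:
        """List the descriptive fields that have not been filled in."""
        missing = [
            attr for attr in ("definition", "classification", "source_system")
            if not getattr(self, attr).strip()
        ]
        if not self.lineage:
            missing.append("lineage")
        if not self.allowed_roles:
            missing.append("allowed_roles")
        if self.quality_score is None:
            missing.append("quality_score")
        return missing


# A made-up example with two gaps: no definition and no quality measurement.
customer_email = MetadataRecord(
    name="customer_email",
    definition="",
    classification="personal data",
    source_system="web registration form",
    lineage=["web registration form", "CRM", "marketing data mart"],
    allowed_roles=["customer_service", "marketing"],
)
print(customer_email.missing_fields())
print(customer_email.can_be_used_by("finance"))
```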

The challenge is not only that you need Metadata to manage data, but that Metadata is a form of data and needs to be managed as such. Organizations that do not manage their data well generally do not manage their Metadata at all. The answer to this challenge is that Metadata management often provides a starting point for improvements in data management overall.

Data management is often confused with information technology management

Because almost all of today’s data is stored electronically, data management is closely linked with technology management. They need to be seen in relation to one another, because decisions about technology impact many facets of how data is managed. But data management, which focuses on ensuring that the data itself is usable and trustworthy, differs from technology management, which focuses on building and maintaining infrastructure, systems, and applications.

The two are fundamentally connected by the fact that these systems and applications often automate business processes that collect or create data and different technological choices will put different constraints on the data itself. Both data management and technology management requirements should be rooted in business processes that create or use data and the needs of the people and processes that consume data.

In many organizations there is ongoing tension between the drive to build new technology and the desire to have more reliable data – as if the two were opposed to each other instead of necessary to each other. Successful data management requires sound decisions about technology, but managing technology is not the same as managing data. Organizations need to understand the impact of technology on data, in order to prevent technological temptation from driving their decisions about data. Instead, data requirements aligned with business strategy should drive decisions about technology.

Data management requires a range of skills

Managing data involves a set of interconnected processes aligned with the data lifecycle. Though many organizations see data management as an information technology function, it actually requires a wide range of people with a diverse set of skills working in different parts of an organization. Data management is a complex process because it is executed throughout an organization. Data is managed in different places within an organization by teams that have responsibility for different phases of the data lifecycle. Data management requires:

  • Business process skills to understand and plan for the creation of reliable data
  • Design skills to plan for systems where data will be stored or used
  • Highly technical skills to administer hardware and build software where data is maintained
  • Data analysis skills to understand issues and problems discovered in data
  • Analytic skills to interpret data and apply it to new problems
  • Language skills to bring consensus to definitions and models so that people can understand data
  • Strategic thinking to see opportunities to use data to serve customers and meet goals

The challenge is getting people with this range of skills and perspectives to recognize how the pieces fit together and how their work intersects with the work of other parts of the organization so that they successfully collaborate and achieve common goals.

Data management requires an enterprise perspective

The footprint of data management is as large as the organization that creates and uses data. Data is one of the ‘horizontals’ of an organization. It moves across verticals, such as sales, marketing, and operations. Or at least it should. Ideally, data should be managed from an enterprise perspective. However, getting to an enterprise perspective is challenging.

Most organizations break work down by business units or functions, each of which may develop its own applications to perform its work. Because data is often viewed simply as a by-product of operational processes (for example, sales transaction records are the by-product of the selling process, not an end in themselves), it is not always planned for beyond the immediate need. It may not even be recognized as something that other people and processes use.

Unless enterprise data standards are established and enforced, there will be differences in how data is defined and created in different areas. For example, take something as seemingly simple as a Social Security Number (SSN), a US identifier for individuals. If one application captures SSN as a numeric value and another captures it in a text field, SSN data will be formatted differently. This can result in problems like dropping leading zeros on SSNs. Formatting differences, differences in the granularity of data, and differences in which attributes are mandatory to capture all present obstacles to integrating data from diverse applications. Obstacles to integration limit the value an organization can get from its data.
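
The SSN problem can be reproduced in a few lines. The sketch below imitates two applications, one storing the value in a numeric field and one in a text field, and then shows a normalization step that lets the two be matched again. The function names and the sample number are hypothetical.

```python
SSN_LENGTH = 9  # a US Social Security Number has nine digits


def store_as_number(raw: str) -> int:
    """Imitates an application that captures SSN in a numeric field."""
    return int(raw)


def store_as_text(raw: str) -> str:
    """Imitates an application that captures SSN in a text field."""
    return raw


def normalize_ssn(value) -> str:
    """Render either representation as a nine-digit, zero-padded string."""
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    if len(digits) > SSN_LENGTH:
        raise ValueError("too many digits for an SSN")
    return digits.zfill(SSN_LENGTH)


captured = "012345678"                    # made-up value with a leading zero
numeric_copy = store_as_number(captured)  # the leading zero is dropped
text_copy = store_as_text(captured)

print(str(numeric_copy) == text_copy)                            # the copies no longer match
print(normalize_ssn(numeric_copy) == normalize_ssn(text_copy))   # they match after normalization
```

Padding works here only because the length of an SSN is known in advance. An enterprise standard that defines the format once, at the point of capture, avoids the need for this kind of repair during integration.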

Organizations that view data as a product that they create or purchase will make better decisions about how to manage it throughout its lifecycle. These decisions require recognizing:

  • The ways data connects business processes that might otherwise be seen as separate
  • The relationship between business processes and the technology that supports them
  • The design and architecture of systems and the data they produce and store
  • The ways data might be used to advance organizational strategy

Planning for better data requires a strategic approach to architecture, modeling, and other design functions. It also depends on strategic collaboration between business and IT leadership. And, of course, it demands the ability to execute effectively on individual projects. The challenge is that there are usually organizational pressures, as well as the perennial pressures of time and money, that get in the way of better planning. Organizations must balance long- and short-term goals as they execute their strategy. Having clarity about the trade-offs leads to better decisions.

What you need to know

  • Data is a valuable asset but also represents risk. An organization can begin to understand the value of its data by recognizing both the costs of poor quality data and the benefits of high-quality data.
  • Data has unique characteristics that make it challenging to manage.
  • The best approach to addressing these challenges is to manage data across its lifecycle and to take an enterprise perspective.
  • Failure to manage the data lifecycle is costly, though many costs are hidden.
  • Managing data across its lifecycle requires planning, skill, and teamwork.