CHAPTER 3: Data Customers Must Speak Up

While I know of no hard facts to support this claim, it seems to me that the dominant reason so much data is bad is that people tolerate it. It’s remarkable, even paradoxical, that people will insist on the delivery of a complex product or service in perfect working order, but they will accept simple billing errors.15 Big corporations have large supply-chain management groups to ensure that their suppliers are up to spec, but their efforts don’t extend to data. This must change.

Unless you make your needs known, you have no reasonable expectation of quality data. Further, over the years I’ve worked on hundreds of data quality issues and so far, data creators not knowing what data customers want has been a major factor in each. In some cases, it’s far and away the dominant factor. Once the two sides sit down together, data creators might say, “Thanks for this. We never knew who used that data. The system wouldn’t let us move forward without it, so we always filled it in. But frankly, we just guessed. Now that we know what you need, we can do a much better job.”

I’ve seen plenty of other contributing factors: inadequate staffing, woefully ill-designed processes, and truly horrific data dictionaries. Even then, explaining customer needs is essential, and it produces the fastest results.

Data customers simply must grow intolerant of bad data and speak up.

Becoming a good customer is increasingly important the more data you require or the more quickly you use data; as things speed up, there is simply less time to run through a hidden data factory to make corrections. It’s also increasingly important as one moves up the management chain. As the issues and opportunities become murkier, you need even better data, from more diverse sources, synthesized in creative ways.16

Yet as you rise through the ranks, getting what you need may be more difficult. Consider “bad news.” Let’s face it; people hate to bring the boss bad news. They delay, hoping things will turn around. They concoct far-fetched rationalizations for why things are not so bad. They deflect blame. So a boss must be clear about what she wants, understand the range of sources she depends upon, and know whether they can be trusted.

Being a good customer is also of increasing importance to statisticians, analysts, data scientists, and anyone trying to find hidden treasures in the data. An algorithm combing through data doesn’t give a whit whether the data is correct. And it is completely indifferent to subtle differences in data definitions. So you can explore data to your heart’s content without giving data quality a second thought. But without high-quality data, you can’t do the real work, which involves understanding something new about the world and building that new understanding into a product or service.

The instructions for becoming a good data customer are quite straightforward:

  1. Recognize that you are a data customer.
  2. Communicate your needs.
  3. Innovate in your use of data and encourage data creators to innovate.
  4. Actively manage data suppliers.
  5. Make data factories more effective and as small as possible.
  6. Build the organizational capabilities needed to follow these instructions for all your important data.

This chapter considers each in turn.

Recognize That You Are a Data Customer

The first step in every addiction program is “Admit you have a problem.” For those not getting the high-quality data they need, that first step translates as becoming aware that you are a customer of data. And becoming a good one entails some hard work!

I use three definitions of data quality, depending on the circumstances. Two are especially pertinent in this chapter. My formal definition aligns closely with the near-term goal of improving business performance:17

Data is of high quality if it is fit for its intended use (by customers) in operations, analytics, decision-making, and planning. To be fit for use, data must be “free from defects” (i.e., “right”) and “possess desired features” (i.e., be the “right data”).18

It drives home the importance of customers and recognizes that data quality is a multi-dimensional beast, requiring both that the data be correct and clearly defined, and that it be relevant to the task at hand.

My aspirational definition aligns with the longer-term goal of building a future in data:19

Exactly the right data in exactly the right place and right time and in the right format to complete an operation, serve a customer, conduct an analysis, craft a plan, make a decision, or set and execute strategy.

This definition is especially helpful when imagining the organization unshackled from the constraints imposed by current realities.

To be a good customer, you have to build communication channels. Figure 3.1 represents this graphically. It depicts the customer-supplier model, modified slightly to emphasize these rights and responsibilities. As usually presented, one generally puts oneself (or one’s process, department, function, etc.) in the middle, with suppliers on the left and customers on the right. I’ve split the “your process” symbol in half, with the right half emphasizing your role as data customer and the left as data creator for the next person in line. The main left-to-right arrows represent the flow of data, inputs and outputs, respectively.

Figure 3.1 The customer-supplier model with minor modifications to reflect your responsibilities as a data customer, to build and maintain high-bandwidth communications (requirements and feedback) channels with your most important data suppliers.

Of special importance here are the communication channels, specifically your requirements and your feedback to your most important data suppliers. If these channels don’t exist, you must build them, make sure they operate effectively, expand them as necessary (I’ll call this “broadband communications”), and maintain them–forever.

In the next chapter, on your roles as data creator, I’ll urge you to take similar responsibilities for the communications channels with your customers.

Communicate Your Needs

Figure 3.2 presents a simple process that I’ve found enormously effective for developing a deeper understanding of your needs and requirements as a data customer, documenting them as the voice of the customer (VoC) and communicating them to suppliers.

I’m sure there are plenty of good methods to clarify your data needs. So if you have a method you use regularly, no need to learn this one. But if not, don’t hunt around. This process is almost guaranteed to yield a good result the first time you use it and excellent results as you gain more experience.

Figure 3.2 Customer Needs Analysis Process (customer version).

I’ll explain a couple of key instructions for using this process below. But first I wish to make clear that a list of counterexamples does not constitute a usable VoC. You have to explain, in sufficient detail, what quality data looks like, in the customer’s eyes.20

Distinguish needs from solutions, features, or requirements

I find it helpful in the short term, and most beneficial in the long term, for customers to distinguish their needs from their requirements.

It is almost always better to tell your data suppliers what you need, in contrast to telling them how to give you what you need. If they know what you want, many of them will think creatively about new ways to satisfy your needs.

The distinction may seem maddeningly subtle, so let me illustrate with an example. Those who drive family cars don’t have the in-depth technical knowledge to articulate specifications such as:

“We need the glass to be polarized, 0.17” thick, tempered for 40 hours in an 800 degree Thermaflex kiln, with a 26 percent blue-green tint, cut within 14 mils of specification, and with the edges sanded with 500 grit paper.”

Rather, they talk about what they need the glass to do, saying:

“We need to be able to see out in all kinds of weather. We want to be kept safe. We don’t want to be blinded by the sun. We need the door to sound solid when it closes.”

The second quote articulates needs; the first, the requirements of the windshield-production process. It’s the manufacturer’s job to sort out the requirements for producing and installing a windshield that meets your needs.

Thus, step 1 calls for you to clarify your data needs. Statements beginning with “I need to” – as opposed to “I need a” – do this well. Examples of needs are:

  • “I need to track sales progress against plans on a weekly basis.”
  • “I need to set up corporate bonds the day before they go to market so we can sell them. I need all of the details that can impact pricing for a bond at that time.”
  • “I need to prepare the corporate report by the fifth business day after the end of each quarter, which means I’ll need to have each unit’s complete financials by day three.”

State your needs in your own words, providing as much color and detail as you can. After all, data creators are far more likely to do careful work when they understand the purpose than when they’re just filling out a form.

Translate your needs into the data and anything else you require

The next two steps go further. For example, “I need to track sales progress against plan” may mean that you need:

  1. The yearly plan, or target, for the number of units sold, for each SKU; and
  2. The actual number of units manufactured, assigned to distributors, and sold at the retail level for each SKU, every week.

Further, you may require that:

  1. A week starts at 12:01AM EST Monday and ends at midnight on Sunday. Note that the requirement defines a “week.” You need the numbers by 8:00AM on the first Wednesday after each week.
  2. Numbers provided must be within 3 percent of the actual units sold. If a number is missing, there should be a good explanation.
  3. You need a contact to discuss anything that you don’t understand.

This may seem like a lot of work (and it can be). It underscores why you should focus first on your most important needs.
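Requirements stated this precisely can be checked mechanically. The following is a minimal sketch in Python, not a prescribed method: the names `WeeklyFigure`, `delivery_deadline`, and `check_figure` are hypothetical, timestamps are assumed to already be in the agreed time zone, and it assumes an audited count of units sold becomes available for comparison.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class WeeklyFigure:
    """One supplier-reported number for one SKU and one week (hypothetical shape)."""
    sku: str
    week_start: datetime            # the Monday that opens the reporting week
    reported_units: Optional[int]   # None when the supplier could not provide a number
    explanation: str = ""           # expected whenever the number is missing


def delivery_deadline(week_start: datetime) -> datetime:
    """8:00 AM on the first Wednesday after a week that starts on Monday."""
    # Monday + 9 days is the Wednesday of the following week.
    wednesday = week_start + timedelta(days=9)
    return wednesday.replace(hour=8, minute=0, second=0, microsecond=0)


def check_figure(fig: WeeklyFigure,
                 received_at: datetime,
                 audited_units: Optional[int] = None,
                 tolerance: float = 0.03) -> List[str]:
    """Return a list of unmet requirements; an empty list means none were found."""
    problems = []
    if fig.week_start.weekday() != 0:   # 0 is Monday
        problems.append("week does not start on a Monday")
    if received_at > delivery_deadline(fig.week_start):
        problems.append("delivered after the Wednesday 8:00 AM deadline")
    if fig.reported_units is None:
        if not fig.explanation.strip():
            problems.append("number missing with no explanation")
    elif audited_units is not None and audited_units > 0:
        if abs(fig.reported_units - audited_units) > tolerance * audited_units:
            problems.append(f"reported units differ from actual by more than {tolerance:.0%}")
    return problems
```

A check like this does not replace the conversation with the data creator. It simply makes each requirement concrete enough that both sides can tell whether it was met.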

Document, document, document

The older I get, the more insistent I become that your VoC be written (not oral). Figure 3.3 provides an outline. The document need not be long – I’ve seen some great VoCs that were only seven pages or so.

Figure 3.3 Generic outline for a Voice of the Customer document.

Sit down with data suppliers

The data you need might be created by many different groups and accessed through intermediaries. For example, you may access financial data created in the Widget department via a data warehouse (see Figure 3.4). You can’t expect the data warehouse group to do much about your accuracy requirements; nor can you expect the Widget department to do much about your access requirements. So the practical reality is that you’ll have to sit down with each (perhaps together, but probably not). You should discuss the entire VoC document and decompose the requirements into those that the data warehouse group and the Widget department can handle, respectively.

Figure 3.4 Having multiple data suppliers compels you to decompose your overall requirements.

As another example, you may well need the end-of-day price for a given equity. That data is created on an exchange and you obtain it via a contracted market-data provider. In this case you should generally expect the market-data provider to work with the data creator on your behalf.

To communicate with your data creators, you have to know who they are. That can be a challenge. You may simply look at a report, draw data from a data warehouse, or otherwise be unaware of its origin. A little detective work, starting from your source (or supplier) and working backward, should help you find the relevant data creators.

Note that neither the market-data provider nor the data warehouse is a data creator, even though they are your source or supplier. So I’ll use the term “data supplier” to mean the person or group through which you obtain the data you need. A data supplier can be a data creator, but not always.

It is not enough to send your VoC document to data creators and suppliers electronically. You must sort out who the right people are and meet them face-to-face. You must explain why you are reaching out, why they are so important to you, and how you use the data that they create or supply on your behalf. You need to give them time to ask questions and think. And while you should be flexible in terms of how you work together, you should pointedly ask them to work with you. In particular, you should ask them to:

  1. Measure the quality of data they provide.
  2. Identify and eliminate root causes of error.
  3. Help shut down hidden data factories.

To gain traction, you don’t need to engage all of your data suppliers; one will do just fine. Mark yourself as having gained traction when that supplier accepts your requirements and completes one improvement project. And mark yourself as having achieved your first real result when that supplier has completed three or four – you should notice a real difference at that time. And you’ll have some real experience.

As you move to the next level, you should formalize this new way of working with data creators.

What if you can’t find data you know exists?

This situation comes up far too often, frustrating people and leading to all sorts of odd behavior. It usually arises either because nothing requires data creators to store what they create in an accessible manner, or because their processes for doing so are weak. It is another example of what happens when data customers are too tolerant. So if you and your team are spending too much time looking for and not always finding data you know is out there somewhere, follow the instructions given in this chapter: Spell out your needs, communicate them to the right supplier, and grow intolerant if you don’t see rapid progress.

Innovate and Encourage Innovation

A healthy organization asks itself new questions all of the time and it will need new data to answer them. And sometimes your needs are quite vague. Suppose you’re a senior executive, contemplating how you should acquire another company. Or you want data that will help you brainstorm new product ideas.

Use such new needs to spur innovation, both on your part and on the part of data creators. Follow the steps described here, emphasizing why you need what you need, providing as much context as you can. Give data creators the opportunity to meet your needs in ways that you couldn’t have imagined.

Using the acquisition example, you might say, “I need to figure out a fair price for acquiring the XYZ Company. They have a reputation for customer allegiance, which may play a real role in the valuation. I need to know more, but I’m not sure what to ask.” That’s a perfectly respectable request for data, and there are many ways to approach it. So don’t overspecify!

In almost all new situations, I find it best to simultaneously:

  • Cast a wide net by acquiring disparate data that may be loosely defined, or of suspect quality. I’m even interested in data that may not bear directly on the topic at hand.
  • Seek a smaller amount of carefully defined and created data that I feel certain I understand and can trust, even if it has other limitations.

This two-pronged approach is likely to get you what you need. At first, the new data will almost certainly have deficiencies: it may involve only a small sample, the data definitions may prove imprecise, or some measurements may look suspect. This is natural, so use this data fully aware of its shortcomings. And demand rapid improvements!

This “don’t overspecify” instruction is also important when it comes to bad news, including poor results, new threats, previously unforeseen risks, and so forth. Make clear the sort of things you want to be kept informed about, while also making it clear that, since these things are new, unexpected, or unforeseen, you can’t possibly specify them in advance.

Actively Manage Both Internal and External Suppliers

As you gain experience in your role as a data customer, it is often appropriate to manage your data creators in a more formal, repeatable manner. Indeed, any external company with whom you contract for data should be actively managed.

Similarly, almost all important internal data creators should be managed formally. A corporate finance group, in particular, should manage all departments on whose data it depends in this way. If the data is important and coming from somewhere other than your team, you should probably actively manage the data creator (or supplier). Figure 3.5 presents my preferred means of doing so.

I find that almost all data creators respond well when engaged as called for here. The secret lies in thinking through what’s in it for them (not just you) and approaching them with that in mind. A few may be defensive, believing their quality is high. If so, don’t hesitate to show them a FAM, adding, “Here’s our view of how you’re doing.”

Figure 3.5 Data Creator (Supplier) Management Cycle.

So far, I’ve discussed step 2, communicating your requirements. A couple of further remarks here: The first step is to name a person or team to work with each specific supplier (an embedded data manager is a good choice to lead a part-time team) and clarify his, her, or the team’s responsibilities to improve supplier quality. “Halve the error rate in six months” is a good starting point. Don’t skip this step – clarity in expectations is critical. And don’t be afraid to try something new, gain some experience, and modify expectations.

Having done that, the obvious next step is to measure quality against those needs, then work with the data creator to identify and complete projects to close the gaps. You can, of course, measure performance and identify improvement projects on your own, but you can’t actually conduct those projects; the data creator must do that. So in most circumstances, I find it best that data creators take the lead role or, alternatively, that you complete the remaining steps in the cycle together.
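To make a target such as “halve the error rate in six months” measurable, it helps to write down the arithmetic. The sketch below is one possible way to do so, under stated assumptions: the error rate is taken to be the fraction of checked records containing at least one error, and progress is judged against a straight-line path from the baseline to the target. Both choices, and the function names, are illustrative and not drawn from the text.

```python
def error_rate(records_checked: int, records_with_errors: int) -> float:
    """Fraction of checked records that contain at least one error."""
    if records_checked <= 0:
        raise ValueError("need at least one checked record")
    return records_with_errors / records_checked


def allowed_rate(baseline_rate: float,
                 months_elapsed: float,
                 months_total: float = 6.0,
                 target_factor: float = 0.5) -> float:
    """Highest error rate still on a straight-line path to the target.

    target_factor=0.5 encodes "halve the error rate"; the straight-line
    path is an assumption made for illustration.
    """
    fraction = min(max(months_elapsed / months_total, 0.0), 1.0)
    return baseline_rate * (1.0 - (1.0 - target_factor) * fraction)


def on_track(baseline_rate: float, current_rate: float, months_elapsed: float) -> bool:
    """True when the current rate is at or below the straight-line path."""
    return current_rate <= allowed_rate(baseline_rate, months_elapsed)
```

For example, with 40 flawed records out of 100 at the start, the baseline is 0.40. Three months in, the straight-line path allows roughly 0.30, so a measured rate of 0.28 would count as on track and 0.35 would not.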

Make Your Hidden Data Factories Explicit and Efficient

While your long-term goal is to make hidden data factories as small as possible, in the near term it is wise to make them effective. After all, you have to protect yourself.

Data factories represent non-value-added work

A needed first step may involve shining a light on hidden data factories. Figure 3.6 presents two versions of a simple two-step process. In the first version, both steps work well. In the other, department B must implement a hidden factory to accommodate errors created by department A, most of which are corrected, though some leak through to customers.

As your data needs grow and change, on occasion you’ll almost certainly need to experiment with new data from new suppliers of unknown quality. Initially, it may be impossible or not worth the effort to clarify your needs to these new data creators. After all, much of their data may turn out to be useless. In these circumstances, a hidden data factory can help you ensure that the new data is usable enough. That’s why it isn’t always possible, or even wise, to completely eliminate data factories.

The key observation is this: No fully-informed external customer would pay you more for the second version compared to the first. Said differently, the hidden data factory creates no value in the customer’s eyes – it is non-value-added work.

Figure 3.6 Two versions of a two-step process. No customer would pay extra for steps to correct data and make good on errors that leak through. Thus, they represent non-value-added work.

Now in the very near term, you probably have to continue to do this work. As I’ve noted, it is simply irresponsible to use bad data or pass it onto a customer. At the same time, all good managers know that, over time, they must reduce such work.

Implement better controls

Most hidden data factories don’t work all that well. Consider Samantha from earlier. Her assistant’s control method is to give the Widget department’s data a once-over and make any corrections that seem warranted. It’s better than nothing, but the assistant knows little about widgets – it was just dumb luck that he was able to make a correction. Perhaps worst of all, his informal system may provide a false sense of security that the data is correct.

Chances are your data quality controls are equally informal and riddled with shortcomings. Four types of controls, as listed in Table 3.1, can help you make your data factories more effective. Use the table to develop increasingly better controls.

Here are four ways to do just that:

  1. Implement customer-found error control. Samantha’s assistant Steve should refer every issue he finds back to the Widget department and ask it to make the correction, rather than fixing the data himself.
  2. Institute on-receipt control. Rather than looking at the data when Samantha needs it, her assistant could look at the data when it is received. This would give him more time to make a correction, should the data be flawed.
  3. Get the Widget department to assume responsibility for the control before it sends the data out (data suppliers can perform most controls just as easily as you).
  4. Automate the controls: Many controls, especially portions that identify errant data, should be automated (making corrections often requires human intervention).
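The identification half of an automated control can be quite small. The following is a minimal sketch of an on-receipt check driven by business rules. The rules, field names (`sku`, `units`, `region`), and region codes are invented for illustration; real rules would come from your own VoC document.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]

# Hypothetical business rules: each returns True when the record passes.
RULES: Dict[str, Rule] = {
    "sku is present": lambda r: bool(r.get("sku")),
    "units is a non-negative integer":
        lambda r: isinstance(r.get("units"), int) and r["units"] >= 0,
    "region is known": lambda r: r.get("region") in {"NA", "EU", "APAC"},
}


def validate_on_receipt(records: List[Record],
                        rules: Dict[str, Rule] = RULES
                        ) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split incoming records into clean ones and flagged ones.

    Flagged records carry the names of the rules they failed, so they can be
    referred back to the data supplier instead of being silently corrected.
    """
    clean, flagged = [], []
    for record in records:
        failures = [name for name, rule in rules.items() if not rule(record)]
        if failures:
            flagged.append((record, failures))
        else:
            clean.append(record)
    return clean, flagged
```

Note that the sketch only flags errant records. Deciding what the correct value should be is left to a person, ideally at the data supplier.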

Table 3.1 Commonly employed customer data quality controls.

Type of control | What it is | Comments
On-receipt validation controls | Employing proofreading or business rules. Identify and correct “invalid” data upon receipt from a data supplier. | Unless the data supplier is providing them, usually a good idea.
On-use validation controls | Employing proofreading or business rules. Identify and correct “invalid” data as a first step in using them. | Unless the data supplier is providing them, usually a good idea.
Clean-up controls | Usually employing business rules, identify and correct large quantities of invalid data, hopefully before they are used. | As a matter of best practice, should only be employed once the process of data creation works well (ensuring one-time only clean-up). There are exceptions based on business necessity. A “second-time clean-up” is indicative of an unhealthy data quality program.
Customer-Found Error Control | Correct errors that customers find. | Always. If customers are good enough to advise you of your errors, you should act quickly.

Build Organizational Capability

It is difficult and time consuming to sort out what you need, translate your needs into requirements, and work with data creators and suppliers to make sure they understand. You must assign people to the effort, as called for in Figure 3.7 and Table 3.2.

Figure 3.7 Build organizational capabilities to become a good data customer.

Table 3.2 Who does what for a good data customer.

Who | Responsibility
Head of data customer’s organization | Sets a tone and a change of direction in dealing with data suppliers.
Supplier Team | Oversees the data supplier management cycle.
Requirements Team | Develops and communicates the Voice of the Customer.
Measurement Team | Interprets measurements provided by data suppliers, helps others interpret them, and helps identify improvement projects (note: there is no formal improvement team).
Control Team | Brings the hidden data factory into the light, works to make the overall system of controls (including those conducted by the data supplier) more effective, and, as the data improves, works to shrink the hidden data factory. Responsible for the overall system of controls.
Embedded Data Manager | Assists with all of the work, leadership on some.

In Summary

If you need high-quality data to do your job, then you have to become a good customer, and as this chapter makes clear, doing so involves some hard work. You need to sort out what you need, translate your needs into requirements, and work with data creators and suppliers to make sure they understand. There are no shortcuts!

Bear in mind that none of this is nearly as hard, or as time-consuming, as running a hidden data factory. The resulting improvements, to both the data and your network of data creators that know you and understand your needs, more than justify the effort.

Table 3.3 Data customer’s indicators of success.

You’ve | When
Gained Traction | You successfully communicated your Voice of the Customer to one data supplier and that supplier has made one real improvement.
Achieved Real Results | That supplier has completed several improvement projects such that you can see it in the data they send and can shrink a hidden data factory as a result.
Made It to the Next Level | You’re actively managing both internal and external suppliers of all of the most important data.
