CHAPTER 9: Data Quality in Practice

This chapter presents two case studies that illustrate the most important points of this book as they play out, often messily, in practice. Both feature the getting-in-front approach and the roles of data customers, data creators, a provocateur to get the ball rolling, quality managers and embedded data managers to keep the work moving, and leaders to make the approach and roles standard. The first case describes AT&T’s work to improve financial performance and predictability for access billing. I’ve related technical aspects of the work before;39 here, of course, the focus is on the “who did what.” The second case describes Aera Energy’s work to develop common data definitions. I’ve also told some of the Aera story before.40 These cases have stood the test of time. Others would do well to emulate them.

Improving Access Bill Quality at AT&T

This case study picks up with Bob Pautke, manager of Access Financial Assurance, looking for better statistical tools to improve his team’s work. This is in the late 1980s, in the heady days after AT&T and the local telephone companies had split apart. AT&T ran the long-distance network; the telephone companies provided local service and “access” to long-distance networks. In a typical call, the caller, say in New Jersey, dials her son in another state, say Colorado. The local New Jersey telephone company (the “telco”) delivers her call to AT&T, which takes the call to Colorado. There, the local Colorado telco picks up the call and delivers it to her son. The telco portions of the call are the aforementioned “access.”

Access was under enormous scrutiny. AT&T simply could not function if it couldn’t connect to customers. Access, at over $15B/year, was AT&T’s largest expense. It was a technical challenge and an important source of revenue to telcos. Further, the whole idea of splitting AT&T and the telcos in the first place was to ensure that AT&T competitors had fair access to customers. So access garnered enormous political, judicial, and regulatory attention.

Divestiture rolled out quickly—the massive task involved setting up the telcos, splitting up hundreds of thousands of employees and plant and equipment worth over $100 billion, and getting up and running. Not surprisingly, providing service was the number one priority, and billing for those services a distant second. (Note: from my perspective, billing was a secondary objective for both AT&T and the other telephone companies prior to divestiture as well. They were, after all, regulated monopolies with, surprisingly to some, strong service cultures. While never stated, the attitude seemed to be, “Provide high-quality service first, worry about collecting the money later.”)

With a lot of money at stake, AT&T developed a bill verification and claims process. When it received an invoice for access, AT&T paid the bill. In parallel, it developed a “predicted bill,” using its own sources, and compared it with the original bill. If the discrepancy was large and in the telco’s favor, AT&T filed a claim for the perceived overbilling. If the telco agreed, it would rebate the difference. If it disagreed, the telco would reject the claim, providing additional evidence to support its rejection. The telco could even file “counterclaims” if, for example, it found evidence of errors in its original billing in AT&T’s favor. Finally, there were plenty of original bills, claims, and counterclaims for which the evidence was mixed. In some cases, AT&T and the telco lumped them together and reached a settlement.
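For readers who think in code, the compare-and-claim logic is easy to sketch. The following is a minimal sketch in Python, with invented circuit identifiers and a hypothetical materiality threshold; nothing here reflects AT&T’s actual systems.

```python
from dataclasses import dataclass

@dataclass
class Claim:
    """A claim for perceived overbilling on a single access circuit."""
    circuit_id: str
    billed: float     # amount on the telco's invoice
    predicted: float  # AT&T's independently predicted amount

    @property
    def overbilling(self) -> float:
        return self.billed - self.predicted

def file_claims(invoices: dict, predictions: dict, threshold: float = 100.0) -> list:
    """Compare each invoice line with the predicted bill; file a claim
    when the discrepancy is large and in the telco's favor."""
    claims = []
    for circuit_id, billed in invoices.items():
        predicted = predictions.get(circuit_id, 0.0)
        if billed - predicted > threshold:  # hypothetical materiality threshold
            claims.append(Claim(circuit_id, billed, predicted))
    return claims

# One line overbilled well beyond the threshold, one within it.
invoices = {"NJ-0001": 1250.00, "NJ-0002": 400.00}
predictions = {"NJ-0001": 1000.00, "NJ-0002": 390.00}
for claim in file_claims(invoices, predictions):
    print(f"Claim on {claim.circuit_id}: overbilled by ${claim.overbilling:.2f}")
```

The claims, counterclaims, and settlements described above all hang off this basic comparison.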

Figure 9.1 depicts the end-to-end process. Overall, bill verification was time-consuming, expensive and yielded uncertain results. Still, it was easy enough to justify: both sides were simply protecting their interests.

Figure 9.1 Access Bill Verification at AT&T.

Improve bill verification

The obvious way to improve financial performance was to improve the predicted bill. Better predicted bills meant better claims, in turn leading to more rebates. Of course, better predicted bills required better data, so from time to time AT&T would cleanse its own data. For their part, telcos also saw the need for high-quality data and so conducted their own clean-ups.

Bill verification, and the efforts to clean up data in support of it, constitute hidden data factories, and it is easy to see how readily they are justified.

It is important to note that in some ways bill verification “worked.” The number and dollar value of claims decreased. Rebates far exceeded the cost of obtaining them. And many in AT&T came to view rebates as “revenue.” After all, money coming in is important, no matter what the source.

Pautke was responsible for so-called “special access,” services involving private lines (a private line connects a single customer, often a large business that needs lots of service, to the telephone network). Though he couldn’t articulate it well, Pautke was dissatisfied. In particular, the private line business is dynamic: existing customers need more services, new customers need new services, businesses set up new locations, and all may need new features. A sort of dynamic equilibrium evolved—new issues were created each month, some were resolved, and some were lumped together and settled.

Provocateurs are important for many reasons—they may see the hidden data factory when others don’t, they may be dissatisfied with results, or they may simply wonder why something costs so darn much. What distinguishes them is that they have the courage to speak up and push on. Companies are in their debt!

The statistical tools Pautke sought aimed to help him understand these issues more deeply. He reached out to his director, Scott Williamson, who, in turn, reached out to me at the Bell Laboratories Quality Assurance Center. Earlier I had studied network performance and was looking to try out statistical process control on the operation of the network. Young Huh, a specialist in product reliability, and later others, joined me in assisting Pautke.

The problem Pautke posed was improving predicted bills. The three of us quickly agreed on a different approach. Rather than figuring out better ways to clean up the data needed to create a predicted bill, we should focus on the process that created the data.

Traction

Interestingly, no one understood this process in any detail. People knew how systems interconnected, but they did not know what happened to the data. To find out, Pautke and his team conducted a small tracking study. They picked 20 new service orders and then followed the data involved at each step. Figure 9.2 depicts a portion of one tracked record. Note four anomalies, in red italics. The first two changes (from XYZ.1234 to XYZ-1234 and from 1 to A) involved re-formatting the data during step B. They discovered a number of small changes like this as they looked through the data. Some were annoying, but none appeared to impact invoices. The other two changes noted in Figure 9.2 were more substantial. The billing number and office number changed mid-process. These changed the meaning of the data and impacted the invoice. Pautke and his team did not expect to see such changes as the data wound its way along.

Figure 9.2 A portion of a tracked record. The process features five steps and the table features four, of dozens, of data attributes. Thus the billing code for this item of work at step A is “1.” Finally, the entries in red represent unexpected changes in the data record.
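The essence of such a tracking study is simple to express in code. Below is a minimal sketch, with hypothetical step names and attribute values loosely echoing Figure 9.2, that flags any attribute whose value changes between consecutive steps.

```python
# A tracked record: the same four attributes observed at each process step.
# Step names and values are illustrative, not actual AT&T data.
tracked_record = {
    "step_A": {"circuit": "XYZ.1234", "billing_code": "1", "billing_no": "555-0100", "office": "CLLI01"},
    "step_B": {"circuit": "XYZ-1234", "billing_code": "A", "billing_no": "555-0100", "office": "CLLI01"},
    "step_C": {"circuit": "XYZ-1234", "billing_code": "A", "billing_no": "555-0100", "office": "CLLI01"},
    "step_D": {"circuit": "XYZ-1234", "billing_code": "A", "billing_no": "555-0199", "office": "CLLI02"},
    "step_E": {"circuit": "XYZ-1234", "billing_code": "A", "billing_no": "555-0199", "office": "CLLI02"},
}

def flag_changes(record):
    """Report every attribute whose value changes between consecutive steps."""
    steps = list(record)
    for prev, curr in zip(steps, steps[1:]):
        for attr, value in record[curr].items():
            if record[prev][attr] != value:
                print(f"{prev} -> {curr}: {attr} changed {record[prev][attr]!r} -> {value!r}")

flag_changes(tracked_record)
```

Run against this record, the sketch surfaces the two re-formatting changes at step B and the two substantive changes between steps C and D.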

Further, for each of the 20 new service orders, Pautke and his team found something that just didn’t look right. At this point no one fully understood the implications. Many people were dismissive. After all, what can you prove with 20 records? So Pautke and his team did not yet have real traction.

Williamson, the most senior AT&T manager involved, didn’t fully understand the implications either. But he knew the early results could not be ignored, and he made sure that Pautke and his Bell Labs support team had the political cover and resources to push on.

To gain traction, Pautke and his team had to demonstrate that their approach was a valid alternative. They did so by automating their tracking efforts to increase the numbers and by looking for patterns in the results.

Note that, having found something that troubled him, Pautke had not just spoken up, he had followed up, key to being a provocateur.

They started using time series and Pareto plots to gain insight into questions such as how often errors occurred and where. Figure 9.3 removed any doubt that the process performed extremely poorly. On average, only 40 percent of the data records made it all the way through without error.

Figure 9.3 Predicted bill process performance, weeks 1-10.
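The statistic behind Figure 9.3 is straightforward to compute. Here is a minimal sketch, assuming each tracked record carries a week number and a flag indicating whether it made it through error-free; the data is invented.

```python
from collections import defaultdict

def weekly_error_free_rate(records):
    """records: iterable of (week, error_free) pairs, one per tracked record.
    Returns {week: percent of that week's records that made it through cleanly}."""
    totals = defaultdict(int)
    clean = defaultdict(int)
    for week, error_free in records:
        totals[week] += 1
        clean[week] += error_free  # True counts as 1
    return {week: 100.0 * clean[week] / totals[week] for week in sorted(totals)}

# Invented data, roughly matching the 40 percent figure in the text.
records = [(1, True), (1, False), (1, False),
           (2, True), (2, True), (2, False), (2, False), (2, False)]
print(weekly_error_free_rate(records))  # week 1: ~33 percent; week 2: 40 percent
```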

They also sought to determine where errors occurred by making dozens of plots. Many, such as Figure 9.4, yielded no particular insight. But Figure 9.5 proved more fruitful. It shows that the vast majority of problems occurred in a relatively few attributes.

Figure 9.4 Process performance by service type, weeks 1-10.

Figure 9.5 Process performance by attribute, weeks 1-10.

Pautke and his team made many such plots. The one that proved most insightful was Figure 9.6, which showed that the vast majority of problems occurred on the interfaces between steps C and D. A few also occurred between steps D and E, but almost none between steps A and B and B and C.

Figure 9.6 Interfaces where errors occur, all attributes, weeks 1-10.

Next, combining the insight from Figure 9.6 with that revealed in Figure 9.5, they were able to identify precisely where the problems occurred (Figure 9.7). This is critical! Rather than tens of thousands (maybe hundreds of thousands) of individual issues that had to be addressed one at a time, it became clear that eliminating a relatively few root causes would dramatically improve data quality.

Figure 9.7 Process performance by interface and attribute, weeks 1-10.
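The tallies behind Figures 9.5 through 9.7 are equally simple. The sketch below, using invented error records tagged by interface and attribute, shows how the cross-tabulation pinpoints where problems concentrate.

```python
from collections import Counter

# Each observed error is tagged with the interface where it occurred and the
# attribute affected. The records are invented, for illustration only.
errors = [
    ("C-D", "billing_no"), ("C-D", "billing_no"), ("C-D", "office"),
    ("C-D", "office"), ("D-E", "billing_no"), ("A-B", "circuit"),
]

by_attribute = Counter(attr for _, attr in errors)    # cf. Figure 9.5
by_interface = Counter(iface for iface, _ in errors)  # cf. Figure 9.6
by_both = Counter(errors)                             # cf. Figure 9.7

for (iface, attr), count in by_both.most_common():
    print(f"interface {iface}, attribute {attr}: {count} errors")
```

The most_common ordering is exactly the Pareto idea: a few interface-attribute pairs account for the vast majority of errors.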

Pautke spent an inordinate amount of time explaining these results. As he did so, a critical mass of managers grew dissatisfied with bill verification. The “claims are revenue” mantra was replaced with a simple analogy to a gold mine. In the analogy, one party seeded a mine with the second party’s gold. The second party used its best technology and other means to recover that gold. But it could hardly count that gold as revenue. There was no way to come out ahead. Even if the second party recovered all its gold, it had expended enormous time and energy to do so.

Finally, Pautke had traction!

First real results

Traction aside, Pautke had not yet achieved a consensus on what to do next. Some opined that if AT&T could create an extremely high-quality predicted bill, it ought to do away with telephone company billing (and AT&T bill verification) altogether. Others thought it more appropriate to share the tracking technique with telephone companies and help/train/insist/demand that they find and eliminate the root causes in their processes. Discussions continued for months, and it is easy to see them as a bureaucratic waste of time. That conclusion misses the point—these discussions socialized a new technology and a new approach with hundreds of people.

As fate would have it, Pautke was based in Cincinnati, and AT&T and Cincinnati Bell (CB) shared a building. Many people were former colleagues and personal friends. One day a lunchtime discussion between Pautke and CB’s Cori Rothenbach turned to billing, and he relayed his excitement about data tracking and his growing frustration with bill verification. Rothenbach was just as frustrated. She found AT&T a difficult partner (perhaps putting it kindly!), sending far too much corrupt data and causing extra effort to set up and bill for CB services. At the same time, she was confident that CB’s data was outstanding.

Rothenbach talked to others in CB and, within a week or two, AT&T and CB agreed to track data across company lines.

A critical moment occurred during an early discussion. Pautke had stated that all AT&T wanted was a “timely and accurate bill.” The response from the Cincinnati Bell people stunned him: “You know Bob, we don’t come in every morning thinking of ways to foul you up. What exactly do you mean by ‘timely and accurate?’”

Pautke knew in an instant! Rather than explaining what it wanted, AT&T had plunged headlong into bill verification. It had failed in its responsibilities as a customer and it had only itself to blame for bad data coming from Cincinnati Bell. This subtle observation is extremely important and I’ll have more to say about it in a moment.

I cannot overemphasize the need for communications in data quality management. A major contributing factor in every data quality program I’ve worked on was that data creators simply did not understand what data customers required. And, as Pautke found, data customers have only themselves to blame if they’ve not explained their needs to data creators.

To its credit, Cincinnati Bell required only a few tracked records to see that its data was not perfect. Given its strong service ethic, it found this simply unacceptable. AT&T, for its part, acknowledged CB’s complaints about its data.

So the two companies worked together, across company lines, tracking new customer orders from receipt, through provisioning, through billing. They did find some errors that were solely one or the other’s fault, but most occurred on interfaces between the companies, the most important of which was between steps C and D in the prior discussion. At a deeper level, the real root cause was poor communication of each company’s needs as data crossed these interfaces. Defining those needs in full detail was the key to fixing the interfaces, in turn eliminating most billing errors.

Rollout

Even with the success of data tracking and the CB work, few AT&T managers thought it feasible to eliminate bill verification. After all, CB was a small company, not likely to impress the others. There was plenty of internal debate and discussion, but again no consensus on next steps emerged.

In another twist of fate, Scott Williamson, so influential in getting the work started and nurturing it along, took early retirement. His replacement, Monica Mehan, came from a different part of the company and had no investment in any aspect of the work. As she learned about her new organization, she found data tracking to be cool and the CB trial intriguing. Conversely, bill verification held no appeal. She thought the only responsible course involved trying to replicate the CB work with all telcos and she charged one of her lieutenants, John Tomka, with getting on with it.

Mehan’s decisiveness and sense of urgency were essential. As I noted earlier, data quality programs go as far and as fast as the senior leader perceived to be leading the effort demands. If Pautke had poked the bear, now the bear arose from its slumber.

For the new approach to work, other telcos would have to invest in data tracking and become willing partners. But would they do so? After all, telcos were monopoly providers, given, in the eyes of some, short shrift at divestiture. What was in it for them?

In one respect, the CB trial was more prescient than one could have expected. Each telco hated the entire bill verification and claims process just as much as CB. To them, it was an expense, and a huge, unpredictable one at that. Actually, it was two expenses—the monies they returned to AT&T and the expense of responding to claims. They incurred this second expense even when an AT&T claim was wrong. Thus, at a high level, telco and AT&T interests aligned. Still, high-level alignment and a clear way forward are two different things. Developing that clear way forward required a bit more work.

There are two types of billing errors: overcharges, where the telco bill is too high, and undercharges, where it is too low. To protect its financial interests, AT&T sought to eliminate overcharges; conversely, the telco sought to eliminate undercharges. Senior managers on both sides came to realize a simple reality: The only way to turn off bill verification was for each to recognize the other’s interests—and drive both overcharges and undercharges to zero.

Two specific measures were developed to crystallize these ideas:

  • Risk. The total dollars at risk to either company, estimated as the sum of overcharges and undercharges.
  • Consequence. The difference between overcharges and undercharges; it represents the amount of money that should change hands.

To illustrate the points, consider two situations:

A: Overcharges = $1,000; Undercharges = $500

B: Overcharges = $2,000; Undercharges = $1,750.

Then the statistics work out as:

A: Risk = $1,500; Consequence = $500

B: Risk = $3,750; Consequence = $250.
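In code, the two measures and the worked example amount to a few lines; a minimal sketch:

```python
def risk(overcharges: float, undercharges: float) -> float:
    """Total dollars at risk to either company."""
    return overcharges + undercharges

def consequence(overcharges: float, undercharges: float) -> float:
    """Dollars that should actually change hands."""
    return overcharges - undercharges

for label, over, under in [("A", 1000, 500), ("B", 2000, 1750)]:
    print(f"{label}: Risk = ${risk(over, under):,}; "
          f"Consequence = ${consequence(over, under):,}")
# A: Risk = $1,500; Consequence = $500
# B: Risk = $3,750; Consequence = $250
```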

It’s clear enough that case B is better in the short term—fewer dollars change hands. At the same time, case A portends a better long-term future—there are considerably fewer dollars to worry about. The simple insight that both the short term and the long term mattered crystallized a direction: Continue bill verification to keep the consequence low (serving companies’ financial interests) while simultaneously working together to drive risk to zero. Then turn off bill verification.

Tomka, Pautke, and their teams wrapped up these ideas in a vision they labeled Future Optimal State, or FOS: a claimless environment in which companies trusted one another, the risk was low (if risk is low, then consequence must be low), and bill verification was no longer needed.

Earlier I noted that Pautke came to realize that AT&T was partly to blame for its access billing problems. Slowly, others in AT&T came to realize this as well. Before the work began, any manager asked, “Who’s to blame for telco billing issues?” would reply, “The telco!” without batting an eye. People came to realize that such thinking was incomplete, maybe even wrong. The practical reality is that when one party (e.g., a data customer) checks the data provided by a second party (e.g., a data creator), it tacitly assumes responsibility for the quality of that party’s data. Even if needed in the short term, this is almost always a bad idea in the long term. AT&T came to realize that it had baked a bad idea into bill verification.

AT&T had not been a good customer either. It wasn’t just CB that didn’t understand what AT&T meant by a “timely and accurate bill”; no telco did (in retrospect, judging from the effort required to document that seemingly simple concept, I suspect that AT&T didn’t know either).

The FOS vision required both AT&T (as the customer) and telcos (as the creators) to assume their rightful roles. Thus AT&T would:

  • Define exactly what it wanted and help the telco understand.
  • Provide the telco with correct data so it would provision and bill correctly.
  • Replace bill verification with supplier management.

And the telco would:

  • Assume full responsibility for bill accuracy.
  • Provide evidence that bills were correct.
  • Identify and eliminate root causes of error as quickly as possible.

Tomka’s next step was to build the organizational structure needed to achieve FOS. He put the following in place:

  • A FOS Core Team, made up solely of AT&T people, responsible for the detailed definition of the FOS program, clarifying AT&T requirements, certifying that FOS objectives had been achieved, and overall program management.
  • Implementation teams, made up (largely) of telco employees, charged with translating AT&T requirements into a specific plan and implementing that plan. Each implementation team was given latitude in meeting AT&T requirements, and a core team member stuck close to answer questions and help make day-to-day decisions.
  • One steering committee per telco, composed of a core team member, the leader of the implementation team, and senior managers from both sides. These teams met quarterly and bore ultimate responsibility for successful implementation.

The core team served as the data quality team, the implementation teams as embedded data managers, and the steering committees as leadership.

The work proceeded at a steady pace. Figure 9.8 depicts key features of the new end-to-end cross-company process. Random samples of customer orders were tracked across company lines, bill-impacting issues noted, estimates of risk and consequence created, and likely root causes called out. Dozens, maybe hundreds, of issues came up. As with CB, most errors occurred on the interfaces of departments within companies or between companies. Almost all were relatively easy to address. Every single implementation team reduced risk by 90 percent within a few months of serious work.

Within two years, bill verification was turned off for all large telcos. Billing errors were reduced by 98 percent, and both the phone companies and AT&T saved tens of millions of dollars per year. The unquantified, and probably unquantifiable, benefits may be even greater. Frank Ianna, then head of AT&T’s Network Operations Division, explained it to me this way: “In the past I’d get month-end summaries about three weeks after the end of each month. But they didn’t end anything, because results could change dramatically. For example, I couldn’t feel comfortable with January results until August. It’s no way to run the business. Don’t get me wrong. I do appreciate the improvements to the bottom line. But being able to run the business is worth even more.”

Finally, improving data quality changed everything, for everyone involved. Jobs were lost, and every remaining job was very different afterward! Just compare Figures 9.1 and 9.8. The new work looks nothing like the old.

As this case illustrates, improving data quality can be truly transformative in its own right.

Figure 9.8 New access financial assurance process.

Data Definitions That Stand the Test of Time at Aera Energy

While the AT&T example focuses on the quality of data values, this one focuses on data definitions, so important if data customers are to understand what the data means.

The basic roles are no less relevant for metadata. In particular, creating a data definition is similar to creating a data value, and data customers need both to use data effectively. One company that’s done terrific work to define its most important data, keep those definitions current, and add new ones as needed is Aera Energy. Based in Bakersfield, Aera is an oil and gas company focused on onshore and offshore exploration and production assets formerly operated by Shell and Mobil in California.

The key players include Gene Voiland, then Aera Chief Executive; David Walker, CIO; Bob Palermo, chief architect; Lwanga Yonke, information process owner; Marie Davis, Aera’s data architect; Dave Hay, a consultant in data modeling; and a network of 30 data stewards. Unlike most other companies, Aera and its executives recognized the importance of both data and process. So they defined 11 key processes, set on top of the usual business unit structure. “Manage information” was set squarely in the middle of the process structure, both feeding and depending on the other 10. Here Voiland, Walker, and Palermo played the leadership roles; Yonke, data quality management; Davis and Hay, technical expertise; and the data stewards, embedded data managers. Interestingly, the data stewards played dual roles, amplifying the voice of the data customer and participating in the actual creation of data definitions. These stewards were appointed by the process owners and typically were business professionals.

Recall that most people are both data customers and data creators. Embedding data managers enables them to help with both roles.

Many people have no appreciation for all of the fuss about data definitions. After all, doesn’t everyone have access to a dictionary? But it turns out that everyday business language takes on meanings that go far beyond the dictionary. Terms can be vague and confusing—a term can be used one way in one situation and quite another way in another. In Aera’s case, one term that is obviously important is “well,” as in oil well. Clearly, it is a term that people in the oil business use every day. But what exactly is a “well”?

  • Is it a “hole in the ground”?
  • Is it a “hole in the ground from which oil is being extracted”? If so, what is a “hole in the ground that has been capped off”?
  • What if, below ground level, a spur is drilled off the original hole, forming an inverted “Y.” One well or two?
  • What if, through the same hole, two distinct subsurface oil zones are being produced, each one with its own production and fluid measurement equipment? How many wells now?

Imagine the confusion that results when two people employ different definitions. Each will give a different answer to that most basic of questions, “How many wells do we have?” These complications grow as applications exchange data.
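One common way to resolve such ambiguity, offered here purely as an illustration and not as Aera’s actual model, is to separate the physical hole in the ground (call it a wellbore) from the business entity (the well) that may own several bores and producing zones. A minimal sketch:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Wellbore:
    """A physical hole in the ground, including a spur drilled off an original hole."""
    bore_id: str
    capped: bool = False

@dataclass
class Well:
    """The business entity: one or more wellbores producing from zero or more zones.
    Counting wells, wellbores, and producing zones gives three different answers to
    'How many wells do we have?' -- exactly why a shared definition matters."""
    well_id: str
    wellbores: List[Wellbore] = field(default_factory=list)
    producing_zones: List[str] = field(default_factory=list)

# The inverted-Y case: one well, two wellbores; two separately produced zones.
w = Well("W-001",
         wellbores=[Wellbore("WB-1"), Wellbore("WB-1a")],
         producing_zones=["Zone-Upper", "Zone-Lower"])
print(1, len(w.wellbores), len(w.producing_zones))  # 1 2 2
```

Whatever structure a company chooses, the point stands: until the terms are pinned down, different people counting “wells” will get different numbers.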

The job of getting the right definitions involved two steps. The first, led by Palermo, consisted primarily of identifying 53 important terms (such as “well”) and sketching initial definitions. The second step, led by Yonke, was aimed at getting agreement on the meanings of these terms, crystallizing them into a conceptual data model, and developing logical data models to support new computer applications.

To be clear, all important business terms require clear definition. These must be captured and made available via a data dictionary. But not all terms require standardization. A much smaller number (53 in Aera’s case) will suffice.

Note that Yonke’s job was to manage the process, not develop specific definitions. Since the process spans the entire company, the job may be likened to herding cats. As he pointed out, “It’s a challenge to manage a process in which no one reports directly to you. You have to lead through influence and well-defined systems. While I was responsible for the overall process, the stewards were responsible for the individual results.”

It is important to recognize that it takes time to develop and agree upon good data definitions (in contrast to processes such as those Pautke worked on, which create vast quantities of data values). But spending the time is worth the effort. Good data definitions have long shelf lives, since they cut to the heart of the company. They should outlast systems, reorganizations, even entire generations. Davis noted, “As we worked through the process, individual teams came to realize that they had to do more than just represent the interests of their own business units. They had to do what was right for everyone in Aera. That took time. It also helped us appreciate each other’s perspectives and contributions a lot more.”

While creating solid data definitions (and of course maintaining them) is worth the time and effort, it is difficult to estimate the return on investment. Still, Aera’s sustained implementation has yielded a wide array of benefits. Palermo noted, “One of our goals was to enable engineers to spend more time on engineering analysis and decision-making, and less on data management. On that score we can show that we’ve doubled the productivity of most of these critical people.”

In Summary

What’s most striking to me about AT&T’s and Aera’s work and results is not how great they are, but how utterly typical they are. In the diffusion of the getting-in-front approach and the roles and methods that support it, these two companies make up the bleeding edge. Others, in telecommunications, oil and gas, retail, and finance, have followed and achieved similar results. In contrast to the “bleeding edge,” these comprise the “leading edge.” They’ve proven that the approach works. And pays! Similar results are available to all who devote serious effort to the task.
