What to Measure

It’s so hard to measure the things you really, really want. In Software Engineering: An Idea Whose Time Has Come and Gone? [DeM09] Tom DeMarco said, “Most things that really matter–honor, dignity, discipline, personality, grace under pressure, values, ethics, resourcefulness, loyalty, humor, kindness–aren’t measurable.” You’ll generally need to track proxy measurements for your ultimate goals. You’ll want proxies that tend to mirror success well, have enough variety to avoid blind spots, and give you early indication when things are going wonky.

Measuring Delight

You’d like to measure the impact that your work has on the lives of people or the finances of the organization. Michael J. Tardiff tweeted, “Completing requirements is not progress. Delivering delight to people who care about and value those outcomes: that’s progress.”[10] I love that sentiment, but how do you measure the delight of people using the system? By the number of people? By the sum of their delight? What about the people who are paying for the development so that they can make a profit off of that delight? Do you measure their delight in short-term profits? In long-term viability? And when people pay you to build a system while they continue to do whatever it is they do, is the delight in how well you anticipate their desires? Or in how little you interrupt their lives to do so?

You’d also like to measure progress in a way that proceeds somewhat smoothly and helps you judge how far you’ve gone and how far you have to go. The delight may remain zero for quite a while, until something happens, perhaps well after software development is finished, that triggers an incident of delight. Or you could measure the profit or cost savings from use of new functionality. These are trailing indicators and may take a while to accrue. They might also be affected by things outside your control. As steering indicators, they do poorly for early, small course corrections.

It’s reasonable to measure impact when modifying a system that has already achieved basic functionality. In particular, you might release small changes to a working application and measure the impact they have on usage and profit. Ask questions like “When we automatically bring up the customer’s call history in reverse chronological order, what impact does it have on the efficiency of handling a customer service call?” “When we add hooks to social media platforms, how often do users mention us to their friends?” “When we highlight a photo of the product, do people purchase it more often?” These are ways of checking progress on the impacts you wish to see.
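To make the last of those questions concrete, here’s a minimal sketch of comparing purchase rates with and without the highlighted photo. The records and field names are hypothetical, and a real measurement would also need enough traffic to separate signal from statistical noise.

    def conversion_rate(visits):
        """Fraction of visits that ended in a purchase."""
        if not visits:
            return 0.0
        return sum(1 for v in visits if v["purchased"]) / len(visits)

    # Visits to the old page versus the variant with the highlighted
    # photo (an A/B-style comparison). These records are invented.
    control = [{"purchased": False}, {"purchased": True}, {"purchased": False}]
    variant = [{"purchased": True}, {"purchased": True}, {"purchased": False}]

    lift = conversion_rate(variant) - conversion_rate(control)
    print(f"Conversion lift: {lift:+.1%}")  # Conversion lift: +33.3%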

There are caveats to be aware of when measuring impact. A reverse chronological listing might make some call handling more efficient but make the handling of other calls less efficient. It might even obscure some information vital to the correct handling of certain issues. Mentions on social media might or might not be beneficial to your purposes. What if people are using those links to make fun of your product? Purchase volume might be related to some impetus entirely unrelated to the change you made. Or, especially when people visit a site regularly, a change might draw attention for a little while, but mean little in the long term. And consider what purchases they might be foregoing to make the ones you’re measuring.

For such reasons, few organizations rely solely on impact measurements to judge the progress of work. It’s important to also check progress with measurements that are more directly related to the work, for quicker and more reliable feedback.

Measuring Effort

At the other end of the scale, measuring how much work we’ve put into something is almost the opposite of measuring delight. It certainly proceeds smoothly from the start but it’s disconnected from the results you want to see. That’s not to say such measurements are totally worthless. Sometimes, especially at the beginning of an endeavor, that’s all you have to give you some sense of progress.

The Effort So Far

Sidney looked up at Ryan. "We’ve explored several new-to-us technologies and think we’ve found some frameworks and libraries that will save us significant time. We’ve examined the current system to see what it supports. We’ve found some confusing things in there, but it’s not clear whether or not they’re actually being used. We’ve got a plan of attack and are moving forward."

Measuring the effort expended gives you little assurance that you’re making progress at all. This is the equivalent of measuring distance traveled through the ocean by the amount of fuel you’ve burned. Does the number of hours worked equate to progress? Perhaps it does for the goal of taking home a paycheck, but probably not for the project or organizational goals.

That’s a problem with traditional Earned Value calculations. Earned Value is a project management tool for measuring progress. For simplicity, Earned Value calculations, at least in software development projects with no tangible deliveries along the way, presume that the plan, expressed as a series of tasks, will result in value. Therefore, the cost expended indicates progress along that plan. In effect, it presumes that the value of the work is equal to its budgeted cost. That’s not been true in my experience. Even measuring progress against plan is suspect if you can’t measure, or at least observe, the value of what you produce along the way. If the value is “all or nothing” and realized only on completion, then progress is not assured until then, no matter what Captain Ron says.
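To see that presumption in the arithmetic, here’s a worked sketch of a conventional Earned Value calculation, using the standard terms. The task names and numbers are invented; notice that each task “earns” exactly its budgeted cost as it completes, whether or not it produces anything anyone values.

    # Planned Value: budgeted cost of the work scheduled by now.
    # Actual Cost: what has really been spent so far.
    planned_value = 100_000
    actual_cost = 90_000
    budgeted = {"design": 30_000, "schema": 20_000, "ui": 50_000}
    fraction_done = {"design": 1.0, "schema": 1.0, "ui": 0.2}

    # Earned Value: budgeted cost of the work performed so far.
    earned_value = sum(budgeted[t] * fraction_done[t] for t in budgeted)

    spi = earned_value / planned_value  # schedule performance index: 0.60
    cpi = earned_value / actual_cost    # cost performance index: 0.67
    print(f"EV={earned_value:,.0f}  SPI={spi:.2f}  CPI={cpi:.2f}")

Nothing in those numbers can tell you whether the “done” design and schema will ever deliver value to anyone.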

Agile teams often fall into the trap of treating Story Points (see Story Points) as their measure of progress. This suffers from the same problems as Earned Value calculations. Story points are merely an estimate of the time and effort it takes to do things. They can be useful for planning how long something will take, or how much work fits into a timespan, but not so much for measuring progress toward a goal. Doing something harder is not necessarily more valuable; it’s just more expensive. In any event, our ultimate goal is not to have more Story Points.

So what can we measure instead?

Measuring Output

Effort produces output. Measuring output can be valuable depending on how you measure it. Lines of code is a measure of output that was justifiably rejected by the software development industry at large because it’s so meaningless. It’s easy to add lines of code that contribute nothing to progress.

What We've Produced

"We’re almost done with a minimal Automatic Call Distributor implementation," Sidney continued, "except it has no ’distributor’ logic. It works only for a single agent’s terminal. And only for a single customer."

"What use is that?" Ryan demanded.

Measuring progress by software component rarely makes sense, either. What value does a database schema have, by itself? How can you tell that the database schema is done if nothing is using it? Most useful functionality requires a number of components to collaborate. Working component by component often leaves gaps between them that aren’t discovered until you try to integrate them. And when integration is left until late, you have a false sense of progress. While you may have completed big pieces of work, there’s “undoneness” lurking between them that no one had noticed. Fixing that undoneness may require reworking some of those big pieces. This is where the saying, “We’re 90 percent done and just have the 10 percent that takes 90 percent of the time left to do,” comes from. Your progress meter is fooling you.

Measuring Functionality

When the pieces work together to give the desired functionality, then you’ve created potentially deliverable output. That’s a better stick in the ground. If you can say “the system allows you to communicate in English, but not in other languages,” then you have a measure of progress. You can’t, perhaps, calculate the percentage complete, since you don’t know how many languages there are to implement. And you don’t know what overlap exists in the functionality between one language and another. You could try another, say French, and get an estimate of the incremental cost of adding a language. Is it the same for all languages? Perhaps Hebrew, Arabic, and Chinese would have less in common with English than European languages based on the Roman alphabet do.

What It Does

"The system connects to the major interfaces and shows those connections are working. We’re almost done with a ’Walking Skeleton’ implementation," Sidney replied. "The system recognizes a call on a single incoming line and forwards it to a single agent’s system. We’ve brought up a single customer’s CRM screen, but that doesn’t seem to add value to the implementation. We’re thinking that enabling the agent to search the CRM system for the customer is the next priority. In the meantime, we’ve got a minimal call routing integration from the phone system to the agent. Then we can add support for multiple agents and have a rudimentary working system."

"Oh, OK. That sounds like progress." Ryan smiled and left.

I like to measure slices of usable functionality as the unit of progress, confirmed by automated tests that check examples of the desired behavior. This is similar to Ron Jeffries’ Running Tested Features (RTF) metric[11], with the concept of feature being on the smaller side of the continuum.
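As a minimal sketch of confirming one such slice, here’s a toy version of the single-line, single-agent behavior from the sidebar, with an automated test. The Acd class and its interface are hypothetical, invented for illustration; the count of small tests like this one that pass is the measure of progress.

    class Acd:
        """Toy Automatic Call Distributor: one incoming line, one agent."""

        def __init__(self, agent):
            self.agent = agent

        def incoming_call(self, caller_id):
            # The entire slice: recognize a call and forward it to
            # the only agent. Distribution logic comes in later slices.
            return self.agent

    def test_single_call_reaches_the_single_agent():
        acd = Acd(agent="agent-1")
        assert acd.incoming_call(caller_id="555-0100") == "agent-1"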

Some projects are not about adding end-user functionality, though. If your project is to change an underlying technology, such as a database manager or a JavaScript framework, then you could use a measure of the slices of functionality that are currently working with the new technology. This works pretty well if you created automated tests of the functionality as you built the original. Then you can reuse the same tests, with modifications to interface with the new system, as your indicator of progress (sketched below). If you didn’t, then now is the time to start.

“The best time to plant a tree was 20 years ago. The second best time is now.” – ascribed to a Chinese proverb
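Returning to the technology-swap case, here’s a sketch of reusing one functional test against both the old and the new technology. Both store classes are hypothetical stand-ins for a real persistence layer; the count of such tests passing against the new technology becomes your progress indicator.

    import pytest

    class OldStore:  # stand-in for the existing technology
        def __init__(self):
            self._data = {}

        def save(self, key, value):
            self._data[key] = value

        def load(self, key):
            return self._data[key]

    class NewStore(OldStore):  # stand-in for the replacement technology
        pass

    # The same functional test runs against both implementations.
    @pytest.fixture(params=[OldStore, NewStore])
    def store(request):
        return request.param()

    def test_saved_customer_can_be_loaded(store):
        store.save("cust-42", {"name": "A. Customer"})
        assert store.load("cust-42")["name"] == "A. Customer"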

Measuring slices of working functionality is also the rationale behind developing User Story by User Story (see User Stories). Since a User Story is a thin slice of functionality, it can be tested to see if it works as desired or not. This leaves less room for things to be neglected and not completed. The functionality crosses components, and all the necessary components must work to the degree required by this small slice of functionality.

Reliably Measuring Functionality

Tracking our progress by which functions work is a great tool. Having a suite of automated tests that verify the functionality makes it easy to check that progress and to make sure we’re not slipping backward. Beware, though, of using optimistic measurements that don’t cover all the spaces where problems might hide.

Where does it work? Is this on a developer’s machine in a local sandbox? There might be some undoneness lurking that you’ll discover when the code is integrated with that of other developers. That’s the advantage of continuous integration: you discover such issues early and often, in small bits. Does it work when installed to a fresh, new environment? This discovers dependencies on elements that are not version-controlled. Does it work in an environment that mimics production? How well does this environment mimic production? The idea is to eliminate ways in which things could seem more finished than they are.
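One way to keep yourself honest is to run the same functional checks against each successively more production-like environment, changing only where they point. In this sketch the TARGET_URL variable and the /health endpoint are hypothetical; the test itself never changes.

    import os
    import urllib.request

    # Point the same check at a local sandbox, a fresh install, or a
    # production mimic by changing only this environment variable.
    BASE_URL = os.environ.get("TARGET_URL", "http://localhost:8000")

    def test_deployed_system_answers():
        with urllib.request.urlopen(f"{BASE_URL}/health") as response:
            assert response.status == 200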

Is the functionality ready to deliver to production? How well can you know if it’s ready? Unforeseen bugs represent hidden undoneness. What would be required to give you confidence to deliver the requested functionality to production? Actual use shines a light on a lot more places where undoneness can hide. If people are using it, it must be usable to some degree, and it must be meeting at least some of their needs. Production use will also generate more varied conditions of use, and that may uncover hidden bugs that testing didn’t.

When you deliver the working system, you’ve given the users of that system capabilities they didn’t have before. The outcomes are those derived from actual use of the capabilities you delivered. What do the users do that they couldn’t do before? In what ways have the things they could already do changed? These are the outcomes that matter from the perspective of the whole organization, or the organization and its surrounding context. Are people actually achieving the benefits that were intended when you decided to create this? Are they delighted by it?

Not all organizations are capable of frequent delivery yet. For that matter, not all organizations want frequent delivery of changes into production; there is also value in stability. Delivering frequently, however, is a powerful way to track progress.
