CHAPTER 6
A Framework for Agile Analytics
A Simple Model for Gathering Insights

If I can’t picture it, I can’t understand it.

—Albert Einstein

At least in the software-development world, Agile methods are old hat today. Companies such as Amazon, Google, Facebook, Apple, Twitter, Microsoft, and countless others in the technology sector have long recognized the superiority of Scrum compared to the Waterfall method. Based on the success of these companies and the need to adapt quickly to a remarkably dynamic business environment, Agile methods have penetrated other industries.

Founded in 1892 in Schenectady, New York, General Electric (GE) is one of the oldest and most storied enterprises in the world. In a way, though, that doesn’t matter. As former executives at BlackBerry, Kodak, and Blockbuster can attest, previous success does not guarantee future success. To adapt to the realities of the twenty-first century, GE’s management recognized the need to get with the times—and, increasingly, this means adopting Agile practices, such as Scrum.

Consider Brad Surak, now GE Digital’s Chief Operating Officer (COO). Surak began his career as a software engineer. As such, he was intimately familiar with Agile. He piloted Scrum with the leadership team responsible for developing industrial Internet applications and then, more recently, began applying it to the new unit’s management processes, such as operating reviews.1

Although the notion of Agile analytics is relatively new, it is quickly gaining steam. As Part Three shows, organizations are using data and analytics to solve a wide variety of business problems. Before we arrive at proper case studies, we’ve got some work to do.

This brief chapter provides a simple and general framework for gleaning insights in an iterative or Agile fashion. The framework displayed in Figure 6.1 seeks to avoid the costly mistakes of Waterfall analytics projects.

The framework is a cycle of six steps: perform business discovery, perform data discovery, prepare the data, model the data, score and deploy, and evaluate and improve; the final step feeds back into business discovery.

Figure 6.1 A Simple Six-Step Framework for Agile Analytics

Source: Model adapted from Alt-Simmons’ book Agile by Design. Figure created by Phil Simon.

Even at a high level, the intent here should be obvious: You should not attempt to analyze every conceivable data source in one large batch. You’ll be much better served by completing a series of smaller batches. Ditto for spending months attempting to build the perfect model.

TIP

Don’t try to boil the ocean.

Let’s cover each of these steps in a fair amount of detail.

PERFORM BUSINESS DISCOVERY

Analytics doesn’t exist in a vacuum. Sure, at some intellectual level you may wonder why your customers are churning or you can’t accurately predict inventory levels at your company. Still, at this point, hopefully you are attempting to solve a real business problem, not conduct an interesting but largely academic exercise.

To that end, you should start with key questions such as the following:

  • What are we trying to achieve?
  • What behavior(s) are we trying to understand, influence, and/or predict?
  • What type of data would we need to address these issues?*
  • Is the project even viable?
  • Does our organization possess the time, budget, and resources to undertake the project?
  • Is our organization committed to the project? Or will the project fade into the background as more important priorities crop up?
  • What happens if we don’t answer these questions? What if the project takes longer than expected?

At this point, members of your team may very well disagree about the answers to some of these questions. For instance, not everyone may concur about whether the project is even viable. Disagreement is healthy as long as it is respectful.

To assess the viability of any analytics endeavor, it’s wise to hold discovery workshops. These brainstorming sessions can flush out ideas. Perhaps Saul is a skeptic because he saw a similar organizational project fail five years ago. He doesn’t realize that new leadership, technologies, data sources, and business realities have changed the game. Maybe Penny is a Pollyanna because this is her first project and she just assumes that everyone will follow her lead.

Resist the urge to skip this step. Next, try to start with a testable hypothesis or working theory of why a problem is occurring. For instance:

  • Initial hypothesis: Customers are leaving because our products are too expensive.
  • Null hypothesis: Customers are not leaving because our products are too expensive.

Don’t worry about completely answering this question from the get-go. Remember that this is a cycle. You’ll have plenty of time to introduce additional hypotheses, variables, and data sources. Odds are that a single simple hypothesis won’t explain the entirety of the business problem that you’re addressing in this stage anyway.
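
To make the pricing hypothesis above concrete, here is a minimal sketch in Python of how one might test it. Everything in it is hypothetical: the customers.csv file, the price_paid and churned columns, and the median-price split are illustrative assumptions, not a prescription.

```python
# A rough test of "customers are leaving because our products are too expensive."
# Hypothetical data: one row per customer, with the price paid and a churn flag.
import pandas as pd
from scipy.stats import chi2_contingency

customers = pd.read_csv("customers.csv")  # assumed columns: price_paid, churned (0/1)

# Split customers into lower- and higher-priced groups at the median price.
customers["pricey"] = customers["price_paid"] > customers["price_paid"].median()

# Cross-tabulate churn against price group and run a chi-square test of independence.
table = pd.crosstab(customers["pricey"], customers["churned"])
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"p-value: {p_value:.4f}")
# A small p-value suggests churn and price group are related; it does not,
# by itself, prove that price is the reason customers leave.
```

Even a crude check like this helps separate what the team believes from what the data actually suggests before anyone invests in a full-blown model.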

PERFORM DATA DISCOVERY

If you’re of a certain age, as I am, you remember a much different data landscape 20 years ago. Across the board, individuals and companies accessed far less data when making decisions. In a way, this made decision making easier. For instance, employees didn’t have to worry about collecting and analyzing data from social networks because they didn’t exist. The World Wide Web was just getting started. You couldn’t answer as many questions as comprehensively as you can today—at least in theory.

Today, we have the opposite problem. The arrival of Big Data means that discovery has never been more important. Critical data-related questions include:

  • Where does the desired data “live”?
  • Is it even available?
  • Is it legal to use? Is it free to use?
  • Are we able to retrieve the data in a clean and usable format? Or do we need to scrape it using one of the tools mentioned earlier? (See “Getting the Data” in Chapter 2.)
  • Is use of the data restricted? (For instance, Twitter limits access to its firehose. The company intentionally throttles users who attempt to access too much data, especially first-time users.)
  • Can you pay to circumvent those restrictions? How much?
  • How long will it take to access/acquire the data?
  • How old is our data? Has it aged well?
  • If the data exists inside of the enterprise, which organizations and departments own the data? Are they willing to share it with you? (Don’t assume that the answer is yes.)
  • Is the data complete, accurate, and deduplicated?

At this point, it’s wise to remember Douglas Hofstadter’s wonderfully recursive law: “It always takes longer than you expect, even when you take into account Hofstadter’s Law.”* Avoid committing to overly aggressive timelines for analytics. Remember that perfect is the enemy of good.

Also, know going in that it’s unlikely that you’ll solve your problem via a single data source, no matter how robust or promising it is. Start with what you know, but expect surprises. If a finding doesn’t surprise you at some point, then you’re probably not looking hard enough. Recognize that you’re never going to get all the desired data.

Finally, it’s wise at this point to digest the data that you have unearthed for a little while. Remember spikes (which are discussed in Chapter 5). Yes, you can always restart your efforts, but you won’t recoup the time. Agile methods such as Scrum don’t include time machines.

PREPARE THE DATA

Odds are that your data will contain at least a few errors, inconsistencies, and omissions, especially at first. Data quality isn’t sexy, but it’s a really big deal. In fact, data preparation may take a great deal of time and prevent you from getting started in earnest. You’ll probably need to parse, scrub, collect, and manipulate some data. Consider the following example.

In one of my Enterprise Analytics classes, a group of my students agreed to help a local retail business analyze its data against industry benchmarks. (I call the small business A1A here.) My students thought that they would be receiving pristine data in a format to which they had become accustomed. In other words, they thought that A1A’s data would be transactional (i.e., long, not wide). Table 6.1 shows the expected format.

Table 6.1 Expected Client Data

Customer_ID   PurchDate   PurchAmt   ProductCode
1234          1/1/08      12.99      ABC
1234          1/19/08     14.99      DEF
1234          1/21/08     72.99      XYZ

Source: Phil Simon.

As Table 6.1 shows, each transaction exists as a proper record in a sales table. (This is the way that contemporary systems store transactional data.) Lamentably, my students learned that A1A kept its data in the antiquated format demonstrated in Table 6.2.

Table 6.2 contains the same kind of data as Table 6.1, but this isn’t a potato-po-tah-toe situation. Each table represents the data in a very different way. Note that storing data in this manner was much more common in the 1980s. (For more on this, see “How Much? Kryder’s Law” in Chapter 1.)

Table 6.2 Actual Client Data

Customer_ID   PurchDate1   PurchAmt1   ProductCode1   PurchDate2   PurchAmt2   ProductCode2
1234          1/1/08       12.99       ABC            1/19/08      14.99       DEF
1235          1/1/12       72.99       XYZ            1/19/08      14.99       DEF
1236          1/1/08       12.99       ABC            1/19/08      72.99       XYZ

Source: Phil Simon.

By way of background, A1A hired temps to manually enter its sales data in Microsoft Excel. Not unexpectedly, the temps lacked a background in system design and data management. As such, they kept adding new columns (technically, fields) to the spreadsheet. This may seem like an inconsequential difference, but from a data perspective, it most certainly was not. If a customer booked 200 sales over the years with A1A, then the spreadsheet would contain more than 600 different fields with no end in sight. While the current version of Excel supports more than 16,000 different fields,* A1A’s data was, quite frankly, unwieldy.

My students had to wade through hundreds of columns to transform the data into a far more usable and current format. Transposing data is time-consuming. What’s more, this was not the only issue related to the data’s structure. As a result of their discoveries, the students spent the majority of the semester rebuilding A1A’s database from scratch. They couldn’t get to what they considered the good stuff (read: the analytics) until the very end of the project.
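
This kind of wide-to-long transposition is exactly the sort of chore a scripting library can automate. The sketch below is a rough illustration rather than A1A’s actual data or code: the toy DataFrame mirrors the shape of Table 6.2, and pandas’ wide_to_long does the reshaping.

```python
# Reshape wide, spreadsheet-style sales data (one row per customer, repeating
# purchase columns) into long, transactional form (one row per purchase).
import pandas as pd

# Toy data in the shape of Table 6.2; the values are illustrative only.
wide = pd.DataFrame({
    "Customer_ID":  [1234, 1235, 1236],
    "PurchDate1":   ["1/1/08", "1/1/12", "1/1/08"],
    "PurchAmt1":    [12.99, 72.99, 12.99],
    "ProductCode1": ["ABC", "XYZ", "ABC"],
    "PurchDate2":   ["1/19/08", "1/19/08", "1/19/08"],
    "PurchAmt2":    [14.99, 14.99, 72.99],
    "ProductCode2": ["DEF", "DEF", "XYZ"],
})

# Stack the numbered column groups into rows: one record per purchase.
long = (
    pd.wide_to_long(
        wide,
        stubnames=["PurchDate", "PurchAmt", "ProductCode"],
        i="Customer_ID",
        j="PurchaseNum",
    )
    .reset_index()
    .dropna(subset=["PurchDate"])          # drop empty purchase slots, if any
    .sort_values(["Customer_ID", "PurchaseNum"])
)

print(long[["Customer_ID", "PurchDate", "PurchAmt", "ProductCode"]])
```

With real data the hard part is rarely the reshaping itself; it is deciding how to handle the inevitable blanks, typos, and inconsistent codes that surface once the data is finally in one row per transaction.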

When preparing data for analytics, ask yourself the following key questions:

  • Who or what generates the data? (Remember from Chapter 1 the burgeoning Internet of Things. A machine may generate the data, but that doesn’t mean that the data is completely accurate.)
  • If people are responsible for generating the data, are they trained in how to enter it properly? Was there turnover in the organization that could introduce inconsistencies and errors?
  • Is the data coming directly from the system of record or from another source, such as a data mart or data warehouse?
  • How is the data currently generated and has that ever changed?
  • How much data is generated?
  • What if the data is flawed or incomplete? What are the downsides?
  • Is certain data absolutely required to proceed? What types of proxies can we use if we are left with no choice?
  • How complex is the data?
  • How frequently is the data updated?

TIP

Often, heat maps, simple SQL statements, pivot tables, histograms, and basic descriptive statistics can provide valuable insights into the state of your data. A day of data preparation may save you six weeks’ time down the road.
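
In that spirit, here is one possible first-pass profile using Python and pandas. The file name and column names are placeholders; the point is that a handful of one-liners can reveal a great deal about missing values, duplicates, and suspicious distributions.

```python
# A first-pass profile of a dataset's health before any serious modeling.
import pandas as pd

df = pd.read_csv("sales.csv")              # placeholder file and columns

print(df.shape)                            # how much data do we actually have?
print(df.dtypes)                           # are dates, amounts, and codes typed sensibly?
print(df.describe(include="all"))          # basic descriptive statistics per column
print(df.isna().sum())                     # missing values by column
print(df.duplicated().sum())               # exact duplicate records

# A rough pivot table: total sales by product, a cheap sanity check on the data.
if {"ProductCode", "PurchAmt"}.issubset(df.columns):
    print(df.pivot_table(values="PurchAmt", index="ProductCode", aggfunc="sum"))
```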

MODEL THE DATA*

Many professionals are afraid of building models, and some of my students are a little apprehensive as well. I understand the hesitation. After all, it sounds a little daunting. What happens if you get it wrong?

Here’s the rub: As George E. P. Box once said, “Essentially, all models are wrong, but some are useful.”

TIP

The question isn’t whether a model is completely accurate; no model is. The real question hinges on whether a model is useful.

At a high level, the goal of any model is to understand, describe, and/or predict an event. (See “Types of Analytics” in Chapter 3.) This holds true whether you are trying to predict customer churn, the most relevant search results, or, as we’ll see shortly, basketball outcomes.

Taking a step back, you want to know the following:

  • Which variables are important
  • The absolute and relative importance of these variables
  • Which variables ultimately don’t matter

In a business context, most models lead—or at least should lead—to specific actions designed to improve business outcomes. To this end, the model may focus on customers, prospects, employees, users, or partners.
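
As a concrete, if simplified, illustration of the three questions above: even a plain logistic regression offers a first cut at which variables matter and by how much. The sketch below assumes a hypothetical churn dataset; the feature names are invented, and standardized coefficients serve here only as a rough gauge of relative importance, not a definitive ranking.

```python
# A deliberately simple churn model whose coefficients hint at which
# variables matter, how much, and which ones barely register.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("churn.csv")              # hypothetical file and columns
features = ["price_paid", "tenure_months", "support_tickets", "discount_used"]
X, y = df[features], df["churned"]

# Standardizing first puts the coefficients on a roughly comparable scale.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

coefs = pd.Series(model.named_steps["logisticregression"].coef_[0], index=features)
print(coefs.sort_values(key=abs, ascending=False))
# Large absolute coefficients suggest influential variables; values near zero
# suggest variables that, in this simple model at least, don't matter much.
```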

The Power of a Simple Model

Many books tackle building models, and I won’t attempt to summarize them here. (Remember, this book emphasizes breadth over depth.) For now, heed the following advice: It’s best to start simply. Fight the urge to overcomplicate initial models. As the following anecdote illustrates, models need not be terribly sophisticated to bear fruit.

In brief: back in 1997, I built a little Excel spreadsheet to predict the outcomes of that year’s NCAA Tournament. It required relatively little data and nothing resembling advanced statistical techniques, yet it proved effective.

There are two morals of my little yarn. First, models need not be complicated to be effective. Why not start with Occam’s razor? Second, to succeed you still need to get a little lucky. As Branch Rickey once wrote, “Luck is the residue of hard work and design.”

Forecasting and the Human Factor

Up until now, this book has emphasized the decidedly nonhuman components of data and analytics. To be sure, awareness of data types and structures, the different types of analytics, and the framework discussed in this chapter are critical. Put differently, absent this knowledge it’s nearly impossible to attain any sustainable level of success with analytics. (Of course, there’s always dumb luck.)

Before proceeding, it’s critical to accentuate a decidedly human point. A note from the Introduction bears repeating: Data and analytics generally do not make decisions by themselves. We human beings do, and we can act in a deliberate way that will maximize our odds of success. (Part Three shows how much leadership and openness to analytics drive successful outcomes.)

Philip Tetlock is an Annenberg University Professor at the University of Pennsylvania. For 20 years, he studied the accuracy of thousands of forecasts from hundreds of experts in dozens of fields. His 2005 book Expert Political Judgment examined why these alleged “experts” so frequently made wildly inaccurate predictions in just about every field.*

In 2011, Tetlock began The Good Judgment Project along with Barbara Mellers and Don Moore. The multiyear endeavor aimed to forecast world events via the wisdom of the crowd. Think of it as a series of forecasting tournaments with mystifying results: Predictions from nonexperts were “reportedly 30 percent better than intelligence officers with access to actual classified information.”2

Understanding Superforecasters

Intrigued by this discovery, Tetlock and Dan Gardner wrote a 2015 follow-up book, Superforecasting: The Art and Science of Prediction. Tetlock wanted to know why a very small percentage of people routinely performed exceptionally well, even in areas in which they lacked any previous knowledge. He called this group of people superforecasters.

Table 6.3 shows some of the differences between superforecasters and their regular brethren.

Table 6.3 Regular Forecasters versus Superforecasters

Regular forecasters: Myopic and provincial. They tend to start with an inside view and rarely look outside. These folks generally can’t get away from their own predispositions and attachments.
Superforecasters: Ignorant in a good way. They tend to start with an outside view and slowly adopt an inside view. That is, they look heavily at external data sources, news articles, and crowd indicators, especially as starting points.

Regular forecasters: Lazy. They doubt that there is interesting data lying around.
Superforecasters: Stubborn. They believe strongly that there is interesting data lying around, even if they can’t quickly find it.

Regular forecasters: Tend to rely on informed hunches and make the data conform to those hunches.
Superforecasters: Engage in active, open-minded thinking. They go wherever the data takes them, even if it contradicts their preexisting beliefs.

Regular forecasters: Tend to believe in fate.
Superforecasters: Tend to reject fate and understand that someone has to win. Examples here include lotteries, markets, poker tournaments, and so on.

Source: Principles from Superforecasting: The Art and Science of Prediction by Philip Tetlock and Dan Gardner. Table from Phil Simon.

Brass tacks: Let’s say that you need to solve a problem of relative sophistication. You give the same data to groups of superforecasters and vanilla experts. As a result of their mind-sets, the former is far more likely to produce a superior solution than the latter. To paraphrase Isaiah Berlin’s essay, foxes are better than hedgehogs.

SCORE AND DEPLOY

My NCAA Tournament model described earlier was simple on two levels: First, it didn’t require anywhere near the data or statistical techniques required to predict a complex business, economic, or medical outcome. Second, I was testing a discrete event, not an ongoing process. That is, my model had no use beyond March Madness in 1997, although I could have refined it over time.

The vast majority of business models couldn’t be more different from my little—albeit effective—Excel spreadsheet. Because these forecasts attempt to describe and predict ongoing events of far greater complexity, they need to evolve over time. Customer churn, employee attrition, and credit-card delinquency rates don’t end when a team wins a trophy and cuts down the net.

The score-and-deploy phase begins the process of assessing the viability of the model. Questions may well include:

  • Is your model working well? How well, and how do you really know?
  • Are you measuring what you sought to measure?
  • Even if you’re looking at the right (independent) variables, are their weights appropriate?
  • How confident are you in your predictions?
  • Knowing that you’ll never reach complete accuracy, what is an acceptable level of uncertainty?

It’s important to note that you might not have access to a pristine and comprehensive dataset to run through your model. At some point, odds are that you will have to decide between including a smaller but cleaner dataset and a larger but impure one. Finally, keep an eye out for tactical and operational issues. If people aren’t adhering to your model’s recommendations for whatever reason, it will ultimately suffer.
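
One way to put numbers behind questions like these is to hold out data the model never sees during training and score against it. The snippet below reuses the hypothetical churn example sketched earlier; accuracy and ROC AUC are common choices of metric, not the only ones.

```python
# Score a model on a holdout set to estimate how well it generalizes.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("churn.csv")              # same hypothetical dataset as before
features = ["price_paid", "tenure_months", "support_tickets", "discount_used"]
X, y = df[features], df["churned"]

# Hold out a quarter of the data that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
print(f"ROC AUC:  {roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]):.3f}")
# Neither number will ever be perfect; the question is whether it clears the
# bar of "useful" for the decision the model is meant to support.
```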

EVALUATE AND IMPROVE

You’ve developed the first or fifth iteration of your model and want to see if it’s describing or predicting what you expected. It’s now time to audit your model. At a high level, model updates take one of the following three forms:

  1. Simple data refresh: In this case, you replace a model’s existing dataset with a different one. The new dataset may include newer records, older ones, or a combination of both.
  2. Model update: This is a complete or partial rebuild. (This may entail new variables and associated weights.)
  3. Combination: This method fuses the first two. That is, you significantly alter the model and run a different dataset through it.

Again, it’s hard to promulgate absolute rules here because certain events are much harder to explain—let alone predict—than others. A model that explains 20 percent of the variance of a complex issue might exceed expectations, while one that explains 65 percent may be woefully inadequate.
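
A minimal sketch of what this can look like in practice: re-score the deployed model on each new batch of data and flag it for review when performance drifts below a threshold you have chosen. The file names, the metric, and the 0.70 floor are all illustrative assumptions.

```python
# Periodically re-evaluate a deployed model on fresh data and decide whether
# a simple data refresh is enough or a fuller rebuild is warranted.
import pandas as pd
from joblib import load
from sklearn.metrics import roc_auc_score

AUC_FLOOR = 0.70                              # illustrative minimum acceptable score

model = load("churn_model.joblib")            # hypothetical serialized model
new_batch = pd.read_csv("churn_latest.csv")   # hypothetical latest scoring data

features = ["price_paid", "tenure_months", "support_tickets", "discount_used"]
auc = roc_auc_score(new_batch["churned"],
                    model.predict_proba(new_batch[features])[:, 1])

print(f"AUC on the latest batch: {auc:.3f}")
if auc < AUC_FLOOR:
    print("Performance has drifted; schedule a model update, not just a data refresh.")
else:
    print("Holding up for now; a routine data refresh will do.")
```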

Questions here typically include:

  • What data sources are you missing? Which ones are worth including?
  • Which data sources may dry up? What will you do if that happens? (It’s a mistake to assume that a source will be freely available forever just because this is the case today.)
  • Which data sources should you retire?
  • Which weights need adjusting? By how much?
  • Will your model improve or diminish if you make fundamental changes?
  • What are the time implications?
  • What happens if you make a big mistake? (A major risk to a company’s core product or service is very different from one to some “moonshot.”)

I don’t know the answers to questions such as these for your organization’s specific problems. Regardless of what you’re trying to achieve, though, it’s imperative to regularly review your models for efficacy. Put differently, disabuse yourself of the “set-it-and-forget-it” mind-set. As described in the Introduction (see “Analytics and the Need for Speed”), the world changes faster than ever today. At a bare minimum, complacent companies risk missing big opportunities. In the extreme, they may become obsolete. Before leaving his post as CEO at Cisco Systems, John Chambers gave a keynote speech in which he predicted that 40 percent of today’s companies will “not exist in a meaningful way in 10 years.”3

TIP

After completing the cycle, it’s time to repeat it. Ideally, you have developed better questions than you asked the first time and even a few answers.

CHAPTER REVIEW AND DISCUSSION QUESTIONS

  • What are the six steps in the framework for Agile analytics?
  • Why is it essential to complete every step in the framework?
  • Why is business discovery so essential?
  • Do models need to be complicated to be effective? Why or why not?
  • Are experts particularly adept at making accurate predictions? Why or why not?
  • What are the personality characteristics that make for better forecasting?

NEXT

Part Two has covered the essentials of analytics and one specific Agile method: Scrum. It also provided a general framework for performing analytics. It’s now time to move from theory to practice.

Part Three details a number of organizations’ efforts to make sense of data and deploy analytics. Yes, it’s case-study time. As we’ll soon see, with analytics, moving from theory to practice is often easier said than done.

NOTES
