CHAPTER 6
A Framework for Agile Analytics
A Simple Model for Gathering Insights

If I can’t picture it, I can’t understand it.

—Albert Einstein

At least in the software-development world, Agile methods are old hat today. Companies such as Amazon, Google, Facebook, Apple, Twitter, Microsoft, and countless others in the technology sector have long recognized the superiority of Scrum compared to the Waterfall method. Based on the success of these companies and the need to adapt quickly to a remarkably dynamic business environment, Agile methods have penetrated other industries.

Founded in 1892 in Schenectady, New York, General Electric (GE) is one of the oldest and most storied enterprises in the world. In a way, though, that doesn’t matter. As former executives at BlackBerry, Kodak, and Blockbuster can attest, previous success does not guarantee future success. To adapt to the realities of the twenty-first century, GE’s management recognized the need to get with the times—and, increasingly, this means adopting Agile practices, such as Scrum.

Consider Brad Surak, now GE Digital’s Chief Operating Officer (COO). Surak began his career as a software engineer. As such, he was intimately familiar with Agile. He piloted Scrum with the leadership team responsible for developing industrial Internet applications and then, more recently, began applying it to the new unit’s management processes, such as operating reviews.1

Although the notion of Agile analytics is relatively new, it is quickly gaining steam. As Part Three shows, organizations are using data and analytics to solve a wide variety of business problems. Before we arrive at proper case studies, we’ve got some work to do.

This brief chapter provides a simple and general framework for gleaning insights in an iterative or Agile fashion. The framework displayed in Figure 6.1 seeks to avoid the costly mistakes of Waterfall analytics projects.

The framework is a cycle of six steps: perform business discovery, perform data discovery, prepare the data, model the data, score and deploy, and evaluate and improve; the final step feeds back into business discovery.

Figure 6.1 A Simple Six-Step Framework for Agile Analytics

Source: Model adapted from Alt-Simmons’ book Agile by Design. Figure created by Phil Simon.

Even at a high level, the intent here should be obvious: You should not attempt to analyze every conceivable data source in one large batch. You’ll be much better served by completing a series of smaller batches. Ditto for spending months attempting to build the perfect model.

TIP

Don’t try to boil the ocean.

Let’s cover each of these steps in a fair amount of detail.

PERFORM BUSINESS DISCOVERY

Analytics doesn’t exist in a vacuum. Sure, at some intellectual level you may wonder why your customers are churning or you can’t accurately predict inventory levels at your company. Still, at this point, hopefully you are attempting to solve a real business problem, not conduct an interesting but largely academic exercise.

To that end, you should start with key questions such as the following:

  • What are we trying to achieve?
  • What behavior(s) are we trying to understand, influence, and/or predict?
  • What type of data would we need to address these issues?*
  • Is the project even viable?
  • Does our organization possess the time, budget, and resources to undertake the project?
  • Is our organization committed to the project? Or will the project fade into the background as more important priorities crop up?
  • What happens if we don’t answer these questions? What if the project takes longer than expected?

At this point, members of your team may very well disagree about the answers to some of these questions. For instance, not everyone may concur about whether the project is even viable. Disagreement is healthy as long as it is respectful.

To assess the viability of any analytics endeavor, it’s wise to hold discovery workshops. These brainstorming sessions can flush out ideas. Perhaps Saul is a skeptic because he saw a similar organizational project fail five years ago. He doesn’t realize that new leadership, technologies, data sources, and business realities have changed the game. Maybe Penny is a Pollyanna because this is her first project and she just assumes that everyone will follow her lead.

Resist the urge to skip this step. Next, try to start with a testable hypothesis or working theory of why a problem is occurring. For instance:

  • Initial hypothesis: Customers are leaving because our products are too expensive.
  • Null hypothesis: Customers are not leaving because our products are too expensive.

Don’t worry about completely answering this question from the get-go. Remember that this is a cycle. You’ll have plenty of time to introduce additional hypotheses, variables, and data sources. Odds are that a single simple hypothesis won’t explain the entirety of the business problem that you’re addressing in this stage anyway.
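
To make the pricing hypothesis above concrete, here is a minimal sketch in Python of how one might test it. Everything in it is hypothetical: the customers.csv file, the price_paid and churned columns, and the median-price split are illustrative assumptions, not a prescription.

```python
# A rough test of "customers are leaving because our products are too expensive."
# Hypothetical data: one row per customer, with the price paid and a churn flag.
import pandas as pd
from scipy.stats import chi2_contingency

customers = pd.read_csv("customers.csv")  # assumed columns: price_paid, churned (0/1)

# Split customers into lower- and higher-priced groups at the median price.
customers["pricey"] = customers["price_paid"] > customers["price_paid"].median()

# Cross-tabulate churn against price group and run a chi-square test of independence.
table = pd.crosstab(customers["pricey"], customers["churned"])
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"p-value: {p_value:.4f}")
# A small p-value suggests churn and price group are related; it does not,
# by itself, prove that price is the reason customers leave.
```

Even a crude check like this helps separate what the team believes from what the data actually suggests before anyone invests in a full-blown model.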

PERFORM DATA DISCOVERY

If you’re of a certain age, as I am, you remember a much different data landscape 20 years ago. Across the board, individuals and companies accessed far less data when making decisions. In a way, this made decision making easier. For instance, employees didn’t have to worry about collecting and analyzing data from social networks because they didn’t exist. The World Wide Web was just getting started. You couldn’t answer as many questions as comprehensively as you can today—at least in theory.

Today, we have the opposite problem. The arrival of Big Data means that discovery has never been more important. Critical data-related questions include:

  • Where does the desired data “live”?
  • Is it even available?
  • Is it legal to use? Is it free to use?
  • Are we able to retrieve the data in a clean and usable format? Or do we need to scrape it using one of the tools mentioned earlier? (See “Getting the Data” in Chapter 2.)
  • Is use of the data restricted? (For instance, Twitter limits access to its firehose. The company intentionally throttles users who attempt to access too much data, especially first-time users.)
  • Can you pay to circumvent those restrictions? How much?
  • How long will it take to access/acquire the data?
  • How old is our data? Has it aged well?
  • If the data exists inside of the enterprise, which organizations and departments own the data? Are they willing to share it with you? (Don’t assume that the answer is yes.)
  • Is the data complete, accurate, and deduplicated?

At this point, it’s wise to remember Douglas Hofstadter’s wonderfully recursive law: “It always takes longer than you expect, even when you take into account Hofstadter’s Law.”* Avoid committing to overly aggressive timelines for analytics. Remember that perfect is the enemy of good.

Also, know going in that it’s unlikely that you’ll solve your problem via a single data source, no matter how robust or promising it is. Start with what you know, but expect surprises. If a finding doesn’t surprise you at some point, then you’re probably not looking hard enough. Recognize that you’re never going to get all the desired data.

Finally, it’s wise at this point to digest the data that you have unearthed for a little while. Remember spikes (which are discussed in Chapter 5). Yes, you can always restart your efforts, but you won’t recoup the time. Agile methods such as Scrum don’t include time machines.

PREPARE THE DATA

Odds are that your data will contain at least a few errors, inconsistencies, and omissions, especially at first. Data quality isn’t sexy, but it’s a really big deal. In fact, data preparation may take a great deal of time and prevent you from getting started in earnest. You’ll probably need to parse, scrub, collect, and manipulate some data. Consider the following example.

In one of my Enterprise Analytics classes, a group of my students agreed to help a local retail business analyze its data against industry benchmarks. (I call the small business A1A here.) My students thought that they would be receiving pristine data in a format to which they had become accustomed. In other words, they thought that A1A’s data would be transactional (i.e., long, not wide). Table 6.1 shows the expected format.

Table 6.1 Expected Client Data

Customer_ID   PurchDate   PurchAmt   ProductCode
1234          1/1/08      12.99      ABC
1234          1/19/08     14.99      DEF
1234          1/21/08     72.99      XYZ

Source: Phil Simon.

As Table 6.1 shows, each transaction exists as a proper record in a sales table. (This is the way that contemporary systems store transactional data.) Lamentably, my students learned that A1A kept its data in the antiquated format demonstrated in Table 6.2.

Table 6.2 contains the same kind of data as Table 6.1, but this isn’t a potato-po-tah-toe situation. Each table represents the data in a very different way. Note that storing data in this manner was much more common in the 1980s. (For more on this, see “How Much? Kryder’s Law” in Chapter 1.)

Table 6.2 Actual Client Data

Customer_ID   PurchDate1   PurchAmt1   ProductCode1   PurchDate2   PurchAmt2   ProductCode2
1234          1/1/08       12.99       ABC            1/19/08      14.99       DEF
1235          1/1/12       72.99       XYZ            1/19/08      14.99       DEF
1236          1/1/08       12.99       ABC            1/19/08      72.99       XYZ

Source: Phil Simon.

By way of background, A1A hired temps to manually enter its sales data in Microsoft Excel. Not unexpectedly, the temps lacked a background in system design and data management. As such, they kept adding new columns (technically, fields) to the spreadsheet. This may seem like an inconsequential difference, but from a data perspective, it most certainly was not. If a customer booked 200 sales over the years with A1A, then the spreadsheet would contain more than 600 different fields with no end in sight. While the current version of Excel supports more than 16,000 different fields,* A1A’s data was, quite frankly, unwieldy.

My students had to wade through hundreds of columns to transform the data into a far more usable and current format. Transposing data is time-consuming. What’s more, this was not the only issue related to the data’s structure. As a result of their discoveries, the students spent the majority of the semester rebuilding A1A’s database from scratch. They couldn’t get to what they considered the good stuff (read: the analytics) until the very end of the project.
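
This kind of wide-to-long transposition is exactly the sort of chore a scripting library can automate. The sketch below is a rough illustration rather than A1A’s actual data or code: the toy DataFrame mirrors the shape of Table 6.2, and pandas’ wide_to_long does the reshaping.

```python
# Reshape wide, spreadsheet-style sales data (one row per customer, repeating
# purchase columns) into long, transactional form (one row per purchase).
import pandas as pd

# Toy data in the shape of Table 6.2; the values are illustrative only.
wide = pd.DataFrame({
    "Customer_ID":  [1234, 1235, 1236],
    "PurchDate1":   ["1/1/08", "1/1/12", "1/1/08"],
    "PurchAmt1":    [12.99, 72.99, 12.99],
    "ProductCode1": ["ABC", "XYZ", "ABC"],
    "PurchDate2":   ["1/19/08", "1/19/08", "1/19/08"],
    "PurchAmt2":    [14.99, 14.99, 72.99],
    "ProductCode2": ["DEF", "DEF", "XYZ"],
})

# Stack the numbered column groups into rows: one record per purchase.
long = (
    pd.wide_to_long(
        wide,
        stubnames=["PurchDate", "PurchAmt", "ProductCode"],
        i="Customer_ID",
        j="PurchaseNum",
    )
    .reset_index()
    .dropna(subset=["PurchDate"])          # drop empty purchase slots, if any
    .sort_values(["Customer_ID", "PurchaseNum"])
)

print(long[["Customer_ID", "PurchDate", "PurchAmt", "ProductCode"]])
```

With real data the hard part is rarely the reshaping itself; it is deciding how to handle the inevitable blanks, typos, and inconsistent codes that surface once the data is finally in one row per transaction.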

When preparing data for analytics, ask yourself the following key questions:

  • Who or what generates the data? (Remember from Chapter 1 the burgeoning Internet of Things. A machine may generate the data, but that doesn’t mean that the data is completely accurate.)
  • If people are responsible for generating the data, are they trained in how to enter it properly? Was there turnover in the organization that could introduce inconsistencies and errors?
  • Is the data coming directly from the system of record or from another source, such as a data mart or data warehouse?
  • How is the data currently generated and has that ever changed?
  • How much data is generated?
  • What if the data is flawed or incomplete? What are the downsides?
  • Is certain data absolutely required to proceed? What types of proxies can we use if we are left with no choice?
  • How complex is the data?
  • How frequently is the data updated?

TIP

Often, heat maps, simple SQL statements, pivot tables, histograms, and basic descriptive statistics can provide valuable insights into the state of your data. A day of data preparation may save you six weeks’ time down the road.
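
In that spirit, here is one possible first-pass profile using Python and pandas. The file name and column names are placeholders; the point is that a handful of one-liners can reveal a great deal about missing values, duplicates, and suspicious distributions.

```python
# A first-pass profile of a dataset's health before any serious modeling.
import pandas as pd

df = pd.read_csv("sales.csv")              # placeholder file and columns

print(df.shape)                            # how much data do we actually have?
print(df.dtypes)                           # are dates, amounts, and codes typed sensibly?
print(df.describe(include="all"))          # basic descriptive statistics per column
print(df.isna().sum())                     # missing values by column
print(df.duplicated().sum())               # exact duplicate records

# A rough pivot table: total sales by product, a cheap sanity check on the data.
if {"ProductCode", "PurchAmt"}.issubset(df.columns):
    print(df.pivot_table(values="PurchAmt", index="ProductCode", aggfunc="sum"))
```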

MODEL THE DATA*

Many professionals are afraid of building models, and some of my students are a little apprehensive as well. I understand the hesitation. After all, it sounds a little daunting. What happens if you get it wrong?

Here’s the rub: As George E. P. Box once said, “Essentially, all models are wrong, but some are useful.”

TIP

The question isn’t whether a model is completely accurate; no model is. The real question hinges on whether a model is useful.

At a high level, the goal of any model is to understand, describe, and/or predict an event. (See “Types of Analytics” in Chapter 3.) This holds true whether you are trying to predict customer churn, the most relevant search results, or, as we’ll see shortly, basketball outcomes.

Taking a step back, you want to know the following:

  • Which variables are important
  • The absolute and relative importance of these variables
  • Which variables ultimately don’t matter

In a business context, most models lead—or at least should lead—to specific actions designed to improve business outcomes. To this end, the model may focus on customers, prospects, employees, users, or partners.
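
As a concrete, if simplified, illustration of the three questions above: even a plain logistic regression offers a first cut at which variables matter and by how much. The sketch below assumes a hypothetical churn dataset; the feature names are invented, and standardized coefficients serve here only as a rough gauge of relative importance, not a definitive ranking.

```python
# A deliberately simple churn model whose coefficients hint at which
# variables matter, how much, and which ones barely register.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("churn.csv")              # hypothetical file and columns
features = ["price_paid", "tenure_months", "support_tickets", "discount_used"]
X, y = df[features], df["churned"]

# Standardizing first puts the coefficients on a roughly comparable scale.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

coefs = pd.Series(model.named_steps["logisticregression"].coef_[0], index=features)
print(coefs.sort_values(key=abs, ascending=False))
# Large absolute coefficients suggest influential variables; values near zero
# suggest variables that, in this simple model at least, don't matter much.
```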

The Power of a Simple Model

Many books tackle building models, and I won’t attempt to summarize them here. (Remember, this book emphasizes breadth over depth.) For now, heed the following advice: It’s best to start simply. Fight the urge to overcomplicate initial models. As the following anecdote illustrates, models need not be terribly sophisticated to bear fruit.

In brief: back in 1997, I built a little Excel spreadsheet to predict the outcomes of that year’s NCAA Tournament. It required relatively little data and nothing resembling advanced statistical techniques, yet it proved effective.

There are two morals of my little yarn. First, models need not be complicated to be effective. Why not start with Occam’s razor? Second, to succeed you still need to get a little lucky. As Branch Rickey once wrote, “Luck is the residue of hard work and design.”

Forecasting and the Human Factor

Up until now, this book has emphasized the decidedly nonhuman components of data and analytics. To be sure, awareness of data types and structures, the different types of analytics, and the framework discussed in this chapter are critical. Put differently, absent this knowledge it’s nearly impossible to attain any sustainable level of success with analytics. (Of course, there’s always dumb luck.)

Before proceeding, it’s critical to accentuate a decidedly human point. A note from the Introduction bears repeating: Data and analytics generally do not make decisions by themselves. We human beings do, and we can act in a deliberate way that will maximize our odds of success. (Part Three shows how much leadership and openness to analytics drive successful outcomes.)

Philip Tetlock is an Annenberg University Professor at the University of Pennsylvania. For 20 years, he studied the accuracy of thousands of forecasts from hundreds of experts in dozens of fields. His 2005 book Expert Political Judgment examined why these alleged “experts” so frequently made wildly inaccurate predictions in just about every field.*

In 2011, Tetlock began The Good Judgment Project along with Barbara Mellers and Don Moore. The multiyear endeavor aimed to forecast world events via the wisdom of the crowd. Think of it as a series of forecasting tournaments with mystifying results: Predictions from nonexperts were “reportedly 30 percent better than intelligence officers with access to actual classified information.”2

Understanding Superforecasters

Intrigued by this discovery, Tetlock and Dan Gardner wrote a 2015 follow-up book, Superforecasting: The Art and Science of Prediction. Tetlock wanted to know why a very small percentage of people routinely performed exceptionally well, even in areas in which they lacked any previous knowledge. He called this group of people superforecasters.

Table 6.3 shows some of the differences between superforecasters and their regular brethren.

Table 6.3 Regular Forecasters versus Superforecasters

Regular forecasters: Myopic and provincial. They tend to start with an inside view and rarely look outside. These folks generally can’t get away from their own predispositions and attachments.
Superforecasters: Ignorant in a good way. They tend to start with an outside view and slowly adopt an inside view. That is, they look heavily at external data sources, news articles, and crowd indicators, especially as starting points.

Regular forecasters: Lazy. They doubt that there is interesting data lying around.
Superforecasters: Stubborn. They believe strongly that there is interesting data lying around, even if they can’t quickly find it.

Regular forecasters: Tend to rely on informed hunches and make the data conform to those hunches.
Superforecasters: Engage in active, open-minded thinking. They go wherever the data takes them, even if it contradicts their preexisting beliefs.

Regular forecasters: Tend to believe in fate.
Superforecasters: Tend to reject fate and understand that someone has to win. Examples here include lotteries, markets, poker tournaments, and so on.

Source: Principles from Superforecasting: The Art and Science of Prediction by Philip Tetlock and Dan Gardner. Table from Phil Simon.

Brass tacks: Let’s say that you need to solve a problem of relative sophistication. You give the same data to groups of superforecasters and vanilla experts. As a result of their mind-sets, the former is far more likely to produce a superior solution than the latter. To paraphrase Isaiah Berlin’s essay, foxes are better than hedgehogs.

SCORE AND DEPLOY

My NCAA Tournament model described earlier was simple on two levels: First, it didn’t require anywhere near the data or statistical techniques required to predict a complex business, economic, or medical outcome. Second, I was testing a discrete event, not an ongoing process. That is, my model had no use beyond March Madness in 1997, although I could have refined it over time.

The vast majority of business models couldn’t be more different from my little—albeit effective—Excel spreadsheet. Because these forecasts attempt to describe and predict ongoing events of far greater complexity, they need to evolve over time. Customer churn, employee attrition, and credit-card delinquency rates don’t end when a team wins a trophy and cuts down the net.

The score-and-deploy phase begins the process of assessing the viability of the model. Questions may well include:

  • Is your model working well? How well, and how do you really know?
  • Are you measuring what you sought to measure?
  • Even if you’re looking at the right (independent) variables, are their weights appropriate?
  • How confident are you in your predictions?
  • Knowing that you’ll never reach complete accuracy, what is an acceptable level of uncertainty?

It’s important to note that you might not have access to a pristine and comprehensive dataset to run through your model. At some point, odds are that you will have to decide between including a smaller but cleaner dataset and a larger but impure one. Finally, keep an eye out for tactical and operational issues. If people aren’t adhering to your model’s recommendations for whatever reason, it will ultimately suffer.
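
One way to put numbers behind questions like these is to hold out data the model never sees during training and score against it. The snippet below reuses the hypothetical churn example sketched earlier; accuracy and ROC AUC are common choices of metric, not the only ones.

```python
# Score a model on a holdout set to estimate how well it generalizes.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("churn.csv")              # same hypothetical dataset as before
features = ["price_paid", "tenure_months", "support_tickets", "discount_used"]
X, y = df[features], df["churned"]

# Hold out a quarter of the data that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print(f"Accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
print(f"ROC AUC:  {roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]):.3f}")
# Neither number will ever be perfect; the question is whether it clears the
# bar of "useful" for the decision the model is meant to support.
```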

EVALUATE AND IMPROVE

You’ve developed the first or fifth iteration of your model and want to see if it’s describing or predicting what you expected. It’s now time to audit your model. At a high level, model updates take one of the following three forms:

  1. Simple data refresh: In this case, you replace a model’s existing dataset with a different one. The new dataset may include newer records, older ones, or a combination of both.
  2. Model update: This is a complete or partial rebuild. (This may entail new variables and associated weights.)
  3. Combination: This method fuses the first two. That is, you significantly alter the model and run a different dataset through it.

Again, it’s hard to promulgate absolute rules here because certain events are much harder to explain—let alone predict—than others. A model that explains 20 percent of the variance of a complex issue might exceed expectations, while one that explains 65 percent may be woefully inadequate.
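
A minimal sketch of what this can look like in practice: re-score the deployed model on each new batch of data and flag it for review when performance drifts below a threshold you have chosen. The file names, the metric, and the 0.70 floor are all illustrative assumptions.

```python
# Periodically re-evaluate a deployed model on fresh data and decide whether
# a simple data refresh is enough or a fuller rebuild is warranted.
import pandas as pd
from joblib import load
from sklearn.metrics import roc_auc_score

AUC_FLOOR = 0.70                              # illustrative minimum acceptable score

model = load("churn_model.joblib")            # hypothetical serialized model
new_batch = pd.read_csv("churn_latest.csv")   # hypothetical latest scoring data

features = ["price_paid", "tenure_months", "support_tickets", "discount_used"]
auc = roc_auc_score(new_batch["churned"],
                    model.predict_proba(new_batch[features])[:, 1])

print(f"AUC on the latest batch: {auc:.3f}")
if auc < AUC_FLOOR:
    print("Performance has drifted; schedule a model update, not just a data refresh.")
else:
    print("Holding up for now; a routine data refresh will do.")
```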

Questions here typically include:

  • What data sources are you missing? Which ones are worth including?
  • Which data sources may dry up? What will you do if that happens? (It’s a mistake to assume that a source will be freely available forever just because this is the case today.)
  • Which data sources should you retire?
  • Which weights need adjusting? By how much?
  • Will your model improve or diminish if you make fundamental changes?
  • What are the time implications?
  • What happens if you make a big mistake? (A major risk to a company’s core product or service is very different from one to some “moonshot.”)

I don’t know the answers to questions such as these for your organization’s specific problems. Regardless of what you’re trying to achieve, though, it’s imperative to regularly review your models for efficacy. Put differently, disabuse yourself of the “set-it-and-forget-it” mind-set. As described in the Introduction (see “Analytics and the Need for Speed”), the world changes faster than ever today. At a bare minimum, complacent companies risk missing big opportunities. In the extreme, they may become obsolete. Before leaving his post as CEO at Cisco Systems, John Chambers gave a keynote speech in which he predicted that 40 percent of today’s companies will “not exist in a meaningful way in 10 years.”3

TIP

After completing the cycle, it’s time to repeat it. Ideally, you have developed better questions than you asked the first time and even a few answers.

CHAPTER REVIEW AND DISCUSSION QUESTIONS

  • What are the six steps in the framework for Agile analytics?
  • Why is it essential to complete every step in the framework?
  • Why is business discovery so essential?
  • Do models need to be complicated to be effective? Why or why not?
  • Are experts particularly adept at making accurate predictions? Why or why not?
  • What are the personality characteristics that make for better forecasting?

NEXT

Part Two has covered the essentials of analytics and one specific Agile method: Scrum. It also provided a general framework for performing analytics. It’s now time to move from theory to practice.

Part Three details a number of organizations’ efforts to make sense of data and deploy analytics. Yes, it’s case-study time. As we’ll soon see, with analytics, moving from theory to practice is often easier said than done.

NOTES
