Chapter 19
IN THIS CHAPTER
Building a predictive analytics team
Setting the business objectives
Preparing your data
Sampling your data
Avoiding “garbage in, garbage out”
Creating quick victories
Fostering change in your organization
Building deployable models
Evaluating your model
Updating your model
This chapter discusses best practices in building predictive analytics models. You'll get a handle on the importance of defining the business objectives early on — and on getting the leaders of your business to champion your project.
To assemble your predictive analytics team, you'll need to recruit business analysts, data scientists, and information technologists. Regardless of their particular areas of expertise, your team members should be curious, engaged, motivated, and excited to dig as deep as necessary to make the project — and the business — succeed.
Business analysts serve as your domain experts (see Chapter 15): They provide the business-based perspective on which problems to solve, and give valuable insight on all business-related questions. Their experience and domain knowledge give them an intuitive savvy about which approaches might or might not work, where to start, and what to look at to get something going.
Hiring analytical team members who understand your line of business will help you focus the building of your predictive analytics solutions on the desired business outcomes.
Data scientists play an important role in linking the worlds of business and data to the technology and the algorithms, following well-established methodologies with proven track records. They have a big say in developing the actual models, and their views will affect the outcome of your whole project. This role requires expertise in statistical techniques such as regression analysis and cluster analysis. (Regression analysis is a statistical method that investigates the relationships between variables.) It also requires the ability to choose the right technical solutions for the business problem, and to articulate the business value of the outcome to the stakeholders.
Your data scientists should possess knowledge of advanced algorithms and techniques such as machine learning, data mining, and natural language processing.
Then you need IT experts to apply technical expertise to the implementation, monitoring, maintenance, and administration of the needed IT systems. Their job is to make sure the IT infrastructure and all IT strategic assets are stable, secure, and available to enable the business mission. An example of this is making sure the computer network and database work smoothly together.
Once data scientists have selected the appropriate techniques, they can work together with IT experts to oversee the overall design of the system's architecture and to improve its performance in response to different environments and different volumes of data.
To give your predictive analytics project its best shot at success, be sure to set out specific business goals right from the start. Is the company adding to its product line? Targeting new customers? Changing its overall business model? Whatever the major focus is, pay particular attention to how your project will make a positive impact on the bottom line. This practical perspective will help you get your stakeholders to champion your project — which in turn generates the confidence you need to go forward.
In the early phase of the project, your analytics team should gather relevant business information by meeting with the stakeholders to understand and record their business needs — and their take on the issues that the project is expected to solve. The stakeholders' domain knowledge and firsthand insights can help the team define the problem accurately and set realistic expectations for the project.
This step in building your predictive analytics project is as crucial as it is unavoidably time-consuming and tedious: data preparation. The actual needed steps vary from one project to the next; they depend on the initial state of your data and the requirements for your project.
To ensure that you can accurately measure the performance of the predictive analytics model you're building, separate your historical business data into two parts: the training dataset, which contains the bulk of the data and is used to build the model, and the test dataset, which is held back so you can evaluate the finished model on data it has never seen.
Splitting historical data into training and test datasets helps protect against overfitting the model to the training data. (See Chapter 15 for more about overfitting.) You want your model to identify true signals, patterns, and relationships, and to avoid any false ones that could be attributed to the noise within the data. The essence of overfitting is as follows: When a model is tuned to a specific dataset, there is a higher chance that any uncovered patterns are only true for that dataset; the same model may not perform as well on other datasets. Use your testing dataset to help eliminate these dataset-specific patterns (which are considered mostly noise), and your predictive model will become more accurate.
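To make the split concrete, here's a minimal sketch in Python. The record count and the 70/30 ratio are just placeholders; adjust them to your project:

```python
import random

def split_history(records, test_fraction=0.3, seed=42):
    """Shuffle the historical records, then hold back a test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)           # reproducible shuffle
    holdout = round(len(shuffled) * test_fraction)  # size of the test set
    cut = len(shuffled) - holdout
    return shuffled[:cut], shuffled[cut:]           # (training, test)

# Example: ten historical records, split 70/30
records = list(range(10))
train, test = split_history(records)
print(len(train), len(test))   # 7 3
```

Because the test records never touch the training step, any pattern that holds only for the training data will show up as degraded accuracy when you score the model on the held-back set.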
More data doesn't necessarily mean better data. A successful predictive analytics project requires, first and foremost, relevant and accurate data.
If you're trying to address a complex business decision, you may have to develop equally complex models. Keep in mind, however, that an overly complex model may degrade the quality of those precious predictions you're after, making them more ambiguous. The simpler you keep your model, the more control you have over the quality of the model's outputs.
Limiting the complexity of the model depends on knowing what variables to select before you even start building it, and that consideration leads right back to the people with domain knowledge. Your business experts are your best source for insights into which variables have a direct impact on the business problem you're trying to solve. You can also decide empirically which variables to include or exclude.
Use those insights to ensure that your training dataset includes most (if not all) of the data that you expect to use to build the model.
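One simple empirical screen is to measure how strongly each candidate variable correlates with the outcome you're trying to predict. Here's a sketch using Pearson correlation; the variable names and numbers are invented for illustration:

```python
def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical candidate variables vs. the outcome we want to predict
outcome  = [10, 12, 15, 19, 24]
relevant = [1, 2, 3, 4, 5]        # tracks the outcome closely
noise    = [5, 1, 4, 2, 3]        # no obvious relationship

print(round(pearson(relevant, outcome), 2))   # 0.99
print(round(pearson(noise, outcome), 2))      # -0.23
```

A screen like this is only a starting point (correlation misses nonlinear relationships, and a correlated variable can still be redundant), so weigh the numbers against your business experts' judgment.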
To ensure high data quality as a factor in the success of the model you're building, data preparation and cleaning can be of enormous help. When you're examining your data, pay special attention to
An iterative approach to building the model — trying a version of the model, fine-tuning it in light of your results, and then trying the improved version — will allow you to evaluate the variables and algorithms used in your model, and choose those best suited to your final solution. Building your model iteratively may help you make some decisions and choices:
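In miniature, one iteration of that loop looks like this: score a baseline model on held-out data, then check whether a refined version actually beats it. All of the data here is invented for illustration:

```python
def mean_model(train_y):
    """Baseline iteration: always predict the training average."""
    avg = sum(train_y) / len(train_y)
    return lambda x: avg

def trend_model(train_x, train_y):
    """Refined iteration: fit a least-squares line y = a*x + b."""
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
         / sum((x - mx) ** 2 for x in train_x))
    b = my - a * mx
    return lambda x: a * x + b

def mse(model, xs, ys):
    """Mean squared error of a model on held-out data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
test_x,  test_y  = [5, 6], [10.1, 11.8]

baseline = mse(mean_model(train_y), test_x, test_y)
refined  = mse(trend_model(train_x, train_y), test_x, test_y)
print(refined < baseline)   # True: keep the refined version, iterate again
```

The key habit is always to judge each new version on data it wasn't tuned against; otherwise every iteration just overfits a little more.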
Impressive past performance doesn't guarantee an equally impressive future for an organization. It isn't enough to look at how the business has been done thus far. Instead, organizations should look at how predictive analytics can transform the way they're doing business in response to a rapidly changing present environment. For that to happen, business leaders need a major shift in the way they think and operate the business. Your predictive analytics project is a good place for them to start that shift.
Granted, the old guard — traditional business leaders who have been operating their businesses on gut feelings — can be closed-minded at first, reluctant to adopt new technologies and trust the predictions and recommendations that come from them. You should expect some degree of organizational resistance to the deployment of your new model. This is especially true when an analytical system detects a major shift in trends — or a bigger crisis than anticipated — prompting the business leaders to distrust the system's recommendations and rely on historical analysis. If the business managers aren't willing to act on the recommendations of the predictive model, the project will fail.
When you've demonstrated that your analytics program can guide the organization effectively toward achieving its business goals, be sure you clearly communicate — and widely publicize — those results within the organization. The idea is to increase awareness and buy-in for the program. Educating stakeholders about the benefits of predictive analytics entails emphasizing the possible loss of both opportunities and competitive edge if this tool isn't developed and deployed. Maintaining focus on such business values can have a direct and positive impact on creating a cultural change that favors predictive analytics.
The process of educating and training may take time to bear fruit; most organizational changes require time to implement and to be adopted. Be sure you recruit business team members who have both an understanding of and experience in managing organizational change and developing internal communications strategy.
In order to ensure a successful deployment of the predictive model you're building, you'll need to think about deployment very early on. The business stakeholders should have a say in what the final model looks like. Thus, at the beginning of the project, be sure your team discusses the required accuracy of the intended model and how best to interpret its results.
Data modelers should understand the business objectives the model is trying to achieve, and all team members should be familiar with the metrics against which the model will be judged. The idea is to make sure everyone is on the same page, working to achieve the same goals, and using the same metrics to evaluate the benefits of the model.
Keep in mind that the model's operational environment will most likely be different from the development environment. The differences can be significant, from the hardware and software configurations, to the nature of the data, to the footprint of the model itself. The modelers have to know all the requirements needed for a successful deployment in production before they can build a model that will actually work on the production systems. Implementation constraints can become obstacles that come between the model and its deployment.
Understanding the limitations of your model is also critical to ensuring its success. Pay particular attention to these typical limitations:
Your goal, of course, is to build an analytical model that can actually solve the business objectives it was built for. Expect to spend some time evaluating the accuracy of your model's predictions so as to prove its value to the decision-making process — and to the bottom line.
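For a classification model, that evaluation typically boils down to a handful of standard metrics computed on the test dataset. Here's a sketch, using hypothetical churn predictions (1 means the customer churns):

```python
def evaluate(predicted, actual):
    """Standard classification metrics from test-set predictions."""
    tp = sum(p == a == 1 for p, a in zip(predicted, actual))  # true positives
    tn = sum(p == a == 0 for p, a in zip(predicted, actual))  # true negatives
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    return {
        "accuracy":  (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall":    tp / (tp + fn) if tp + fn else 0.0,
    }

# Invented predictions vs. what actually happened
scores = evaluate(predicted=[1, 0, 1, 1, 0, 0, 1, 0],
                  actual=   [1, 0, 0, 1, 0, 1, 1, 0])
print(scores)
```

Which metric matters most is a business decision, not a technical one: if missing a churning customer is costlier than a false alarm, for example, recall deserves more weight than raw accuracy.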
Evaluate your model from these two distinct angles:
Successful deployment of the model in production is no time to relax. You'll need to closely monitor its accuracy and performance over time. Models tend to degrade over time (some faster than others), and a fresh infusion of energy is required from time to time to keep them up and running. To stay successful, a model must be revisited and re-evaluated in light of new data and changing circumstances.
If conditions change so they no longer fit the model's original training, then you'll have to retrain the model to meet the new conditions. Such demanding new conditions include
Your strategic plan should include staying alert for any such emergent need to refresh your model and take it to the next level, but updating your model should be an ongoing process anyway. You'll keep on tweaking inputs and outputs, incorporating new data streams, retraining the model for the new conditions and continuously refining its outputs. Keep these goals in mind:
Automate the monitoring of your model by developing customized applications that report and track the model's performance.
Automating the monitoring — or involving other team members in it — alleviates any concerns a data scientist may have over the model's performance, and makes better use of everyone's time.
Automated monitoring saves time and helps you avoid errors in tracking the model's performance.
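Such a customized application can be as simple as a script that compares each period's measured accuracy against an agreed threshold and flags any breach. A minimal sketch, with invented weekly readings:

```python
def monitor(weekly_accuracy, threshold=0.70):
    """Flag every week the model's measured accuracy falls below threshold."""
    alerts = []
    for week, acc in enumerate(weekly_accuracy, start=1):
        if acc < threshold:
            alerts.append((week, acc))   # in production: notify the team
    return alerts

# Hypothetical accuracy readings logged each week after deployment
history = [0.84, 0.82, 0.79, 0.74, 0.68, 0.66]
print(monitor(history))   # [(5, 0.68), (6, 0.66)]
```

A steady downward trend like the one above is exactly the signal that the model's training no longer fits current conditions and a retraining cycle is due.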