Chapter 5. Postmodel Workflow

This chapter will cover the following recipes:

  • K-fold cross validation
  • Automatic cross validation
  • Cross validation with ShuffleSplit
  • Stratified k-fold
  • Poor man's grid search
  • Brute force grid search
  • Using dummy estimators to compare results
  • Regression model evaluation
  • Feature selection
  • Feature selection on L1 norms
  • Persisting models with joblib

Introduction

Even though the chapters are unordered by design, you could argue that, by virtue of the art of data science, we've saved the best for last.

For the most part, each recipe within this chapter is applicable to the various models we've worked with. In some ways, you can think of this chapter as being about tuning the parameters and features. Ultimately, we need to choose some criteria to determine the "best" model, and we'll use various measures to define "best"; these are covered in the Regression model evaluation recipe. Then, in the Cross validation with ShuffleSplit recipe, we will randomize the evaluation across subsets of the data to help avoid overfitting.
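The following minimal sketch illustrates the two ideas above: choosing a measure that defines "best" and randomizing the evaluation across subsets of the data. It assumes a recent scikit-learn (0.18 or later, where ShuffleSplit and cross_val_score live in sklearn.model_selection). It uses synthetic data from make_regression instead of the chapter's datasets and takes mean squared error as the criterion. The number of splits and the test size are arbitrary choices for illustration, not the settings used in the recipes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score

# Synthetic regression data (an assumption; stands in for the chapter's datasets)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 10 independent random splits, each holding out 25% of the rows for testing
splitter = ShuffleSplit(n_splits=10, test_size=0.25, random_state=0)

# Scorers follow a "higher is better" convention, so MSE is returned negated
scores = cross_val_score(
    LinearRegression(), X, y, cv=splitter, scoring="neg_mean_squared_error"
)

# Flip the sign to get the ordinary mean squared error for each split
mse_per_split = -scores
print("MSE per split:", np.round(mse_per_split, 1))
print(f"Mean MSE: {mse_per_split.mean():.1f} (std {mse_per_split.std():.1f})")
```

Because scikit-learn scorers treat larger values as better, the error comes back negated, and flipping the sign recovers the usual MSE. The spread across the ten splits gives a rough sense of how much the score depends on which rows happen to land in the test set, which is the reason for randomizing the evaluation in the first place.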
