Chapter 2 – Classifying with scikit-learn Estimators

More complex pipelines

http://scikit-learn.org/stable/modules/pipeline.html#featureunion-composite-feature-spaces

The Pipelines we have used so far in this module follow a single stream: the output of one step is the input of the next step.

Pipelines also follow the transformer and estimator interfaces, which allows us to embed Pipelines within Pipelines. This is a useful construct for very complex models, and it becomes very powerful when combined with Feature Unions, as described at the preceding link.

Feature Unions allow us to extract multiple types of features at a time and then combine them to form a single dataset. For more details, see the example at http://scikit-learn.org/stable/auto_examples/feature_stacker.html.
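
As a rough illustration of the idea, the following sketch nests a Pipeline inside a FeatureUnion, which is in turn one step of an outer Pipeline. The Iris dataset, the particular transformers (StandardScaler, PCA, SelectKBest), the decision tree, and the random state of 14 are all choices made for this example rather than anything prescribed by the text, and the code assumes a reasonably recent scikit-learn release.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset (an assumption for this sketch): 150 samples, 4 features
X, y = load_iris(return_X_y=True)

# A Pipeline follows the transformer interface, so it can act as
# one branch of a FeatureUnion
pca_branch = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
])

# Two feature extractors run side by side; their outputs are
# concatenated column-wise into a single dataset
combined_features = FeatureUnion([
    ("pca", pca_branch),
    ("kbest", SelectKBest(k=1)),
])

# The FeatureUnion is itself a step of the outer Pipeline
model = Pipeline([
    ("features", combined_features),
    ("classifier", DecisionTreeClassifier(random_state=14)),
])
model.fit(X, y)

# 2 PCA components + 1 selected original feature = 3 columns
X_combined = model.named_steps["features"].transform(X)
print(X_combined.shape)
```

Because each branch is fitted independently on the same input, branches can be added or removed without touching the rest of the model.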

Comparing classifiers

scikit-learn provides many classifiers that are ready to use. The one you choose for a particular task depends on a variety of factors. You can compare F1 scores to see which method performs better, and you can examine the deviation of those scores to judge whether the difference is statistically significant.

An important requirement is that all classifiers are trained and tested on the same data; that is, the training and test sets used for one classifier are the ones used for every classifier. Our use of random states ensures this is the case, which is also important for replicating experiments.
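
One way such a comparison might look is sketched below. It scores two classifiers with the macro-averaged F1 score over identical cross-validation folds, fixed by a random state. The dataset, the two classifiers, the number of folds, and the random state of 14 are assumptions made for illustration only, and the sketch assumes a scikit-learn release that provides sklearn.model_selection.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative dataset (an assumption for this sketch)
X, y = load_iris(return_X_y=True)

# A fixed random_state makes the shuffled folds reproducible, so every
# classifier is trained and tested on exactly the same splits
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=14)

# Hypothetical candidates; any scikit-learn classifiers could be used
classifiers = {
    "decision tree": DecisionTreeClassifier(random_state=14),
    "nearest neighbors": KNeighborsClassifier(),
}

results = {}
for name, clf in classifiers.items():
    # One F1 score per fold, averaged over the classes ("f1_macro")
    scores = cross_val_score(clf, X, y, scoring="f1_macro", cv=cv)
    results[name] = scores
    print(f"{name}: mean F1 = {scores.mean():.3f}, std = {scores.std():.3f}")
```

If the gap between the mean scores is small relative to their standard deviations, the apparent difference between the classifiers may simply reflect the particular split. A formal significance test would be needed to say more.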
