The scikit-learn pipeline

The main purpose of the scikit-learn pipeline is to assemble several ML steps into a single object that can be cross-validated as a whole while setting the parameters of each step. Scikit-learn provides a library of transformers that are used for preprocessing data (data cleaning), kernel approximation (expand), unsupervised dimensionality reduction (reduce), and feature extraction (generate). A pipeline chains a series of such transformers and ends with a final estimator.
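As a minimal sketch of cross-validating a whole pipeline to set its parameters, the following example chains a scaler, a PCA step, and an SVC, and then searches over a small grid. The dataset (iris), the step names (scale, reduce, clf), and the grid values are illustrative assumptions, not taken from the text.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative dataset; any (X, y) pair for classification would do.
X, y = load_iris(return_X_y=True)

# Step names are arbitrary labels chosen for this sketch.
pipe = Pipeline(steps=[
    ("scale", StandardScaler()),   # preprocessing
    ("reduce", PCA()),             # unsupervised dimensionality reduction
    ("clf", SVC()),                # final estimator
])

# Parameters of a step are addressed as <step name>__<parameter name>.
param_grid = {
    "reduce__n_components": [2, 3],
    "clf__C": [0.1, 1.0, 10.0],
}

# Every candidate refits the whole pipeline inside each CV fold.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

best_params = search.best_params_
print(best_params)
```

Because the transformers are refit inside each fold, the scaler and PCA never see the held-out part of the data during the search.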

The pipeline sequentially applies a list of transforms, followed by a final estimator. The intermediate steps must be transformers, that is, they must implement both the fit and transform methods. The final estimator only needs to implement fit. To cache the fitted transformers in the pipeline, the memory argument is used.
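The following sketch shows one way the memory argument might be used. It assumes a temporary directory as the cache location and the iris dataset as input; both are choices made for illustration only.

```python
import shutil
import tempfile

from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Temporary directory used as the cache location (an assumption of this sketch).
cache_dir = tempfile.mkdtemp()
try:
    pipe = Pipeline(
        steps=[
            ("scale", StandardScaler()),  # intermediate step: fit and transform
            ("clf", SVC()),               # final estimator: only fit is required
        ],
        memory=cache_dir,                 # fitted transformers are cached here
    )
    pipe.fit(X, y)
    predictions = pipe.predict(X[:5])
finally:
    # Remove the cache once it is no longer needed.
    shutil.rmtree(cache_dir, ignore_errors=True)

print(predictions)
```

Caching pays off mainly when fitting a transformer is expensive and the same pipeline is refit many times with unchanged upstream steps, as happens during a grid search.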

An estimator for classification is a Python object that implements the methods fit(X, y) and predict(T). An example of this is the class sklearn.svm.SVC, which implements support vector classification. The model's parameters are taken as arguments for the estimator's constructor.

The memory class was exposed in older scikit-learn releases with the signature class sklearn.utils.Memory(*args, **kwargs); it has since been deprecated in favor of joblib.Memory, which provides the same interface. It has methods such as cache, clear, reduce_size, eval, and format. The cache method decorates a function so that its return value is computed once for a given set of arguments and then stored for reuse. The returned object is a MemorizedFunc object, which behaves like a function and offers additional methods for cache lookup and management. The cache method takes parameters such as func=None, ignore=None, verbose=None, and mmap_mode=False.
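To make the behavior of the cache method concrete, here is a small sketch that uses joblib.Memory directly. The function slow_square and the calls counter are hypothetical helpers introduced only for this example, and the cache lives in a temporary directory.

```python
import shutil
import tempfile

from joblib import Memory

# Temporary cache location (an assumption of this sketch).
cache_dir = tempfile.mkdtemp()
memory = Memory(location=cache_dir, verbose=0)

calls = []  # records each real execution of the function


def slow_square(x):
    """Hypothetical stand-in for an expensive computation."""
    calls.append(x)
    return x * x


# cache() returns a MemorizedFunc that is called like the original function.
cached_square = memory.cache(slow_square)

first = cached_square(4)   # computed and written to the cache
second = cached_square(4)  # same argument, so the stored result can be reused

print(first, second, len(calls))

# Empty the cache and remove the directory.
memory.clear(warn=False)
shutil.rmtree(cache_dir, ignore_errors=True)
```

The length of calls indicates how many times the function body actually ran, which is a simple way to check whether the second call was served from the cache.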

The class signature of Pipeline is as follows:

class sklearn.pipeline.Pipeline(steps, memory=None)

Let's take a look at another important component in the next section.
