The main purpose of a scikit-learn pipeline is to assemble several machine learning steps into a single object, so that the steps can be cross-validated together while their parameters are tuned. Scikit-learn provides a library of transformers for preprocessing data (data cleaning), kernel approximation (expand), unsupervised dimensionality reduction (reduce), and feature extraction (generate). A pipeline chains a series of these transformers with a final estimator.
The pipeline sequentially applies a list of transforms, followed by a final estimator. The intermediate steps must implement both the fit and transform methods, whereas the final estimator only needs to implement fit. To cache the fitted transformers in the pipeline, the memory argument is used.
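The following is a minimal sketch of such a pipeline, not a definitive recipe. The dataset (iris), the choice of StandardScaler, PCA, and LogisticRegression, and the step names "scale", "reduce", and "clf" are all assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data only; any feature matrix X and target y would do.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Each step is a (name, object) pair. The step names are hypothetical labels.
pipe = Pipeline(steps=[
    ("scale", StandardScaler()),               # intermediate step: fit + transform
    ("reduce", PCA(n_components=2)),           # intermediate step: fit + transform
    ("clf", LogisticRegression(max_iter=200)), # final estimator: fit
])

# fit() fits and applies each transformer in order, then fits the final estimator.
pipe.fit(X_train, y_train)

# predict() and score() push the data through the fitted transformers first.
predictions = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Because the scaler and PCA are fitted inside the pipeline, they only ever see the training data, which helps avoid leaking information from the test set.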
The class signature of Pipeline is as follows:
class sklearn.pipeline.Pipeline(steps, memory=None)
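As a rough sketch of how the two arguments might be used together, the example below passes a temporary directory as memory and tunes the pipeline with a cross-validated grid search. The step names, the SVC classifier, the parameter grid, and the temporary cache directory are assumptions chosen for illustration; parameters of a step are addressed as the step name, two underscores, and the parameter name.

```python
import tempfile

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative data only.
X, y = load_iris(return_X_y=True)

# A throwaway directory serves as the cache location for fitted transformers.
with tempfile.TemporaryDirectory() as cache_dir:
    cached_pipe = Pipeline(
        steps=[("scale", StandardScaler()), ("reduce", PCA()), ("clf", SVC())],
        memory=cache_dir,  # fitted transformers are cached here
    )

    # Hypothetical grid: keys follow the <step name>__<parameter> convention.
    param_grid = {
        "reduce__n_components": [2, 3],
        "clf__C": [0.1, 1.0, 10.0],
    }

    # The whole pipeline is cross-validated as a single estimator.
    search = GridSearchCV(cached_pipe, param_grid, cv=5)
    search.fit(X, y)

    best_params = search.best_params_
    best_score = search.best_score_

print(best_params)
```

Caching tends to pay off when the transformers are expensive to fit and the same transformer settings are reused across many candidate settings of later steps; on a dataset this small, the benefit is negligible.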
Let's take a look at another important component in the next section.