This chapter takes the reader on a whirlwind tour of machine learning, focusing on pandas as a tool for preprocessing the data that machine learning programs consume. It also introduces the scikit-learn library, which is among the most popular machine learning toolkits in Python.
In this chapter, we illustrate machine learning techniques by applying them to a well-known problem: classifying which passengers survived the Titanic disaster of 1912. The various topics addressed in this chapter include the following:
scikit-learn
scikit-learn ML classifier interface
The library we will be considering for machine learning is called scikit-learn. The scikit-learn Python library provides an extensive collection of machine learning algorithms that can be used to create adaptive programs that learn from data inputs.
However, before data can be used by scikit-learn, it must undergo some preprocessing. This is where pandas comes in. pandas can be used to preprocess and filter data before passing it to the algorithms implemented in scikit-learn.
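As a rough illustration of this division of labor, the following sketch uses pandas to fill in missing values and encode a text column, and then hands the cleaned table to a scikit-learn classifier. The records and the column names (Pclass, Sex, Age, Survived) are invented for the example and are not the dataset used later in the chapter. The preprocess helper and the choice of DecisionTreeClassifier are likewise assumptions made only to keep the sketch short.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Invented toy records; the column names are assumptions for illustration only.
raw = pd.DataFrame({
    "Pclass":   [1, 3, 3, 1, 2, 3, 2, 1],
    "Sex":      ["female", "male", "female", "male",
                 "female", "male", "male", "female"],
    "Age":      [38.0, 22.0, None, 54.0, 27.0, None, 35.0, 58.0],
    "Survived": [1, 0, 1, 0, 1, 0, 0, 1],
})


def preprocess(df):
    """Hypothetical helper: return a copy with no missing ages and a numeric Sex column."""
    out = df.copy()
    # Replace missing ages with the median of the known ages.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # scikit-learn estimators expect numeric input, so encode the text column.
    out["Sex"] = out["Sex"].map({"male": 0, "female": 1})
    return out


# pandas does the cleaning ...
clean = preprocess(raw)
X = clean[["Pclass", "Sex", "Age"]]   # feature columns
y = clean["Survived"]                 # target column

# ... and scikit-learn does the learning.
model = DecisionTreeClassifier(random_state=0)
model.fit(X, y)
predictions = model.predict(X)
```

Here the model is evaluated on the same rows it was trained on, which only demonstrates the mechanics; a realistic workflow would hold out separate data for testing.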