In this lesson, we will discuss supervised learning from both a theoretical and a practical perspective. In particular, we will revisit the linear regression model for regression analysis discussed in Lesson 1, From Data to Decisions – Getting Started with TensorFlow, this time using a real dataset. Then we will see how to develop predictive models for Titanic survival using Logistic Regression (LR), Random Forests, and Support Vector Machines (SVMs).
In a nutshell, the following topics will be covered in this lesson:
Depending on the nature of the learning feedback available, machine learning is typically classified into three broad categories: supervised learning, unsupervised learning, and reinforcement learning (see figure 1). A predictive model based on a supervised learning algorithm learns from a labelled dataset, in which each input is paired with its known, real-world output, and then uses the learned input-to-output mapping to make predictions for new inputs.
For example, a dataset for spam filtering usually contains both spam messages and not-spam (ham) messages, so we know which messages in the training set are spam and which are ham. We can use this information to train a model that classifies new, unseen messages:
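To make the spam example concrete, here is a minimal sketch of turning labelled messages into a classifier. It assumes a multinomial Naive Bayes model (one of the supervised algorithms this lesson mentions) written in plain Python rather than TensorFlow; the four training messages and the helper names `train_naive_bayes` and `classify_message` are hypothetical illustrations, not code from this lesson.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real spam filter would also strip punctuation, etc.
    return text.lower().split()


def train_naive_bayes(messages, labels):
    """Count word frequencies per class from labelled (message, label) pairs."""
    word_counts = defaultdict(Counter)   # label -> Counter of word frequencies
    class_counts = Counter(labels)       # label -> number of training messages
    for text, label in zip(messages, labels):
        word_counts[label].update(tokenize(text))
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, class_counts, vocab


def classify_message(model, text):
    """Return the label with the highest log-posterior score for an unseen message."""
    word_counts, class_counts, vocab = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior: fraction of training messages that carry this label.
        score = math.log(class_counts[label] / total_messages)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing: a word unseen for this label still gets nonzero probability.
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled training set.
messages = [
    "win a free prize now",
    "free money click now",
    "meeting at noon tomorrow",
    "lunch with the team tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(messages, labels)
print(classify_message(model, "claim your free prize"))   # spam
print(classify_message(model, "team meeting tomorrow"))   # ham
```

The labels are what make this supervised: the model only knows that "free" and "prize" point towards spam because the training messages containing them were marked as spam.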
The following figure shows a schematic diagram of supervised learning. Once the algorithm has learned the required patterns from the labelled training data, those patterns can be used to make predictions for unlabelled test data:
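The two phases described here, learning patterns from labelled training data and then predicting labels for unseen data, can be sketched in a few lines of plain Python. This is a minimal illustration assuming a nearest-centroid classifier (a deliberately simple technique swapped in for clarity, not one of the lesson's models) and a tiny made-up 2D dataset; the helper names are hypothetical. Part of the labelled data is held out to play the role of test data, so the predictions can be checked against the true labels.

```python
import math
import random


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle the labelled data and hold part of it out as a test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in test_idx], [y[i] for i in test_idx])


def fit_centroids(X, y):
    """Training phase: learn one centroid (mean point) per class label."""
    sums, counts = {}, {}
    for point, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(point))
        for j, value in enumerate(point):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def predict_nearest_centroid(centroids, x):
    """Prediction phase: assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Hypothetical labelled data: two well-separated clusters of 2D points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

X_train, y_train, X_test, y_test = train_test_split(X, y)
centroids = fit_centroids(X_train, y_train)

# Evaluate on held-out points whose labels the model never saw during training.
correct = sum(predict_nearest_centroid(centroids, x) == label
              for x, label in zip(X_test, y_test))
print("test accuracy:", correct / len(X_test))

# Predict the label of a genuinely unlabelled point.
print(predict_nearest_centroid(centroids, (5.5, 5.8)))
```

Holding out labelled data as a test set is what lets us measure how well the learned patterns generalize before the model is used on data that has no labels at all.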
Supervised learning problems come in two main forms: classification, where the model predicts a discrete class label, and regression, where it predicts a continuous value. Predictive models for predictive analytics can be built for either form. We will look at several supervised learning algorithms, such as linear regression, logistic regression, random forests, decision trees, Naive Bayes, and the multilayer perceptron, among others.
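The difference between the two forms lies in the kind of output the model produces. The sketch below is illustrative only: it assumes a made-up dataset of hours studied, exam scores, and pass/fail labels, fits an ordinary least-squares line for the regression case, and fits a decision stump (a one-level decision tree) for the classification case. The function names `fit_line` and `fit_stump` are hypothetical, and neither model is the TensorFlow implementation discussed in this lesson.

```python
def fit_line(xs, ys):
    """Regression: ordinary least-squares fit of y = w * x + b (continuous output)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    return w, mean_y - w * mean_x


def fit_stump(xs, labels):
    """Classification: choose the threshold t for the rule 'class 1 if x >= t else 0'
    that makes the fewest errors on the training data (a one-level decision tree)."""
    best_t, best_errors = None, len(xs) + 1
    for t in sorted(set(xs)):
        errors = sum((x >= t) != (label == 1) for x, label in zip(xs, labels))
        if errors < best_errors:
            best_t, best_errors = t, errors
    return best_t


# Hypothetical data: hours studied, exam score, and a pass (1) / fail (0) label.
hours = [1, 2, 3, 4, 5, 6]
scores = [50, 55, 60, 65, 70, 75]
passed = [0, 0, 0, 1, 1, 1]

w, b = fit_line(hours, scores)
print("predicted score for 7 hours:", w * 7 + b)    # 80.0, a real number

t = fit_stump(hours, passed)
print("predicted class for 7 hours:", int(7 >= t))  # 1, a discrete label
```

Both models learn from the same inputs; only the type of target changes, and that alone decides whether the task is regression or classification.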
In this lesson, we will mainly focus on supervised learning algorithms for predictive analytics. Let's start with the simplest of them: linear regression.