Chapter 4. From Linear Regression to Logistic Regression

In Chapter 2, Linear Regression, we discussed simple linear regression, multiple linear regression, and polynomial regression. These models are special cases of the generalized linear model, a flexible framework that requires fewer assumptions than ordinary linear regression. In this chapter, we will discuss some of these assumptions as they relate to another special case of the generalized linear model called logistic regression.

Unlike the models we discussed previously, logistic regression is used for classification tasks. Recall that the goal in classification tasks is to find a function that maps an observation to its associated class or label. A learning algorithm must use pairs of feature vectors and their corresponding labels to induce the values of the mapping function's parameters that produce the best classifier, as measured by a particular performance metric. In binary classification, the classifier must assign instances to one of the two classes. Examples of binary classification include predicting whether or not a patient has a particular disease, whether or not an audio sample contains human speech, or whether or not the Duke men's basketball team will lose in the first round of the NCAA tournament. In multiclass classification, the classifier must assign one of several labels to each instance. In multilabel classification, the classifier must assign a subset of the labels to each instance. In this chapter, we will work through several classification problems using logistic regression, discuss performance measures for the classification task, and apply some of the feature extraction techniques you learned in the previous chapter.

Binary classification with logistic regression

Ordinary linear regression assumes that the response variable is normally distributed. The normal distribution, also known as the Gaussian distribution or bell curve, is a function that describes the probability that an observation will have a value between any two real numbers. Normally distributed data is symmetrical. That is, half of the values are greater than the mean and the other half of the values are less than the mean. The mean, median, and mode of normally distributed data are also equal. Many natural phenomena approximately follow normal distributions. For instance, the height of people is normally distributed; most people are of average height, a few are tall, and a few are short.
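To see these properties numerically, the following minimal sketch (using NumPy; the mean, standard deviation, and sample size are arbitrary choices for illustration) draws samples from a normal distribution and confirms that the mean and median nearly coincide and that about half of the values fall above the mean:

```python
import numpy as np

rng = np.random.default_rng(0)
# Draw 100,000 samples from a normal distribution with mean 170 and
# standard deviation 10, loosely modeling adult heights in centimeters.
heights = rng.normal(loc=170, scale=10, size=100_000)

print(np.mean(heights))    # approximately 170
print(np.median(heights))  # approximately 170; the mean and median coincide
# Roughly half of the values exceed the mean, as symmetry requires.
print(np.mean(heights > np.mean(heights)))  # approximately 0.5
```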

In some problems, the response variable is not normally distributed. For instance, a coin toss can result in two outcomes: heads or tails. The Bernoulli distribution describes the probability distribution of a random variable that can take the positive case with probability $P$ or the negative case with probability $1 - P$. If the response variable represents a probability, it must be constrained to the range $[0, 1]$. Linear regression assumes that a constant change in the value of an explanatory variable results in a constant change in the value of the response variable, an assumption that does not hold if the value of the response variable represents a probability. Generalized linear models remove this assumption by relating a linear combination of the explanatory variables to the response variable using a link function. In fact, we already used a link function in Chapter 2, Linear Regression; ordinary linear regression is a special case of the generalized linear model that relates a linear combination of the explanatory variables to a normally distributed response variable using the identity link function. We can use a different link function to relate a linear combination of the explanatory variables to a response variable that is not normally distributed.
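To make the Bernoulli distribution concrete, here is a minimal simulation sketch; the probability of 0.7 and the sample size are arbitrary assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# A Bernoulli random variable takes the value 1 (the positive case) with
# probability P = 0.7 and 0 (the negative case) with probability 1 - P = 0.3.
tosses = rng.binomial(n=1, p=0.7, size=1000)
print(tosses.mean())  # the fraction of positive outcomes; close to 0.7
```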

In logistic regression, the response variable describes the probability that the outcome is the positive case. If the response variable is equal to or exceeds a discrimination threshold, the positive class is predicted; otherwise, the negative class is predicted. The response variable is modeled as a function of a linear combination of the explanatory variables using the logistic function. Given by the following equation, the logistic function always returns a value between zero and one:

$$F(t) = \frac{1}{1 + e^{-t}}$$

The following is a plot of the value of the logistic function over the interval $[-6, 6]$:

[Figure: the logistic function's S-shaped curve, rising from near 0 at $t = -6$ to near 1 at $t = 6$.]
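The logistic function is simple to implement directly. The following sketch (assuming NumPy and matplotlib are available) evaluates it over the same interval and reproduces a plot like the one above:

```python
import numpy as np
import matplotlib.pyplot as plt

def logistic(t):
    # F(t) = 1 / (1 + e^(-t)) maps any real number into (0, 1).
    return 1 / (1 + np.exp(-t))

t = np.linspace(-6, 6, 200)
F = logistic(t)
print(F.min(), F.max())  # both values lie strictly between 0 and 1

plt.plot(t, F)
plt.xlabel('t')
plt.ylabel('F(t)')
plt.title('The logistic function')
plt.show()
```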

For logistic regression, $t$ is equal to a linear combination of explanatory variables, as follows:

$$t = \beta_0 + \beta x$$
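Putting these pieces together, the following sketch shows how a single prediction could be made; the intercept, coefficients, and feature values are made up for illustration, not estimated from data:

```python
import numpy as np

def logistic(t):
    return 1 / (1 + np.exp(-t))

# Hypothetical parameters: an intercept and one coefficient per feature.
beta_0 = -1.5
beta = np.array([0.8, 2.0])
# A hypothetical observation's feature vector.
x = np.array([1.2, 0.5])

t = beta_0 + beta @ x    # the linear combination of the explanatory variables
p = logistic(t)          # the estimated probability of the positive case
prediction = 1 if p >= 0.5 else 0  # 0.5 is the discrimination threshold
print(t, p, prediction)  # 0.46, ~0.61, 1
```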

The logit function is the inverse of the logistic function. It links $F(x)$ back to a linear combination of the explanatory variables:

$$g(x) = \ln \frac{F(x)}{1 - F(x)} = \beta_0 + \beta x$$
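We can check this inverse relationship numerically; in the following sketch, the test values are arbitrary:

```python
import numpy as np

def logistic(t):
    return 1 / (1 + np.exp(-t))

def logit(p):
    # The natural log of the odds, p / (1 - p).
    return np.log(p / (1 - p))

t = np.array([-3.0, -0.5, 0.0, 1.0, 4.0])
# Applying logit to the logistic function's output recovers the input.
print(np.allclose(logit(logistic(t)), t))  # True
```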

Now that we have defined the model for logistic regression, let's apply it to a binary classification task.
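scikit-learn implements this model in the LogisticRegression class. As a preview, here is a minimal sketch that fits the class to made-up toy data; the data is hypothetical and chosen only to illustrate the API:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one explanatory variable, binary labels.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

classifier = LogisticRegression()
classifier.fit(X, y)

# predict_proba returns the estimated probability of each class;
# predict applies the discrimination threshold to those probabilities.
print(classifier.predict_proba([[2.0]]))
print(classifier.predict([[2.0]]))
```

For a binary problem, predict is equivalent to applying a discrimination threshold of 0.5 to the estimated probability of the positive case.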
