In this chapter we discussed generalized linear models, which extend ordinary linear regression to support response variables with non-normal distributions. Generalized linear models use a link function to relate a linear combination of the explanatory variables to the response variable; unlike ordinary linear regression, the relationship does not need to be linear. In particular, we examined the logistic link function, a sigmoid function that returns a value between zero and one for any real number.
We discussed logistic regression, a generalized linear model that uses the logistic link function to relate explanatory variables to a Bernoulli-distributed response variable. Logistic regression can be used for binary classification, a task in which an instance must be assigned to one of the two classes; we used logistic regression to classify spam and ham SMS messages. We then discussed multi-class classification, a task in which each instance must be assigned one label from a set of labels. We used the one-vs.-all strategy to classify the sentiments of movie reviews. Finally, we discussed multi-label classification, in which instances must be assigned a subset of a set of labels. Having completed our discussion of regression and classification with generalized linear models, we will introduce a non-linear model for regression and classification called the decision tree in the next chapter.