Comparison between logistic regression and decision trees

Before we dive into the coding details of decision trees, we will quickly compare logistic regression and decision trees, so that we know which model is better and in what respects.

Logistic regression: The model is an equation that relates the independent variables to the dependent variable.
Decision trees: Tree classifiers produce rules in simple English sentences, which are easy to explain to senior management.

Logistic regression: A parametric model, defined by parameters (coefficients) that multiply the independent variables to predict the dependent variable.
Decision trees: A non-parametric model, with no pre-assumed parameters. Trees implicitly perform variable screening, or feature selection.

Logistic regression: Assumes that the response (dependent) variable follows a binomial or Bernoulli distribution.
Decision trees: Makes no assumptions about the underlying distribution of the data.

Logistic regression: The shape of the model is predefined (the logistic curve).
Decision trees: The shape of the model is not predefined; instead, the model fits the best possible classification based on the data.

Logistic regression: Gives very good results when the independent variables are continuous and the linearity assumption (a linear relationship between the independent variables and the log-odds) holds.
Decision trees: Gives the best results when most of the variables are categorical.

Logistic regression: Has difficulty finding complex interactions among variables (non-linear relationships between variables).
Decision trees: Non-linear relationships between variables do not affect tree performance, and trees often uncover complex interactions. Trees can handle numerical data that is highly skewed or multi-modal, as well as categorical predictors with either ordinal or non-ordinal structure.

Logistic regression: Outliers and missing values degrade its performance.
Decision trees: Outliers and missing values are handled gracefully.
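
The logistic regression rows above (an equation, a parametric model, a predefined logistic curve, a Bernoulli response) can be summarized by the standard binary logistic model. The notation is introduced here for illustration and does not come from the comparison itself: x_1, ..., x_k are the independent variables, β_0, ..., β_k are the parameters to be estimated, and p is the probability that the response y equals 1.

```latex
p = P(y = 1 \mid x_1, \ldots, x_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}, \qquad y \sim \mathrm{Bernoulli}(p)

\ln \frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
```

Whatever the data, the fitted curve keeps this S-shape; fitting only changes the k + 1 coefficients. A decision tree has no such fixed formula: which variables it splits on, and at which thresholds, is learned from the data.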

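The rows on interactions and on readable rules can be seen on a toy problem. The sketch below is a from-scratch, standard-library-only illustration under stated assumptions, not the book's code and not scikit-learn: the XOR data are invented for the example, the logistic model is fitted by plain batch gradient descent on the log-loss, and the tree is a minimal greedy CART-style learner using Gini impurity (the comparison does not name a tree algorithm). All function names (fit_logistic, build_tree, tree_rules, and so on) are hypothetical helpers. In XOR the effect of x1 on the label flips depending on x2, which is the kind of interaction a model that is linear in its inputs cannot represent.

```python
import math

# Toy XOR data, invented for this sketch: the label is 1 when exactly one
# of the two binary features is 1, so the effect of x1 depends on x2.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
y = [0, 1, 1, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Hypothetical helper: batch gradient descent on the log-loss -> (b0, w)."""
    b0, w = 0.0, [0.0] * len(X[0])
    for _ in range(epochs):
        g0, gw = 0.0, [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += err
            gw = [g + err * xj for g, xj in zip(gw, xi)]
        b0 -= lr * g0 / len(X)
        w = [wj - lr * g / len(X) for wj, g in zip(w, gw)]
    return b0, w


def predict_logistic(b0, w, x):
    # Class 1 when the predicted probability is at least 0.5.
    return int(sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, x))) >= 0.5)


def gini(labels):
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def build_tree(rows, depth=0, max_depth=2):
    """Hypothetical greedy CART-style learner on (features, label) rows.
    Any impure node above max_depth is split on the lowest weighted Gini,
    even when the split brings no improvement (on XOR the root split does not)."""
    labels = [lab for _, lab in rows]
    majority = int(2 * sum(labels) >= len(labels))
    if depth == max_depth or gini(labels) == 0.0:
        return {"leaf": majority}
    best = None
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            score = (len(left) * gini([lab for _, lab in left])
                     + len(right) * gini([lab for _, lab in right])) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # identical feature values: no split is possible
        return {"leaf": majority}
    _, f, t, left, right = best
    return {"feature": f, "threshold": t,
            "left": build_tree(left, depth + 1, max_depth),
            "right": build_tree(right, depth + 1, max_depth)}


def predict_tree(node, x):
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def tree_rules(node, names, indent=""):
    """Render the tree as nested if/else rules, one line per test or leaf."""
    if "leaf" in node:
        return [f"{indent}predict class {node['leaf']}"]
    name, t = names[node["feature"]], node["threshold"]
    return ([f"{indent}if {name} <= {t}:"]
            + tree_rules(node["left"], names, indent + "    ")
            + [f"{indent}else ({name} > {t}):"]
            + tree_rules(node["right"], names, indent + "    "))


def accuracy(y_true, y_pred):
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    b0, w = fit_logistic(X, y)
    lr_preds = [predict_logistic(b0, w, x) for x in X]
    tree = build_tree(list(zip(X, y)))
    tree_preds = [predict_tree(tree, x) for x in X]
    print("logistic regression: b0 =", b0, "w =", w)
    print("logistic regression accuracy:", accuracy(y, lr_preds))
    print("decision tree accuracy:", accuracy(y, tree_preds))
    print("\n".join(tree_rules(tree, ["x1", "x2"])))
```

On this perfectly balanced XOR data the log-loss is minimized at all-zero coefficients, so the logistic model predicts a probability of 0.5 everywhere and scores 0.5 accuracy, no better than guessing. The depth-2 tree classifies all four points correctly and prints its rules as nested if/else statements that read almost like plain English. Adding a hand-crafted x1 * x2 column would let logistic regression fit this data as well, so for logistic regression such interactions are difficult rather than impossible: they have to be specified by hand.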