Binary classification

Binary (also referred to as binomial) classification is the process of classifying the elements of a given set into two groups on the basis of a classification rule. The product documentation offers an example exercise that shows when binary classification is a suitable choice for a model: training a model to predict whether or not a customer is likely to buy a tent from an outdoor equipment store, given a sample of training data. Downloading the sample data and examining its columns helps make clear how binary classification works. Let's analyze the exercise.

We want to build a model that will predict whether a given customer is likely to purchase a particular product; in this case, a tent. Suppose we again use the model builder to create a new model, load the sample data provided, and set the basic model details.

The process is as follows:

  1. Define a label column. In this example, the choice is IS_TENT. This column indicates whether or not the customer bought a tent.
  2. Define the feature columns. Feature columns are columns in the data that contain the traits on which the machine learning model will base its predictions. In this historical data, there are the following four feature columns:
     • GENDER: Customer gender
     • AGE: Customer age
     • MARITAL_STATUS: Married, Single, or Unspecified
     • PROFESSION: General category of the customer's profession, such as Hospitality or Sales, or simply Other
  3. Set the build type to Automatic (this causes the model builder to automatically select an algorithm that implements the machine learning technique you specify).
  4. Click on Create and add the training data.
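
Outside the model builder, the split between the label column and the feature columns can be pictured with a few lines of Python. This is only a sketch: the rows are made up for illustration and are not taken from the sample data, and storing IS_TENT as the strings "TRUE" and "FALSE" is an assumption. The helper name split_features_and_label is hypothetical.

```python
# Hypothetical rows for illustration only; not taken from the product's sample data.
# Assumption: IS_TENT is stored as the string "TRUE" or "FALSE".
rows = [
    {"GENDER": "M", "AGE": 27, "MARITAL_STATUS": "Single",
     "PROFESSION": "Sales", "IS_TENT": "TRUE"},
    {"GENDER": "F", "AGE": 45, "MARITAL_STATUS": "Married",
     "PROFESSION": "Hospitality", "IS_TENT": "FALSE"},
    {"GENDER": "M", "AGE": 33, "MARITAL_STATUS": "Unspecified",
     "PROFESSION": "Other", "IS_TENT": "FALSE"},
]

LABEL_COLUMN = "IS_TENT"
FEATURE_COLUMNS = ["GENDER", "AGE", "MARITAL_STATUS", "PROFESSION"]


def split_features_and_label(row):
    """Return (features, label), where label is 1 for a purchase and 0 otherwise."""
    features = {name: row[name] for name in FEATURE_COLUMNS}
    label = 1 if row[LABEL_COLUMN] == "TRUE" else 0
    return features, label


examples = [split_features_and_label(row) for row in rows]
labels = [label for _, label in examples]
```

The label is what the model learns to predict; the features are the only inputs it is allowed to look at, which is why IS_TENT is kept out of the feature dictionary.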

To train the model, you will specify the preceding label and feature columns and then pick the machine learning technique: binary classification. After the model is saved, the model details page will open automatically. To see which algorithm the model builder used, you can go to the Summary table in the Overview information on the model details page (shown in the following screenshot) and click on View in the Model builder details row:

This will reveal the following details for review:

You can see that, after we picked binary classification, the model builder chose LogisticRegression as the estimator. Logistic regression is the usual choice when the dependent variable is dichotomous (binary). In our case, the dependent variable is our label column, IS_TENT.
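
To give a feel for what a logistic regression estimator does with a binary label, here is a minimal, standard-library-only sketch. It is not the model builder's implementation: the training loop (plain batch gradient descent), the function names, the two numeric features, and the data are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Estimated probability that the label is 1 for one row of features."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, target in zip(X, y):
            error = predict_probability(weights, bias, features) - target
            for j, value in enumerate(features):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(X)
    return weights, bias


# Hypothetical, made-up data: [is_single, works_in_sales] -> bought a tent (1) or not (0).
X = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0]

weights, bias = train_logistic_regression(X, y)
probabilities = [predict_probability(weights, bias, row) for row in X]
predictions = [1 if p >= 0.5 else 0 for p in probabilities]
```

The model outputs a probability between 0 and 1, and a threshold (0.5 here, itself an assumption) turns that probability into one of the two classes. That is why the technique fits a dichotomous label such as IS_TENT.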

The conclusion, considering our data and objective, is that binary classification was picked as the technique because we want to assign each record to one of two defined categories (to get a feel for this, think about how the records in the training data could be grouped, for example: males, married, who work in sales; males, single, who work as a professional; and so on). The estimator used (logistic regression) was chosen since, again, the dependent variable is binary (will purchase or will not purchase).
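
The grouping idea in the parenthetical can be tried by hand before any model is involved. The following sketch counts buyers per combination of gender, marital status, and profession; the records are invented for illustration, and IS_TENT is assumed to be encoded as 1 or 0.

```python
from collections import defaultdict

# Hypothetical records: (GENDER, MARITAL_STATUS, PROFESSION, IS_TENT as 1 or 0).
records = [
    ("M", "Married", "Sales", 1),
    ("M", "Married", "Sales", 0),
    ("M", "Single", "Professional", 1),
    ("F", "Single", "Hospitality", 0),
]

# Map each (gender, marital status, profession) group to [buyers, total].
counts = defaultdict(lambda: [0, 0])
for gender, marital_status, profession, bought in records:
    group = (gender, marital_status, profession)
    counts[group][0] += bought
    counts[group][1] += 1

# Share of customers in each group who bought a tent.
purchase_rate = {group: buyers / total for group, (buyers, total) in counts.items()}
```

Groups with very different purchase rates are a hint that the feature columns carry information about the label, which is what a classifier tries to exploit.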

Once you align the sample data values with the chosen technique and estimator, the reasoning behind these choices should begin to make sense. It is a good idea to continue experimenting with data and the model builder to become more comfortable with these concepts.
