Unsupervised learning

As we discussed in the last chapter, supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. The function is inferred from labeled training data made up of a set of training examples.

In the last chapter, we also saw that, when using the model builder, we could set a label column for a predictive model to predict. Recall that, in one example, we chose the IS_TENT column from the training data as the column for the model to predict.

In this section of the chapter, we want to examine scenarios where no label is defined in our data, in other words, unsupervised learning problems. To reiterate, in these cases, there is no label (and so no feedback on whether earlier predictions were right or wrong) for the model to learn from; we expect to solve these cases without indicating or setting a desired label.

To learn more about what unsupervised learning is, you can head over to the following link: https://www.datasciencecentral.com/profiles/blogs/what-is-unsupervised-learning.

Why not always use supervised learning (and labeled data)? To understand why you might find yourself using an unsupervised learning model, consider that unlabeled data is usually easier and cheaper to find, while polishing that data and adding labels to it typically requires subject matter experts and can be a complex process in itself.

One way to accomplish the goal of unsupervised learning is through the use of a clustering algorithm. Clustering uses only the data itself, with no labels, to determine patterns, anomalies (more on anomalies in a later section of this chapter), or similarities in the data.

Clustering organizes data into groups (clusters) so that the items within a cluster are similar to one another and the items in different clusters are dissimilar.
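To make "similar within a cluster, dissimilar across clusters" concrete, here is a minimal sketch that compares average distances. The two groups of points and the helper function names are hypothetical, made up purely for illustration, and Euclidean distance is only one of many possible similarity measures.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance, available in Python 3.8+

# Hypothetical toy data: two hand-made groups of 2-D points.
cluster_a = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5)]
cluster_b = [(8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]


def mean_within_distance(cluster):
    """Average distance between every pair of points in one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_between_distance(first, second):
    """Average distance between points taken from two different clusters."""
    pairs = list(product(first, second))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


within_a = mean_within_distance(cluster_a)
within_b = mean_within_distance(cluster_b)
between = mean_between_distance(cluster_a, cluster_b)

print(f"within A: {within_a:.2f}, within B: {within_b:.2f}, between: {between:.2f}")
```

For a reasonable grouping, both within-cluster averages should come out much smaller than the between-cluster average, which is the case for this toy data.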

Clustering is popular within the field of statistical data analysis because different clusters expose different details about the objects within the data. This differs from classification or regression, where you have some prior information about the results in the form of labeled outcomes.

A popular type of clustering algorithm is the K-means clustering algorithm. This algorithm is used to group objects, based on their attributes or features, into K groups (which is how the method got its name).

In this method, K is a positive integer: it is simply the number of clusters, or distinct groups, that the data is divided into, without the use of a labeled or target field. K-means tries to uncover patterns in the set of input fields within the data rather than predicting an outcome.
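As a rough illustration of the idea, the following is a minimal pure-Python sketch of K-means using the standard iterative procedure (often called Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. This is not the implementation used by Watson Studio; the function names, the random choice of starting centroids, the iteration cap, and the sample points are all assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100, seed=0):
    """Group `points` into `k` clusters; no label column is involved.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid nearest to points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    rng = random.Random(seed)
    # Assumption: start from k data points picked at random.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = None

    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # Converged: no point changed cluster.
        assignments = new_assignments

        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # Keep the old centroid if a cluster is empty.
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))

    return centroids, assignments


# Hypothetical unlabeled data: six 2-D points, grouped with K = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

On this toy data, the first three points end up sharing one cluster index and the last three share the other; which group is numbered 0 and which is numbered 1 depends on the random starting centroids. Note that K itself is chosen by us up front; the algorithm does not discover it.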

In the next section of this chapter, we will look at a working example that uses the K-means algorithm in Watson Studio to create clusters from data, in an effort to produce a prediction without knowing in advance what the predictor(s) may be.
