How Clustering Works in Tableau

Cluster analysis partitions the marks in the view into clusters, where the marks within each cluster are more similar to one another than they are to marks in other clusters. Tableau distinguishes clusters using color.

Tip

For additional insight into how clustering works in Tableau, see the blog post Understanding Clustering in Tableau 10 at https://boraberan.wordpress.com/2016/07/19/understanding-clustering-in-tableau-10/.

The clustering algorithm

Tableau uses the k-means algorithm for clustering. For a given number of clusters k, the algorithm partitions the data into k clusters. Each cluster has a center (centroid) that is the mean value of all the points in that cluster. K-means locates the centers through an iterative procedure that minimizes the distances between the individual points in a cluster and the cluster center. In Tableau, you can specify the number of clusters you want, or have Tableau test different values of k and suggest an optimal number of clusters (for details, see the section Determining the Optimal Number of Clusters at http://onlinehelp.tableau.com/v10.0/pro/desktop/en-us/clustering_howitworks.html#Determining_the_Optimal_Number_of_Clusters).
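To make "centroid" and "the distances being minimized" concrete, here is a minimal Python sketch. It assumes squared Euclidean distance as the distance measure, and the helper names (`centroid`, `within_cluster_ss`) are hypothetical illustrations, not Tableau functions. The within-cluster sum of squares it computes is the quantity k-means tries to make small for a fixed number of clusters.

```python
from typing import List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Mean of each coordinate over the points in one cluster."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def within_cluster_ss(clusters: List[List[Point]]) -> float:
    """Total squared distance from each point to its own cluster's centroid."""
    total = 0.0
    for members in clusters:
        c = centroid(members)
        total += sum(squared_distance(p, c) for p in members)
    return total


clusters = [
    [(1.0, 2.0), (2.0, 2.0), (3.0, 2.0)],
    [(10.0, 10.0), (12.0, 10.0)],
]
print(centroid(clusters[0]))        # [2.0, 2.0]
print(within_cluster_ss(clusters))  # 4.0
```

Moving any point into the other cluster would raise this total, which is why k-means keeps the assignment shown.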

K-means requires an initial specification of cluster centers. Starting with one cluster, the method chooses a variable whose mean is used as a threshold for splitting the data in two. The centroids of these two parts are then used to initialize k-means, which optimizes the membership of the two clusters. Next, one of the two clusters is chosen for splitting: a variable within that cluster is chosen, and its mean is used as the threshold for splitting that cluster in two. K-means then partitions the data into three clusters, initialized with the centroids of the two parts of the split cluster and the centroid of the remaining cluster. This process repeats until the specified number of clusters is reached.
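The seeding step can be sketched as follows. This is a hedged illustration, not Tableau's implementation: the text does not say how the variable or the cluster to split is chosen, so this sketch assumes (hypothetically) that the variable with the largest variance is used and that the cluster with the largest within-cluster sum of squares is split. The function names `split_on_mean` and `seed_next_k` are made up for this example, and the k-means refinement that follows each seeding is not shown.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def centroid(points: List[Point]) -> List[float]:
    return [mean([p[d] for p in points]) for d in range(len(points[0]))]


def variance(values: Sequence[float]) -> float:
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def split_on_mean(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Split one cluster in two, using the mean of one variable as threshold.
    ASSUMPTION: the variable with the largest variance is the one chosen."""
    dims = range(len(points[0]))
    var_idx = max(dims, key=lambda d: variance([p[d] for p in points]))
    threshold = mean([p[var_idx] for p in points])
    below = [p for p in points if p[var_idx] <= threshold]
    above = [p for p in points if p[var_idx] > threshold]
    if not above:
        raise ValueError("cannot split a cluster of identical points")
    return below, above


def seed_next_k(clusters: List[List[Point]]) -> List[List[float]]:
    """Initial centers for k + 1 clusters, given the k clusters found so far.
    ASSUMPTION: the cluster with the largest spread is the one split."""
    def spread(members: List[Point]) -> float:
        c = centroid(members)
        return sum(sum((x - y) ** 2 for x, y in zip(p, c)) for p in members)

    target = max(range(len(clusters)), key=lambda i: spread(clusters[i]))
    below, above = split_on_mean(clusters[target])
    # Centroids of the untouched clusters, then the two halves of the split one.
    centers = [centroid(c) for i, c in enumerate(clusters) if i != target]
    return centers + [centroid(below), centroid(above)]


# k = 1 -> 2: the whole data set is one cluster.
one_cluster = [[(1.0, 1.0), (2.0, 1.0), (9.0, 1.0), (10.0, 1.0)]]
print(seed_next_k(one_cluster))  # [[1.5, 1.0], [9.5, 1.0]]

# k = 2 -> 3: pretend k-means produced these two clusters.
two_clusters = [
    [(1.0, 1.0), (2.0, 1.0)],
    [(9.0, 1.0), (10.0, 1.0), (9.0, 8.0), (10.0, 8.0)],
]
print(seed_next_k(two_clusters))  # [[1.5, 1.0], [9.5, 1.0], [9.5, 8.0]]
```

In the second call the wider cluster is split on its second variable, and the untouched cluster contributes its own centroid, mirroring the three-cluster initialization described above.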

Tableau uses Lloyd's algorithm with squared Euclidean distances to compute the k-means clustering for each k. Because the splitting procedure, rather than a random choice, determines the initial centers for each k > 1, the resulting clustering is deterministic: for a given set of data, the result depends only on the number of clusters.
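Lloyd's algorithm alternates two steps: assign each point to its nearest center, then move each center to the mean of its assigned points. The Python sketch below implements that loop with squared Euclidean distance. It is a minimal illustration, not Tableau's code; the stopping rule (stop when assignments no longer change), the tie-breaking (lowest-numbered center wins), and the handling of empty clusters (keep the old center) are assumptions of this sketch. Because nothing in it is random, the same points and starting centers always produce the same result.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def squared_euclidean(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd(points: List[Point], centers: List[List[float]],
          max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Run Lloyd's algorithm from the given initial centers."""
    centers = [list(c) for c in centers]  # copy so the caller's list is untouched
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center by squared Euclidean distance
        # (min() returns the lowest index on ties).
        new_labels = [
            min(range(len(centers)), key=lambda j: squared_euclidean(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


points = [(1.0, 1.0), (2.0, 1.0), (9.0, 1.0), (10.0, 1.0), (9.0, 8.0), (10.0, 8.0)]
labels, centers = lloyd(points, [[1.0, 1.0], [10.0, 8.0]])
print(labels)   # [0, 0, 1, 1, 1, 1]
print(centers)  # [[1.5, 1.0], [9.5, 4.5]]
```

Calling `lloyd` again with the same inputs returns identical labels and centers, which is the property that makes the overall procedure deterministic once the initial centers come from the splitting step instead of random sampling.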
