Let's say you are dealing with images and you want to block out a particular shape. Now, you might say that you will use shape matching to identify the shape and then just block it out, right? But the problem here is that we don't have any template available. So, how do we go about doing this? Shape analysis comes in various forms, and we need to build our algorithm depending on the situation. Let's consider the following figure:
Let's say we want to identify all the boomerang shapes and then block them out without using any template images. As you can see, there are various other weird shapes in that image and the boomerang shapes are not really smooth. We need to identify the property that's going to differentiate the boomerang shape from the other shapes present. Let's consider the convex hull. If we take the ratio of the area of each shape to the area of its convex hull, we can see that this can be a distinguishing metric. This metric is called the solidity factor in shape analysis. It is close to 1 for a convex shape, which fills its hull almost completely, and it will have a lower value for the boomerang shapes because their convex hulls enclose a large empty region that the shapes themselves do not cover, as shown in the following figure:
The black boundaries represent the convex hulls. Once we compute these values for all the shapes, how do we separate them out? Can we just use a fixed threshold to detect the boomerang shapes? Not really! We cannot have a fixed threshold value because you never know what kind of shape you might encounter later. So, a better approach would be to use K-Means clustering. K-Means is an unsupervised learning technique that can be used to separate out the input data into K clusters. You can quickly brush up on K-Means before proceeding further at http://docs.opencv.org/master/de/d4d/tutorial_py_kmeans_understanding.html.
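To make the clustering idea concrete, here is a minimal sketch of K-Means on one-dimensional values with K = 2, written in plain Python so that each step is visible. The function name two_means_1d, the deterministic starting centers, and the sample solidity values are all assumptions made for illustration; this is not how OpenCV implements cv2.kmeans.

```python
def two_means_1d(values, max_iter=100):
    """Split 1-D values into two clusters with Lloyd's algorithm (K = 2).

    Assumption for this sketch: the centers start at the minimum and the
    maximum value, which keeps the result deterministic.
    """
    centers = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
                  for v in values]
        # Update step: each center moves to the mean of its members
        new_centers = []
        for k in (0, 1):
            members = [v for v, label in zip(values, labels) if label == k]
            new_centers.append(sum(members) / len(members) if members else centers[k])
        if new_centers == centers:
            break  # the centers stopped moving, so the clustering has converged
        centers = new_centers
    return labels, centers


# Hypothetical solidity values: three boomerang-like shapes, four convex ones
solidity_values = [0.45, 0.52, 0.48, 0.95, 0.98, 0.91, 0.97]
labels, centers = two_means_1d(solidity_values)

# The cluster with the lower center holds the low-solidity shapes
low_cluster = centers.index(min(centers))
boomerang_indices = [i for i, label in enumerate(labels) if label == low_cluster]
print(boomerang_indices)  # [0, 1, 2]
```

Note that no threshold appears anywhere in this sketch. The boundary between the two groups comes from the data itself, which is the reason for preferring clustering over a fixed cutoff.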
We know that we want to separate the shapes into two groups, that is, boomerang shapes and other shapes. So, we know what our K will be in K-Means. Once we use that and cluster the values, we pick the cluster whose center has the lowest solidity factor, and that will give us our boomerang shapes. Bear in mind that this approach works only in this particular case. If you are dealing with other kinds of shapes, then you will have to use some other metrics to make sure that the shape detection works. As we discussed earlier, it depends heavily on the situation. If you detect the shapes and block them out, it will look like this:
Following is the code to do it:
import sys

import cv2
import numpy as np


def get_all_contours(img):
    # Binarize the image and extract the contours of all the shapes
    ref_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    ret, thresh = cv2.threshold(ref_gray, 127, 255, cv2.THRESH_BINARY)
    # findContours returns three values in OpenCV 3.x and two in other
    # versions; the contours are always the second-to-last element
    contours = cv2.findContours(thresh, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
    return contours


if __name__ == '__main__':
    # Input image containing all the shapes
    img = cv2.imread(sys.argv[1])
    if img is None:
        sys.exit('Could not read image: ' + sys.argv[1])
    img_orig = np.copy(img)

    # Compute solidity factors of all the contours, skipping degenerate
    # contours whose convex hull has zero area
    input_contours = []
    solidity_values = []
    for contour in get_all_contours(img):
        area_contour = cv2.contourArea(contour)
        area_hull = cv2.contourArea(cv2.convexHull(contour))
        if area_hull > 0:
            input_contours.append(contour)
            solidity_values.append(float(area_contour) / area_hull)

    # Clustering using KMeans. Solidity lies between 0 and 1, so the
    # epsilon in the termination criteria must be well below 1
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 0.001)
    flags = cv2.KMEANS_RANDOM_CENTERS
    solidity_values = np.array(solidity_values, dtype=np.float32).reshape(-1, 1)
    # OpenCV 3.x and later take bestLabels (None here) as the third
    # argument; OpenCV 2.4 omits it
    compactness, labels, centers = cv2.kmeans(solidity_values, 2, None, criteria, 10, flags)

    # The cluster with the lowest center holds the boomerang shapes.
    # Select contours by label so that equal solidity values do not collide
    closest_class = np.argmin(centers)
    output_contours = [contour for contour, label
                       in zip(input_contours, labels.ravel())
                       if label == closest_class]

    cv2.drawContours(img, output_contours, -1, (0, 0, 0), 3)
    cv2.imshow('Output', img)

    # Censoring
    for contour in output_contours:
        rect = cv2.minAreaRect(contour)
        box = cv2.boxPoints(rect)  # cv2.cv.BoxPoints in OpenCV 2.4
        box = box.astype(np.int32)  # np.int0 was removed in NumPy 2.0
        cv2.drawContours(img_orig, [box], 0, (0, 0, 0), -1)

    cv2.imshow('Censored', img_orig)
    cv2.waitKey()
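To see why solidity separates the two groups, we can work the metric out by hand for two simple polygons. The following is a minimal sketch in plain Python: polygon_area and convex_hull are hypothetical helpers that stand in for cv2.contourArea and cv2.convexHull, and the L-shaped polygon is an assumed stand-in for a boomerang.

```python
def polygon_area(points):
    """Absolute area of a simple polygon, using the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def convex_hull(points):
    """Convex hull by Andrew's monotone chain, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when the turn o -> a -> b is counter-clockwise
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other
    return lower[:-1] + upper[:-1]


def solidity(points):
    """Ratio of the polygon area to the area of its convex hull."""
    hull_area = polygon_area(convex_hull(points))
    if hull_area == 0:
        return 0.0
    return polygon_area(points) / hull_area


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
# An L-shaped polygon, used here as a rough boomerang
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]

print(solidity(square))   # 1.0
print(solidity(l_shape))  # about 0.61 (area 7 over hull area 11.5)
```

The square fills its convex hull completely, whereas the hull of the L shape cuts across the inner corner and encloses empty space, so its solidity drops well below 1. This gap is what the clustering step relies on.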