Image segmentation is the process of separating an image into its constituent parts. It is an important step in many real-world computer vision applications. There are many ways to segment an image, and they differ mainly in the metric used to separate the regions: color, texture, location, and so on. All the pixels within a region have something in common with respect to the metric being used. Let's take a look at some of the popular approaches here.
To start with, we will be looking at a technique called GrabCut. It is an image segmentation method based on a more generic approach called graph-cuts. In the graph-cuts method, we consider the entire image to be a graph, and then we segment the graph based on the strength of the edges in that graph. We construct the graph by treating each pixel as a node and adding edges between neighboring nodes, where the edge weight is a function of the pixel values of those two nodes. Whenever there is a boundary, the difference between neighboring pixel values is large. In GrabCut, the edge weight falls off as this difference grows, so edges that cross a boundary are weak and cheap to cut. This graph is then segmented by minimizing the Gibbs energy of the graph. This is analogous to finding the maximum entropy segmentation. You can refer to the original paper to learn more about it at http://cvg.ethz.ch/teaching/cvl/2012/grabcut-siggraph04.pdf. Let's consider the following image:
Let's select the region of interest:
Once the image has been segmented, it will look something like this:
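The following code does this. Pass the path to the input image as the first command-line argument, drag a rectangle around the object with the left mouse button, and press Esc to quit: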
import sys

import cv2
import numpy as np

# Draw rectangle based on the input selection
def draw_rectangle(event, x, y, flags, params):
    global x_init, y_init, drawing, top_left_pt, bottom_right_pt, img_orig

    # Detecting mouse button down event
    if event == cv2.EVENT_LBUTTONDOWN:
        drawing = True
        x_init, y_init = x, y

    # Detecting mouse movement
    elif event == cv2.EVENT_MOUSEMOVE:
        if drawing:
            top_left_pt, bottom_right_pt = (x_init, y_init), (x, y)
            # Invert the selected region so that the selection is visible
            img[y_init:y, x_init:x] = 255 - img_orig[y_init:y, x_init:x]
            cv2.rectangle(img, top_left_pt, bottom_right_pt, (0, 255, 0), 2)

    # Detecting mouse button up event
    elif event == cv2.EVENT_LBUTTONUP:
        drawing = False
        top_left_pt, bottom_right_pt = (x_init, y_init), (x, y)
        img[y_init:y, x_init:x] = 255 - img[y_init:y, x_init:x]
        cv2.rectangle(img, top_left_pt, bottom_right_pt, (0, 255, 0), 2)

        # Normalize the rectangle to (x, y, width, height) so that it is
        # valid even when the user drags up or to the left
        rect_final = (min(x_init, x), min(y_init, y),
                      abs(x - x_init), abs(y - y_init))

        # Run Grabcut on the region of interest (skip empty selections)
        if rect_final[2] > 0 and rect_final[3] > 0:
            run_grabcut(img_orig, rect_final)

# Grabcut algorithm
def run_grabcut(img_orig, rect_final):
    # Initialize the mask
    mask = np.zeros(img_orig.shape[:2], np.uint8)

    # Extract the rectangle and set the region of
    # interest in the above mask
    x, y, w, h = rect_final
    mask[y:y+h, x:x+w] = 1

    # Initialize background and foreground models
    bgdModel = np.zeros((1, 65), np.float64)
    fgdModel = np.zeros((1, 65), np.float64)

    # Run Grabcut algorithm for 5 iterations
    cv2.grabCut(img_orig, mask, rect_final, bgdModel, fgdModel, 5,
                cv2.GC_INIT_WITH_RECT)

    # Extract new mask: labels 0 and 2 are background, 1 and 3 are foreground
    mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')

    # Apply the above mask to the image
    img_output = img_orig * mask2[:, :, np.newaxis]

    # Display the image
    cv2.imshow('Output', img_output)

if __name__ == '__main__':
    drawing = False
    top_left_pt, bottom_right_pt = (-1, -1), (-1, -1)

    # Read the input image
    img_orig = cv2.imread(sys.argv[1])
    if img_orig is None:
        sys.exit('Could not read the input image: ' + sys.argv[1])
    img = img_orig.copy()

    cv2.namedWindow('Input')
    cv2.setMouseCallback('Input', draw_rectangle)

    while True:
        cv2.imshow('Input', img)
        c = cv2.waitKey(1)
        # Exit when the Esc key is pressed
        if c == 27:
            break

    cv2.destroyAllWindows()
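The mask extraction step in the preceding listing relies on the four labels that cv2.grabCut writes into the mask: 0 (cv2.GC_BGD, definite background), 1 (cv2.GC_FGD, definite foreground), 2 (cv2.GC_PR_BGD, probable background), and 3 (cv2.GC_PR_FGD, probable foreground). The following is a minimal sketch of that step in isolation. The toy mask and image are made up for illustration, the helper names foreground_mask and apply_mask are hypothetical, and the label values are hard-coded so that the snippet runs with NumPy alone:

```python
import numpy as np

# Label values that cv2.grabCut writes into its mask
# (equal to cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD, cv2.GC_PR_FGD)
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3

def foreground_mask(mask):
    """Collapse the four GrabCut labels into a binary 0/1 mask."""
    is_background = (mask == GC_BGD) | (mask == GC_PR_BGD)
    return np.where(is_background, 0, 1).astype(np.uint8)

def apply_mask(img, binary_mask):
    """Zero out every pixel of an H x W x C image whose mask entry is 0."""
    return img * binary_mask[:, :, np.newaxis]

# Hypothetical 2x3 mask, standing in for the output of cv2.grabCut
toy_mask = np.array([[0, 2, 3],
                     [1, 3, 2]], dtype=np.uint8)

# Hypothetical 2x3 three-channel image filled with a constant color
toy_img = np.full((2, 3, 3), 200, dtype=np.uint8)

binary = foreground_mask(toy_mask)
segmented = apply_mask(toy_img, binary)
print(binary)
```

Pixels labeled as definite or probable background become black in the output, and everything else keeps its original color. This is why the segmented object appears on a black background.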
We start with the region specified by the user, which is the bounding box that contains the object of interest. Underneath the surface, the algorithm estimates the color distribution of the object and the background. The algorithm represents the color distribution of the image as a Gaussian Mixture Markov Random Field (GMMRF). You can refer to the detailed paper to learn more about GMMRF at http://research.microsoft.com/pubs/67898/eccv04-GMMRF.pdf. We need the color distributions of both the object and the background, because we use this knowledge to separate the two. This information is used to find the maximum entropy segmentation by applying the min-cut algorithm to the Markov Random Field. The resulting cut of the graph gives us the label, object or background, for each pixel.
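The cut in the last step operates on the edges between neighboring pixels described at the start of this section. The following is a rough sketch of how such edge weights could be computed for a grayscale image. It assumes a weight of the form exp(-beta * difference^2), which is similar in spirit to the smoothness term in the GrabCut paper. The function name neighbor_weights, the toy image, and the way beta is chosen are assumptions made for illustration, and this is not how OpenCV implements it internally:

```python
import numpy as np

def neighbor_weights(gray, beta=None):
    """Return (horizontal, vertical) edge weights for a 2-D grayscale image.

    weight = exp(-beta * (difference between neighboring pixels)^2)
    """
    gray = gray.astype(np.float64)
    diff_h = gray[:, 1:] - gray[:, :-1]   # each pixel vs. its right neighbor
    diff_v = gray[1:, :] - gray[:-1, :]   # each pixel vs. its lower neighbor
    if beta is None:
        # Assumed normalization: scale by the mean squared difference
        all_diffs = np.concatenate([diff_h.ravel(), diff_v.ravel()])
        mean_sq = np.mean(all_diffs ** 2)
        beta = 1.0 / (2.0 * mean_sq) if mean_sq > 0 else 0.0
    return np.exp(-beta * diff_h ** 2), np.exp(-beta * diff_v ** 2)

# Hypothetical 3x4 image: dark left half, bright right half
toy = np.array([[10, 12, 200, 205],
                [11, 10, 198, 202],
                [ 9, 13, 201, 199]], dtype=np.uint8)

w_h, w_v = neighbor_weights(toy)
print(np.round(w_h, 3))
```

Under this assumed weighting, neighbors inside the same region get weights close to 1, while the horizontal edges that straddle the dark/bright boundary (the middle column of w_h) get weights close to 0. A minimum cut therefore prefers to pass through the boundary edges, which is what places the segmentation along the object outline.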