A colorspace-based tracker gives us the freedom to track a colored object, but it constrains us to a predefined color. What if we want to pick an object at random? How do we build an object tracker that can learn the characteristics of the selected object and track it automatically? This is where the CAMShift algorithm, short for Continuously Adaptive Meanshift, comes into the picture. It is essentially an improved version of the Meanshift algorithm.
The concept behind Meanshift is actually nice and simple. Let's say we select a region of interest and we want our object tracker to track that object. Within that region, we select a bunch of points based on the color histogram and compute their centroid. If the centroid lies at the center of the region, we know that the object hasn't moved. But if the centroid is not at the center of the region, then we know that the object has moved in some direction, and the displacement of the centroid indicates that direction. So, we move the bounding box to a new location where the new centroid becomes its center. The algorithm is called Meanshift because the mean (i.e. the centroid) keeps shifting. This way, we keep ourselves updated with the current location of the object.
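The shift step described above can be sketched in pure NumPy. This is a toy illustration of the idea (the function name `mean_shift` and the synthetic probability map are my own, not OpenCV's implementation): given a 2D map of probability weights, we compute the centroid of the mass inside the current window and re-center the window there, repeating until it stops moving.

```python
import numpy as np

def mean_shift(prob, window, num_iters=10):
    """Toy Meanshift: move 'window' (x, y, w, h) toward the
    centroid of the probability mass it currently covers."""
    x, y, w, h = window
    for _ in range(num_iters):
        roi = prob[y:y+h, x:x+w]
        total = roi.sum()
        if total == 0:
            break
        # Centroid of the weights inside the window (image moments)
        ys, xs = np.mgrid[0:h, 0:w]
        cx = (xs * roi).sum() / total
        cy = (ys * roi).sum() / total
        # Shift the window so its center lands on the centroid
        new_x = int(round(x + cx - w / 2))
        new_y = int(round(y + cy - h / 2))
        if (new_x, new_y) == (x, y):
            break   # converged: the centroid is at the window center
        x, y = new_x, new_y
    return (x, y, w, h)

# A small blob of probability mass centered at (30, 40)
prob = np.zeros((60, 60))
prob[38:43, 28:33] = 1.0

# The window slides until the blob sits at its center
print(mean_shift(prob, (20, 25, 20, 20)))   # -> (20, 30, 20, 20)
```

Note that the window ends up centered at (30, 40), exactly on the blob, but its size never changes; that limitation is what CAMShift addresses.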
But the problem with Meanshift is that the size of the bounding box is not allowed to change. When you move the object away from the camera, it appears smaller in the frame, but Meanshift will not take this into account; the bounding box keeps the same size throughout the tracking session. Hence, we need to use CAMShift. The advantage of CAMShift is that it can adapt the size of the bounding box to the size of the object. Along with that, it can also keep track of the orientation of the object.
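The size adaptation is what separates CAMShift from Meanshift. In Bradski's original formulation, after each Meanshift convergence the window side is rescaled in proportion to the square root of the zeroth image moment (the total probability mass under the window), which grows and shrinks with the apparent size of the object. A toy sketch of that resizing rule (the helper name `adapt_window_size` is mine, and I assume an 8-bit backprojection image):

```python
import numpy as np

def adapt_window_size(prob, window):
    """CAMShift-style resize: scale the window side with the square
    root of the zeroth moment (total mass) of the backprojection
    under it. Assumes 'prob' holds 8-bit values (0-255)."""
    x, y, w, h = window
    m00 = prob[y:y+h, x:x+w].astype(np.float64).sum()
    # Bradski's rule: side length ~ 2 * sqrt(M00 / 256)
    side = int(round(2 * np.sqrt(m00 / 256)))
    return max(side, 1)

# A 10x10 patch at maximum probability under a 30x30 window
prob = np.zeros((60, 60), dtype=np.uint8)
prob[20:30, 20:30] = 255
print(adapt_window_size(prob, (15, 15, 30, 30)))   # -> 20
```

If the object shrinks (less mass under the window), the next window is smaller; if it grows, the window expands, which is exactly the behavior demonstrated in the frames below.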
Let's consider the following frame in which the object is highlighted in orange (the box in my hand):
Now that we have selected the object, the algorithm computes the histogram backprojection and extracts all the relevant information. Let's move the object and see how it gets tracked:
Looks like the object is getting tracked fairly well. Let's change the orientation and see if the tracking is maintained:
As we can see, the bounding ellipse has changed its location as well as its orientation. Let's change the perspective of the object and see if it's still able to track it:
We are still good! The bounding ellipse has changed its aspect ratio to reflect the fact that the object now looks skewed (because of the perspective transformation).
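Histogram backprojection is the engine underneath all of this: every pixel in the new frame is replaced by the frequency of its hue in the histogram of the selected region, turning the frame into a probability map for CAMShift to climb. A minimal NumPy sketch of the idea (the function name `back_project` and the tiny hue image are hypothetical; `cv2.calcBackProject` does this, plus scaling, internally):

```python
import numpy as np

def back_project(hue, roi_hist, num_bins=16, max_hue=180):
    """Replace each hue value with its (normalized) frequency
    in the selected region's hue histogram."""
    bin_idx = (hue.astype(np.int32) * num_bins) // max_hue
    return roi_hist[bin_idx]

# Tiny hypothetical hue image; the 'object' occupies hues near 30
hue = np.array([[30, 31, 90],
                [29, 30, 91]], dtype=np.uint8)
roi = hue[:, :2]   # the user-selected region

# Build a normalized hue histogram of the selected region
hist, _ = np.histogram(roi, bins=16, range=(0, 180))
hist = hist / hist.max()   # normalize to [0, 1]

prob = back_project(hue, hist)
print(prob)   # object hues map to 1.0, background hues to 0.0
```

Object-colored pixels light up with high probability while the background goes dark, which is why the mask-and-backproject step in the code below is enough for CAMShift to lock onto the object.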
Following is the code:
import sys

import cv2
import numpy as np

class ObjectTracker(object):
    def __init__(self):
        # Initialize the video capture object
        # 0 -> indicates that frames should be captured
        # from the webcam
        self.cap = cv2.VideoCapture(0)

        # Capture the frame from the webcam
        ret, self.frame = self.cap.read()

        # Downsampling factor for the input frame
        self.scaling_factor = 0.5
        self.frame = cv2.resize(self.frame, None,
                fx=self.scaling_factor, fy=self.scaling_factor,
                interpolation=cv2.INTER_AREA)

        cv2.namedWindow('Object Tracker')
        cv2.setMouseCallback('Object Tracker', self.mouse_event)

        self.selection = None
        self.drag_start = None
        self.tracking_state = 0

    # Method to track mouse events
    def mouse_event(self, event, x, y, flags, param):
        x, y = np.int16([x, y])

        # Detecting the mouse button down event
        if event == cv2.EVENT_LBUTTONDOWN:
            self.drag_start = (x, y)
            self.tracking_state = 0

        if self.drag_start:
            if flags & cv2.EVENT_FLAG_LBUTTON:
                # Clamp the selection rectangle to the frame
                h, w = self.frame.shape[:2]
                xo, yo = self.drag_start
                x0, y0 = np.maximum(0, np.minimum([xo, yo], [x, y]))
                x1, y1 = np.minimum([w, h], np.maximum([xo, yo], [x, y]))
                self.selection = None

                if x1 - x0 > 0 and y1 - y0 > 0:
                    self.selection = (x0, y0, x1, y1)
            else:
                self.drag_start = None
                if self.selection is not None:
                    self.tracking_state = 1

    # Method to start tracking the object
    def start_tracking(self):
        # Iterate until the user presses the Esc key
        while True:
            # Capture the frame from the webcam
            ret, self.frame = self.cap.read()

            # Resize the input frame
            self.frame = cv2.resize(self.frame, None,
                    fx=self.scaling_factor, fy=self.scaling_factor,
                    interpolation=cv2.INTER_AREA)

            vis = self.frame.copy()

            # Convert to HSV colorspace
            hsv = cv2.cvtColor(self.frame, cv2.COLOR_BGR2HSV)

            # Create the mask based on predefined thresholds
            mask = cv2.inRange(hsv, np.array((0., 60., 32.)),
                    np.array((180., 255., 255.)))

            if self.selection:
                x0, y0, x1, y1 = self.selection
                self.track_window = (x0, y0, x1 - x0, y1 - y0)
                hsv_roi = hsv[y0:y1, x0:x1]
                mask_roi = mask[y0:y1, x0:x1]

                # Compute the hue histogram of the selected region
                hist = cv2.calcHist([hsv_roi], [0], mask_roi, [16], [0, 180])

                # Normalize and reshape the histogram
                cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
                self.hist = hist.reshape(-1)

                # Highlight the selection while dragging
                vis_roi = vis[y0:y1, x0:x1]
                cv2.bitwise_not(vis_roi, vis_roi)
                vis[mask == 0] = 0

            if self.tracking_state == 1:
                self.selection = None

                # Compute the histogram backprojection
                prob = cv2.calcBackProject([hsv], [0], self.hist, [0, 180], 1)
                prob &= mask
                term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT,
                        10, 1)

                # Apply CAMShift on 'prob'
                track_box, self.track_window = cv2.CamShift(prob,
                        self.track_window, term_crit)

                # Draw an ellipse around the object
                cv2.ellipse(vis, track_box, (0, 255, 0), 2)

            cv2.imshow('Object Tracker', vis)

            c = cv2.waitKey(5)
            if c == 27:
                break

        cv2.destroyAllWindows()

if __name__ == '__main__':
    ObjectTracker().start_tracking()