Siamese networks

Siamese networks are a special type of neural network and are among the simplest and most widely used one-shot learning algorithms. As we learned in the previous section, one-shot learning is a technique where we learn from only one training example per class. Siamese networks are therefore predominantly used in applications where we don't have many data points for each class.

For instance, let's say we want to build a face recognition model for our organization, where about 500 people work. If we want to build the model using a convolutional neural network (CNN) from scratch, we need many images of each of these 500 people to train the network and attain good accuracy. In practice, we are unlikely to have many images of every person, so it is not feasible to build a model using a CNN or any other deep learning algorithm unless we have sufficient data points. In these kinds of scenarios, we can resort to a one-shot learning algorithm such as a siamese network, which can learn from fewer data points.

But how do siamese networks work? A siamese network consists of two symmetrical neural networks that share the same weights and architecture and are joined at the end by an energy function. The objective of our siamese network is to learn whether the two inputs are similar or dissimilar.

Let's say we have two images and we want to learn whether they are similar or dissimilar. As shown in the following diagram, we feed the first image to the first network and the second image to the second network. The role of both networks is to generate embeddings (feature vectors) for their input images, so we can use any network that gives us embeddings. Since our input is an image, we can use a convolutional network to generate the embeddings, that is, to extract features. Remember that the role of the CNN here is only to extract features, not to classify.

Since these networks must have the same weights and architecture, if the first network is a three-layer CNN then the second network must also be a three-layer CNN, and we have to use the same set of weights for both. The two networks give us the embeddings for the first and second input images, respectively. We then feed these embeddings to the energy function, which tells us how similar the two input images are. An energy function can be essentially any similarity measure, such as Euclidean distance or cosine similarity.
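To make the flow concrete, here is a minimal sketch in plain Python. It assumes a toy one-layer network in place of a real CNN, and the names `embed`, `SHARED_WEIGHTS`, `image_1`, and `image_2`, along with all the numeric values, are hypothetical and chosen only for illustration. The point is that both inputs pass through the same function with the same weights, and the resulting embeddings are compared by an energy function:

```python
import math

# Toy stand-in for a CNN: one linear layer followed by ReLU.
# These weights are arbitrary illustrative values, not trained ones.
SHARED_WEIGHTS = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
]


def embed(x, weights=SHARED_WEIGHTS):
    """Map an input vector to an embedding (feature vector)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def euclidean_distance(a, b):
    """Energy as a distance: smaller means more similar."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def cosine_similarity(a, b):
    """Energy as a similarity: larger means more similar.

    Assumes neither embedding is the zero vector.
    """
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


# Two flattened "images" (hypothetical 3-value inputs).
image_1 = [1.0, 0.0, 2.0]
image_2 = [2.0, 1.0, 1.0]

# Both branches use the same function and the same weights.
embedding_1 = embed(image_1)
embedding_2 = embed(image_2)

distance = euclidean_distance(embedding_1, embedding_2)
similarity = cosine_similarity(embedding_1, embedding_2)
```

Note that the two measures point in opposite directions: a small Euclidean distance indicates similar inputs, whereas a large cosine similarity does. In a real siamese network the shared weights would be learned during training rather than fixed by hand.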

Siamese networks are not only used for face recognition; they are also used extensively in applications where we don't have many data points and in tasks where we need to learn the similarity between two inputs. Applications of siamese networks include signature verification, similar question retrieval, and object tracking. We will study siamese networks in detail in the next section.
