Relation networks

Relation networks consist of two important functions: an embedding function, denoted by $f_{\varphi}$, and a relation function, denoted by $g_{\phi}$. The embedding function is used for extracting the features from the input. If our input is an image, then we can use a convolutional network as our embedding function, which will give us the feature vectors/embeddings of the image. If our input is text, then we can use an LSTM network to get the embeddings of the text. Let's say we have a support set containing three classes, {lion, elephant, dog}, as shown below:

And let's say we have a query image, $x_j$, as shown in the following diagram, and we want to predict the class of this query image:

First, we take each image, $x_i$, from the support set and pass it to the embedding function, $f_{\varphi}(x_i)$, to extract the features. Since our support set contains images, we can use a convolutional network as our embedding function to learn the embeddings. The embedding function gives us the feature vector of each of the data points in the support set. Similarly, we learn the embedding of our query image by passing it to the embedding function, $f_{\varphi}(x_j)$.
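The following is a minimal sketch of this embedding step, assuming PyTorch, single-channel 28 x 28 images, and illustrative layer sizes; the names `EmbeddingFunction` and `f_phi` are our own choices and not prescribed by the relation network itself:

```python
import torch
import torch.nn as nn

class EmbeddingFunction(nn.Module):
    """A small convolutional embedding function, f_phi, for images."""
    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(64 * 7 * 7, feature_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, 1, 28, 28); the output has shape (batch, feature_dim)
        return self.fc(self.conv(x).flatten(start_dim=1))

f_phi = EmbeddingFunction()

# Embed the three support images (lion, elephant, dog) and the query image.
# Random tensors stand in for real images here.
support_images = torch.randn(3, 1, 28, 28)
query_image = torch.randn(1, 1, 28, 28)

support_embeddings = f_phi(support_images)   # shape: (3, 64)
query_embedding = f_phi(query_image)         # shape: (1, 64)
```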

Once we have the feature vectors of the support set, $f_{\varphi}(x_i)$, and the query set, $f_{\varphi}(x_j)$, we combine them using some operator, $Z$. Here, $Z$ can be any combination operator. We use concatenation as the operator to combine the feature vectors of the support and query sets, that is, $Z(f_{\varphi}(x_i), f_{\varphi}(x_j))$:

As shown in the following diagram, we combine the feature vectors of the support set, $f_{\varphi}(x_i)$, and the query set, $f_{\varphi}(x_j)$. But what is the use of combining them like this? It helps us to understand how the feature vector of an image in the support set is related to the feature vector of the query image.

In our example, it helps us to understand how the feature vector of the lion is related to the feature vector of the query image, how the feature vector of the elephant is related to the feature vector of the query image, and how the feature vector of the dog is related to the feature vector of the query image:
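A minimal sketch of this combination step, continuing the PyTorch assumptions above with 64-dimensional embeddings and concatenation as the operator $Z$; the placeholder tensors stand in for the outputs of the embedding function:

```python
import torch

# Placeholders standing in for f_phi(x_i) and f_phi(x_j);
# in practice they come from the embedding function above.
support_embeddings = torch.randn(3, 64)   # one 64-d feature vector per support class
query_embedding = torch.randn(1, 64)      # one 64-d feature vector for the query image

# Z(f_phi(x_i), f_phi(x_j)): concatenate the query embedding with each support embedding
combined = torch.cat([support_embeddings, query_embedding.expand(3, -1)], dim=1)
print(combined.shape)  # torch.Size([3, 128]) -- one combined vector per support class
```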

But how can we measure this relatedness? This is why we use a relation function, $g_{\phi}$. We pass these combined feature vectors to the relation function, which generates a relation score ranging from 0 to 1, representing the similarity between the samples in the support set, $x_i$, and the samples in the query set, $x_j$.

The following equation shows how we compute the relation score, $r_{ij}$, in the relation network:

$$r_{ij} = g_{\phi}(Z(f_{\varphi}(x_i), f_{\varphi}(x_j)))$$

Here, $r_{ij}$ denotes the relation score representing the similarity between each of the classes in the support set and the query image. Since we have three classes in the support set and one image in the query set, we will have three scores indicating how similar each of the three classes in the support set is to the query image.
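A minimal sketch of the relation function and the relation score computation, under the same assumptions (PyTorch, 128-dimensional combined vectors, illustrative layer sizes); the sigmoid at the end keeps the scores in the 0-to-1 range:

```python
import torch
import torch.nn as nn

# Relation function g_phi: a small MLP mapping a combined feature vector to a score in [0, 1]
g_phi = nn.Sequential(
    nn.Linear(128, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),
)

# Combined vectors Z(f_phi(x_i), f_phi(x_j)) -- placeholders here, one per support class
combined = torch.randn(3, 128)

relation_scores = g_phi(combined).squeeze(1)        # r_ij, shape: (3,), one score per class
predicted_class = relation_scores.argmax().item()   # the class with the highest relation score
```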

The overall representation of the relation network in a one-shot learning setting is shown in the following diagram:
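As a rough end-to-end sketch of this one-shot setup, the following ties the previous pieces together, again assuming PyTorch, single-channel 28 x 28 images, and illustrative layer sizes:

```python
import torch
import torch.nn as nn

class RelationNetwork(nn.Module):
    """Embedding function f_phi followed by relation function g_phi."""
    def __init__(self, feature_dim: int = 64):
        super().__init__()
        self.f_phi = nn.Sequential(                      # embedding function
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(64 * 7 * 7, feature_dim),
        )
        self.g_phi = nn.Sequential(                      # relation function
            nn.Linear(2 * feature_dim, 8), nn.ReLU(),
            nn.Linear(8, 1), nn.Sigmoid(),
        )

    def forward(self, support: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        # support: (num_classes, 1, 28, 28), query: (1, 1, 28, 28)
        support_emb = self.f_phi(support)                      # (num_classes, feature_dim)
        query_emb = self.f_phi(query).expand_as(support_emb)   # repeat the query per class
        combined = torch.cat([support_emb, query_emb], dim=1)  # Z(., .): concatenation
        return self.g_phi(combined).squeeze(1)                 # relation scores r_ij in [0, 1]

# One-shot example with the three support classes {lion, elephant, dog}
model = RelationNetwork()
support = torch.randn(3, 1, 28, 28)   # placeholder images, one per class
query = torch.randn(1, 1, 28, 28)
scores = model(support, query)        # three relation scores, one per class
print(scores, scores.argmax().item())
```

The query image is assigned to the support class with the highest relation score, which matches the one-shot prediction described above.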
