We start with a query: "Where is the milk now?" It is encoded as a bag-of-words vector of size V, where V is the vocabulary size. In the simplest case, an embedding matrix, B (d x V), converts this vector into a word embedding of size d: u = embeddingB(q).
The input sentences x1, x2, ..., xi are stored in memory using another embedding matrix, A (d x V), with the same dimensions as B: mi = embeddingA(xi). The match between the embedded query, u, and each memory, mi, is computed by taking the inner product followed by a softmax: pi = softmax(uT mi).
The output memory representation is as follows: each xi has a corresponding output vector, ci, produced by another embedding matrix, C. The response vector from the memory, o, is then a sum over the ci, weighted by the probability vector from the input: o = Σi pi ci.
Finally, the sum of o and u is multiplied by a weight matrix, W (V x d), and the result is passed through a softmax to predict the final answer: â = softmax(W(o + u)).
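The single-hop forward pass described above can be sketched in NumPy. This is a minimal illustration with toy dimensions and randomly initialized matrices; in the actual model, B, A, C, and W are learned during training:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy sizes (illustrative only): vocabulary V, embedding dim d, n sentences.
V, d, n = 10, 4, 3
rng = np.random.default_rng(0)

# Embedding matrices; randomly initialized here, learned in practice.
B = rng.standard_normal((d, V))   # query embedding
A = rng.standard_normal((d, V))   # input (memory) embedding
C = rng.standard_normal((d, V))   # output embedding
W = rng.standard_normal((V, d))   # final weight matrix

q = rng.integers(0, 2, size=V).astype(float)       # bag-of-words query
X = rng.integers(0, 2, size=(n, V)).astype(float)  # bag-of-words sentences

u = B @ q                      # u = embeddingB(q)
m = X @ A.T                    # memories mi = embeddingA(xi), shape (n, d)
c = X @ C.T                    # output vectors ci, shape (n, d)
p = softmax(m @ u)             # attention: pi = softmax(uT mi)
o = p @ c                      # response vector: o = sum_i pi * ci
a_hat = softmax(W @ (o + u))   # predicted answer distribution over the vocabulary

print(p.shape, a_hat.shape)
```

Note that p and a_hat are both probability distributions (each sums to 1): p attends over the n memories, while a_hat scores every word in the vocabulary as a candidate answer.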
A TensorFlow implementation of MemN2N is available here: https://github.com/carpedm20/MemN2N-tensorflow.