One-to-many architecture

In a one-to-many architecture, a single input is mapped to multiple hidden states and multiple output values; that is, the RNN takes a single input and maps it to an output sequence. Although there is only a single input value, the hidden states are shared across time steps to predict the output. Unlike the previous one-to-one architecture, here we share only the previous hidden states across time steps, not the previous outputs.
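
To make the flow of information concrete, the following is a minimal NumPy sketch of a one-to-many forward pass. It is an illustration under stated assumptions, not the implementation used in this book: the weight names U, W, and V, the tanh and softmax activations, the layer sizes, and the choice to feed the input only at the first time step are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def one_to_many_forward(x, U, W, V, num_steps):
    """Map a single input vector x to num_steps output distributions.

    U: input-to-hidden weights, W: hidden-to-hidden weights,
    V: hidden-to-output weights (hypothetical names for this sketch).
    """
    hidden_size = W.shape[0]
    h = np.zeros(hidden_size)  # initial hidden state
    hidden_states, outputs = [], []
    for t in range(num_steps):
        # Assumption: the single input contributes only at the first step.
        input_term = U @ x if t == 0 else np.zeros(hidden_size)
        # Only the previous hidden state is carried forward, not the output.
        h = np.tanh(input_term + W @ h)
        hidden_states.append(h)
        outputs.append(softmax(V @ h))
    return np.array(hidden_states), np.array(outputs)

# Toy sizes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
input_size, hidden_size, output_size, num_steps = 4, 8, 5, 3
U = rng.normal(scale=0.1, size=(hidden_size, input_size))
W = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
V = rng.normal(scale=0.1, size=(output_size, hidden_size))
x = rng.normal(size=input_size)

hidden_states, outputs = one_to_many_forward(x, U, W, V, num_steps)
print(outputs.shape)  # one output distribution per time step
```

Each row of outputs is a probability distribution over the output classes for one time step, so one input vector yields a sequence of num_steps predictions.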

One such application of this architecture is image caption generation. We pass a single image as an input, and the output is the sequence of words constituting a caption of the image.

As shown in the following figure, a single image is passed as an input to the RNN, and at the first time step, the word Horse is predicted; at the next time step, the previous hidden state is used to predict the next word, which is standing. This continues for a sequence of steps, predicting the next word each time, until the caption is generated.
