The decoder RNN is a 2-layer GRU with vertical residual connections (as explained previously):
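```python
from tensorflow.keras.layers import GRU, Add

def get_decoder_RNN_output(input_data):
    # input_data: (batch, timesteps, 256); its width must be 256 for the residual Add
    rnn1 = GRU(256, return_sequences=True)(input_data)  # one output per timestep
    inp2 = Add()([input_data, rnn1])  # first vertical residual connection
    rnn2 = GRU(256)(inp2)  # last output only: (batch, 256)
    # second residual connection; Add broadcasts rnn2 over every timestep of inp2
    decoder_rnn = Add()([inp2, rnn2])
    return decoder_rnn
```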
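As a quick usage check, the sketch below wires the function into a Keras Model and runs a dummy batch through it. It assumes TensorFlow 2.x's tf.keras. The input width of 256, which the residual Add requires, and the all-zeros batch are illustrative choices and not part of the original model.

```python
import numpy as np
from tensorflow.keras.layers import GRU, Add, Input
from tensorflow.keras.models import Model


def get_decoder_RNN_output(input_data):
    rnn1 = GRU(256, return_sequences=True)(input_data)  # (batch, timesteps, 256)
    inp2 = Add()([input_data, rnn1])                    # first residual connection
    rnn2 = GRU(256)(inp2)                               # (batch, 256)
    return Add()([inp2, rnn2])                          # second residual connection


# Illustrative input: variable-length sequences of 256-dimensional frames
decoder_input = Input(shape=(None, 256))
model = Model(decoder_input, get_decoder_RNN_output(decoder_input))

# Dummy batch: 2 sequences of 5 timesteps each
out = model.predict(np.zeros((2, 5, 256), dtype="float32"), verbose=0)
print(out.shape)  # (2, 5, 256)
```

The second GRU returns only its last output, of shape (batch, 256). The final Add therefore broadcasts that vector across every timestep of inp2, which is why the result keeps the time axis.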
Note that the first GRU layer must be defined with return_sequences=True. That way it returns an output for every input timestep, so a sequence goes in and a sequence comes out. Without it, the first GRU returns only a single output (its last one) for the entire input sequence, whereas the second GRU expects a sequence as input.
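To see what return_sequences changes without depending on Keras, here is a minimal NumPy sketch of a GRU forward pass. The helpers make_gru and gru are hypothetical and use random weights. The cell follows the classic GRU update equations, which may differ in detail from Keras's default GRU variant, and the return_sequences flag only imitates the Keras argument of the same name.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_gru(input_dim, units):
    """Hypothetical helper: random (W, U, b) for the update (z), reset (r) and candidate (h) gates."""
    return {
        gate: (
            rng.normal(scale=0.1, size=(input_dim, units)),
            rng.normal(scale=0.1, size=(units, units)),
            np.zeros(units),
        )
        for gate in "zrh"
    }


def gru(x, params, return_sequences=False):
    """Hypothetical helper: run one GRU layer over x of shape (batch, timesteps, features)."""
    if x.ndim != 3:
        raise ValueError(f"expected (batch, timesteps, features), got shape {x.shape}")
    (Wz, Uz, bz), (Wr, Ur, br), (Wh, Uh, bh) = params["z"], params["r"], params["h"]
    h = np.zeros((x.shape[0], bz.shape[0]))
    outputs = []
    for t in range(x.shape[1]):
        xt = x[:, t, :]
        z = sigmoid(xt @ Wz + h @ Uz + bz)             # update gate
        r = sigmoid(xt @ Wr + h @ Ur + br)             # reset gate
        h_cand = np.tanh(xt @ Wh + (r * h) @ Uh + bh)  # candidate state
        h = z * h + (1.0 - z) * h_cand                 # new hidden state
        outputs.append(h)
    # return_sequences=True keeps every step's output; otherwise only the last one
    return np.stack(outputs, axis=1) if return_sequences else h


units = 8  # small width for illustration; the decoder in the text uses 256
x = rng.normal(size=(2, 5, units))  # (batch, timesteps, features)
layer1, layer2 = make_gru(units, units), make_gru(units, units)

seq = gru(x, layer1, return_sequences=True)  # (2, 5, 8): a sequence in, a sequence out
last = gru(x, layer1)                        # (2, 8): one output for the whole sequence

inp2 = x + seq           # residual connection: the shapes match
out = gru(inp2, layer2)  # the second GRU receives the sequence it expects

try:
    gru(last, layer2)  # a single vector per example is not a sequence
    rejected_2d = False
except ValueError:
    rejected_2d = True

print(seq.shape, last.shape, out.shape, rejected_2d)  # (2, 5, 8) (2, 8) (2, 8) True
```

Note that last equals seq[:, -1, :]. Dropping return_sequences does not change the computation, only which outputs are kept.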