As described previously, the attention RNN is a simple 1-layer GRU with 256 units, as specified in the paper. Defining a function for it may look like overkill, but it improves readability, especially since we have already described the architecture using the paper's terminology:
def get_attention_RNN():
    return GRU(256)
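As a minimal, self-contained sketch of how this layer behaves (assuming a Keras-style `GRU` import; the exact import path may differ depending on your Keras/TensorFlow version, and the input shape below is hypothetical):

```python
import numpy as np
from tensorflow.keras.layers import GRU  # assumption: TF/Keras import path

def get_attention_RNN():
    # 1-layer GRU with 256 units, as specified in the paper
    return GRU(256)

# Hypothetical input: batch of 2 sequences, 10 timesteps, 80 features each
x = np.zeros((2, 10, 80), dtype="float32")
y = get_attention_RNN()(x)
print(tuple(y.shape))  # (2, 256): by default the GRU returns only its final state
```

Note that with the default `return_sequences=False`, the layer collapses the time dimension and emits a single 256-dimensional state per sequence; pass `return_sequences=True` if the downstream attention mechanism needs a state per timestep.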