
I am trying to follow an implementation of an attention decoder:

from keras.layers.recurrent import Recurrent
...
class AttentionDecoder(Recurrent):
... 
    #######################################################
    # The functionality of init, build and call is clear 
    #######################################################

    def __init__(self, units, output_dim,
         activation='tanh',
         ...

    def build(self, input_shape):
             ...

    def call(self, x):
        self.x_seq = x
        ...
        return super(AttentionDecoder, self).call(x)
    

    ##################################################################
    # What is the purpose of the 'get_initial_state' and 'step'
    # functions? Do these functions override methods of the
    # Recurrent base class?
    ##################################################################

    def get_initial_state(self, inputs):
        
        # apply W_s to the first input time step to get the initial state s0
        s0 = activations.tanh(K.dot(inputs[:, 0], self.W_s))

        # trick borrowed from keras.layers.recurrent: build a zeros tensor
        # of shape (batch_size, output_dim) from the symbolic input
        y0 = K.zeros_like(inputs)  # (samples, timesteps, input_dims)
        y0 = K.sum(y0, axis=(1, 2))  # (samples, )
        y0 = K.expand_dims(y0)  # (samples, 1)
        y0 = K.tile(y0, [1, self.output_dim])  # (samples, output_dim)

        return [y0, s0]
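As far as I can tell, the `y0` construction is just a backend-friendly way to build a zeros tensor of shape `(batch_size, output_dim)` when the batch size is only known symbolically. Here is the same shape gymnastics written with numpy instead of the Keras backend (illustrative only):

```python
import numpy as np

batch, timesteps, input_dim, output_dim = 4, 7, 5, 3
inputs = np.random.rand(batch, timesteps, input_dim)

# Mirror the get_initial_state trick: derive a (batch, output_dim)
# zeros tensor from the input, without referencing batch size directly.
y0 = np.zeros_like(inputs)          # (batch, timesteps, input_dim)
y0 = y0.sum(axis=(1, 2))            # (batch,)
y0 = np.expand_dims(y0, axis=-1)    # (batch, 1)
y0 = np.tile(y0, [1, output_dim])   # (batch, output_dim)

print(y0.shape)  # (4, 3)
```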

    def step(self, x, states):

        ytm, stm = states
        ...
        return yt, [yt, st]

The AttentionDecoder class inherits from Recurrent, an abstract base class for recurrent layers (defined in keras.layers.recurrent, documented here).

How do the get_initial_state and step functions work within the class (who calls them, when, etc.)? If these functions are related to the base class Recurrent, where can I find the relevant documentation?
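My current rough understanding, as a toy numpy sketch (the names and the loop are illustrative, not the actual Keras internals, which use the backend function K.rnn): the base class's call seems to obtain the initial states once, then invokes step once per time step, threading the returned states through the loop.

```python
import numpy as np

output_dim = 3

def rough_recurrent_call(x, get_initial_state, step):
    """Illustrative only: roughly how a recurrent layer might drive
    get_initial_state and step. NOT the actual Keras source."""
    states = get_initial_state(x)        # called once, before the loop
    outputs = []
    for t in range(x.shape[1]):          # iterate over timesteps
        out, states = step(x[:, t], states)  # called once per timestep
        outputs.append(out)
    return np.stack(outputs, axis=1)     # (batch, timesteps, output_dim)

# Toy stand-ins for the class methods (hypothetical, for shape checking only).
def toy_initial_state(x):
    y0 = np.zeros((x.shape[0], output_dim))
    s0 = np.tanh(x[:, 0] @ np.ones((x.shape[2], output_dim)))
    return [y0, s0]

def toy_step(xt, states):
    ytm, stm = states
    st = np.tanh(stm + xt @ np.ones((xt.shape[1], output_dim)))
    yt = st
    return yt, [yt, st]

x = np.random.rand(2, 5, 4)
out = rough_recurrent_call(x, toy_initial_state, toy_step)
print(out.shape)  # (2, 5, 3)
```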
