Attention model: how to calculate the hidden state s?

Hi everyone,

In the video Attention Model, it was explained that the hidden state of the post-attention LSTM, denoted as s^t, is influenced by the previous hidden state s^{t-1}, the context vector c^t, and the previous output y^{t-1}. Could someone explain the exact formula or steps used to calculate s^t? Is it the same as a^{t} in the second screenshot from week 1?

For a standard LSTM without attention, a^t is the hidden state produced after applying the output gate. In the attention model, s^t plays exactly the same role in the post-attention LSTM; the only difference is what feeds into it. The gates are computed not just from the previous hidden state s^{t-1}, but also from the attention context vector c^t and the previous output y^{t-1}. So yes, s^t is essentially the a^t of the post-attention LSTM, with the attention context folded in.
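If it helps to see where each piece enters in code, here is a minimal sketch of one post-attention decoder step in Keras. This is my own illustration, not the course code: the sizes, the random stand-in tensors, and the choice to concatenate y^{t-1} with the context are all assumptions (some implementations feed only the context to the cell).

```python
import tensorflow as tf
from tensorflow.keras.layers import LSTM, Concatenate

batch, n_s, n_c, vocab_size = 2, 64, 64, 30        # made-up sizes for illustration
post_lstm = LSTM(n_s, return_state=True)           # the post-attention LSTM

# Stand-in tensors for one decoder time step t:
context = tf.random.normal((batch, 1, n_c))        # c^t from the attention block
y_prev  = tf.random.normal((batch, 1, vocab_size)) # y^{t-1}, e.g. the previous softmax output
s_prev  = tf.zeros((batch, n_s))                   # s^{t-1}
m_prev  = tf.zeros((batch, n_s))                   # the LSTM's memory cell from step t-1

# One step: the input is [ y^{t-1}, c^t ]; the carried state is (s, memory cell).
x_t = Concatenate(axis=-1)([y_prev, context])
s_t, s_state, m_t = post_lstm(x_t, initial_state=[s_prev, m_prev])
# s_t is the new hidden state s^t -- the attention-model analogue of a^t.
```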

In the case of the attention model, the LSTM cell takes the context vector c^t as an additional input, so the input to the cell becomes [ y^{t-1}, c^t ] instead of just x^t, where y^{t-1} is the previous output. The steps are otherwise the same as for a standard LSTM. To avoid confusing the LSTM's internal memory cell with the context vector c^t, I'll write the memory cell as m^t below (a NumPy sketch of one full step follows the list):

  1. Candidate Memory Cell \tilde{m}^t :

    \tilde{m}^t = \tanh(W_c [ s^{t-1}, c^t, y^{t-1} ] + b_c)

  2. Update Gate \Gamma_u :

    \Gamma_u = \sigma(W_u [ s^{t-1}, c^t, y^{t-1} ] + b_u)

  3. Forget Gate \Gamma_f :

    \Gamma_f = \sigma(W_f [ s^{t-1}, c^t, y^{t-1} ] + b_f)

  4. Output Gate \Gamma_o :

    \Gamma_o = \sigma(W_o [ s^{t-1}, c^t, y^{t-1} ] + b_o)

  5. Memory Cell Update m^t (elementwise products):

    m^t = \Gamma_f \cdot m^{t-1} + \Gamma_u \cdot \tilde{m}^t

  6. Hidden State s^t :

    s^t = \Gamma_o \cdot \tanh(m^t)
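To make the six steps concrete, here is a small NumPy sketch of a single post-attention LSTM step. The function name, weight shapes, and initialization are illustrative assumptions; all the gate products are elementwise, and I use m^t for the memory cell as noted above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def post_attention_lstm_step(s_prev, m_prev, context, y_prev, params):
    """One step of the post-attention LSTM (symbols as in the list above)."""
    Wc, bc, Wu, bu, Wf, bf, Wo, bo = params

    # [ s^{t-1}, c^t, y^{t-1} ] stacked into one input vector
    concat = np.concatenate([s_prev, context, y_prev])

    m_tilde = np.tanh(Wc @ concat + bc)         # 1. candidate memory cell
    gamma_u = sigmoid(Wu @ concat + bu)         # 2. update gate
    gamma_f = sigmoid(Wf @ concat + bf)         # 3. forget gate
    gamma_o = sigmoid(Wo @ concat + bo)         # 4. output gate
    m_t = gamma_f * m_prev + gamma_u * m_tilde  # 5. memory cell update (elementwise)
    s_t = gamma_o * np.tanh(m_t)                # 6. new hidden state s^t

    return s_t, m_t

# Tiny usage example with made-up sizes:
n_s, n_c, n_y = 4, 6, 3
d_in = n_s + n_c + n_y
rng = np.random.default_rng(0)
params = []
for _ in range(4):                              # (W, b) for the candidate and the three gates
    params += [rng.standard_normal((n_s, d_in)) * 0.1, np.zeros(n_s)]

s_t, m_t = post_attention_lstm_step(
    s_prev=np.zeros(n_s), m_prev=np.zeros(n_s),
    context=rng.standard_normal(n_c), y_prev=np.zeros(n_y),
    params=params,
)
print(s_t.shape)  # (4,)
```

At inference time you would loop this step for t = 1..Ty, recomputing the attention weights and c^t from the encoder activations and s^{t-1} at each step.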


Thank you for your thorough explanation! This helps me connect the dots.
