Hi everyone,
In the video Attention Model, it was explained that the hidden state of the post-attention LSTM, denoted as s^t, is influenced by the previous hidden state s^{t-1}, the context vector c^t, and the previous output y^{t-1}. Could someone explain the exact formula or steps used to calculate s^t? Is it the same as a^{t} in the second screenshot from week 1?
For a standard LSTM without attention, a^t is the hidden state produced after applying the output gate. In the attention model, s^t of the post-attention LSTM plays the same role and is computed with the same gate structure, except that the cell additionally receives the attention context vector c^t. So yes: s^t behaves like a^t in a regular LSTM, but it is computed from the context vector c^t together with the previous hidden state s^{t-1} and the previous output y^{t-1}.
In the attention model, the LSTM cell takes the context vector c^t as an additional input, so the input to the cell becomes [ y^{t-1}, c^t ] instead of just x^t, where y^{t-1} is the previous output. The steps remain similar:

Candidate Cell State \tilde{m}^t (writing the LSTM memory cell as m^t to avoid overloading c^t, which here denotes the context vector):
\tilde{m}^t = \tanh(W_c [ s^{t-1}, c^t, y^{t-1} ] + b_c)

Update Gate \Gamma_u :
\Gamma_u = \sigma(W_u [ s^{t-1}, c^t, y^{t-1} ] + b_u)

Forget Gate \Gamma_f :
\Gamma_f = \sigma(W_f [ s^{t-1}, c^t, y^{t-1} ] + b_f)

Output Gate \Gamma_o :
\Gamma_o = \sigma(W_o [ s^{t-1}, c^t, y^{t-1} ] + b_o)

Memory Cell Update m^t :
m^t = \Gamma_f \cdot m^{t-1} + \Gamma_u \cdot \tilde{m}^t

Hidden State s^t :
s^t = \Gamma_o \cdot \tanh(m^t)
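The steps above can be sketched in NumPy as a single function for one time step. This is a minimal illustration, not the course's reference implementation: the function name, the `params` dictionary layout, and the use of `m` for the memory cell (to keep it distinct from the context vector `c^t`) are my own choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def post_attention_lstm_step(s_prev, m_prev, context, y_prev, params):
    """One step of the post-attention LSTM (hypothetical helper).

    s_prev  : previous hidden state s^{t-1}, shape (n_s,)
    m_prev  : previous memory cell  m^{t-1}, shape (n_s,)
    context : attention context vector c^t,  shape (n_c,)
    y_prev  : previous output y^{t-1},       shape (n_y,)
    params  : dict with W_c, W_u, W_f, W_o of shape (n_s, n_s + n_c + n_y)
              and b_c, b_u, b_f, b_o of shape (n_s,)
    """
    # Concatenated gate input, matching [ s^{t-1}, c^t, y^{t-1} ]
    z = np.concatenate([s_prev, context, y_prev])

    m_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # candidate cell state
    gamma_u = sigmoid(params["W_u"] @ z + params["b_u"])  # update gate
    gamma_f = sigmoid(params["W_f"] @ z + params["b_f"])  # forget gate
    gamma_o = sigmoid(params["W_o"] @ z + params["b_o"])  # output gate

    m = gamma_f * m_prev + gamma_u * m_tilde  # memory cell update
    s = gamma_o * np.tanh(m)                  # new hidden state s^t
    return s, m
```

Note that the only difference from a plain LSTM step is the extra `context` segment in the concatenated input; everything downstream of the concatenation is unchanged.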
Thank you for your thorough explanation! This helps me connect the dots.