Q1: In 2.1 - Attention Mechanism, is a⟨t'⟩ instead of a⟨t⟩ more reasonable?
Q2: In Exercise 2 - modelf (11th code cell), since we can define a layer using just one line in Step 1, a = Bidirectional(LSTM(units = n_a, return_sequences = True))(X),
why do we still have to implement a for loop in Step 2? Did I miss something?
I don’t understand your first question but the answer to your second question is in the notebook.
Ty -- length of the output sequence
...
# Step 2: Iterate for Ty steps
Right! The other thing to note is that there are two LSTMs involved here, right? The bidirectional one is the pre-attention one and it happens outside the for loop. The loop is for the post-attention model which is not bidirectional, right?
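To make that concrete, here's a minimal sketch of the structure (not the assignment code: the sizes, the single-Dense attention scorer, and the layer names are placeholders I made up). The pre-attention Bidirectional LSTM handles all Tx input steps in one call, but the post-attention LSTM has to be stepped Ty times in a loop, because its state from the previous step is what the attention uses to build the next context vector.

```python
# Minimal sketch, NOT the assignment solution: sizes and the single-Dense
# attention scorer are simplifications/assumptions for illustration only.
from tensorflow.keras.layers import (Input, LSTM, Bidirectional, Dense,
                                     Softmax, Dot, Concatenate, RepeatVector)
from tensorflow.keras.models import Model

Tx, Ty = 30, 10            # hypothetical input / output sequence lengths
n_a, n_s = 32, 64          # pre- and post-attention LSTM hidden sizes
in_vocab, out_vocab = 37, 11

# Shared layers, created once and reused at every output timestep
repeator = RepeatVector(Tx)
concatenator = Concatenate(axis=-1)
densor = Dense(1, activation="relu")   # simplified energy scorer
activator = Softmax(axis=1)            # softmax over the Tx axis -> alphas
dotor = Dot(axes=1)
post_lstm = LSTM(n_s, return_state=True)
output_layer = Dense(out_vocab, activation="softmax")

def one_step_attention(a, s_prev):
    """Weight all pre-attention states a<t'> using the previous post-attention state."""
    s_prev = repeator(s_prev)                  # (batch, Tx, n_s)
    concat = concatenator([a, s_prev])         # (batch, Tx, 2*n_a + n_s)
    energies = densor(concat)                  # (batch, Tx, 1)
    alphas = activator(energies)               # attention weights over t' = 1..Tx
    context = dotor([alphas, a])               # (batch, 1, 2*n_a) weighted sum
    return context

X = Input(shape=(Tx, in_vocab))
s0 = Input(shape=(n_s,))
c0 = Input(shape=(n_s,))
s, c = s0, c0
outputs = []

# Step 1: the pre-attention Bidirectional LSTM runs over all Tx steps in ONE call
a = Bidirectional(LSTM(units=n_a, return_sequences=True))(X)

# Step 2: the post-attention LSTM must be advanced one step per iteration,
# because its previous state s is needed to compute the next context vector
for t in range(Ty):
    context = one_step_attention(a, s)
    _, s, c = post_lstm(context, initial_state=[s, c])
    outputs.append(output_layer(s))

model = Model(inputs=[X, s0, c0], outputs=outputs)
```

Note that a is computed once for the whole input sequence, while s changes on every iteration, so each context vector can attend to different input positions.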
I’m also having trouble with question 1, but I think the distinction they are making between a^{<t>} and a^{<t'>} is that the attention mechanism looks at the a values from all the timesteps of the bidirectional “pre-attention” LSTM when generating the attention for a particular timestep of the post-attention LSTM. Where the “attention” goes at each output timestep may need to be different, right? The point is that this is actually being learned during training. But this stuff is pretty complicated, and I need to go back and rewatch the video they reference there to make sure I’ve got it clear in my mind.
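Writing out what I believe the lecture notation says (so take this with a grain of salt until I rewatch the video): the context fed to the post-attention LSTM at output step t is a weighted sum over all the pre-attention activations a⟨t'⟩,

$$
\mathrm{context}^{\langle t \rangle} \;=\; \sum_{t'=1}^{T_x} \alpha^{\langle t,\, t' \rangle}\, a^{\langle t' \rangle},
\qquad \sum_{t'=1}^{T_x} \alpha^{\langle t,\, t' \rangle} = 1 .
$$

The sum runs over the input timesteps t', which is why a⟨t'⟩ rather than a⟨t⟩ shows up inside the sum.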
Thanks for replying.
In Q1 I mean that in the highlighted line of the screenshot, a⟨t⟩ should be a⟨t'⟩.
Because I see in the lecture notes (C5_W3.pdf) it is a⟨t'⟩.
Thanks for replying!
What I mean is: can we create the post-attention layer using only one line, like the bidirectional layer?
If we can't, is it because in the post-attention layer we have to use the previous step's result and write it out explicitly (since Keras doesn't offer a function that wraps these operations)?