Assume the input sentence is “I love mango juice” for the bidirectional RNN. At time step 1, the RNN in the forward direction receives “I” as the input while the RNN in the backward direction receives “juice” as the input. Is the output of time step 1 the combined output (e.g. concatenated output) of the forward and backward RNNs?
At time step 2, the RNN in the forward direction receives “love” as the input while the RNN in the backward direction receives “mango” as the input. Is the output of time step 2 also the combined output (e.g. concatenated output) of the forward and backward RNNs?
Thanks in advance.
Output at timestep 1 is the concat of “I” from both the forward and backward passes. See this link to learn about the merge_mode parameter.
Hope this clears up any doubts about why the entire sequence must be known for this bidirectional layer to be effective.
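For concreteness, here is a minimal Keras sketch (the shapes and layer sizes are made up purely for illustration) showing the Bidirectional wrapper with merge_mode="concat":

```python
import tensorflow as tf

# Minimal sketch: wrap an LSTM in Bidirectional. With merge_mode="concat"
# (the default), the output at every timestep is the concatenation of the
# forward and backward states, so the per-timestep feature size doubles
# (2 * 16 = 32 here).
inputs = tf.keras.Input(shape=(4, 8))  # 4 timesteps, 8 features per token
outputs = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(16, return_sequences=True),
    merge_mode="concat",
)(inputs)
model = tf.keras.Model(inputs, outputs)
print(model.output_shape)  # (None, 4, 32)
```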
Why is the output at timestep 1 the concat of “I” from both the forward and backward passes? I thought the backward RNN received “juice” as the input at time step 1.
A bidirectional RNN aims to learn about a token from both directions (sketched below):
- Forward direction: starting from the start of the sentence and ending at the current token.
- Backward direction: starting from the end of the sentence and ending at the current token.
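Here is a toy sketch (plain Python, no framework) of which tokens each direction has already read when it produces its state for a given timestep:

```python
# Illustrative only: show the context each direction has "seen" at timestep t.
tokens = ["I", "love", "mango", "juice"]

for t in range(len(tokens)):
    forward_context = tokens[: t + 1]     # start of sentence ... current token
    backward_context = tokens[t:][::-1]   # end of sentence ... current token
    print(f"t={t + 1}: forward saw {forward_context}, backward saw {backward_context}")
```

At time step 1 the forward state has only read “I”, while the backward state has already read “juice”, “mango”, “love” and finally “I”, so the combined output for “I” depends on the entire sentence.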
I don’t quite get your explanation. Could you explain in terms of what happens at time step 1, time step 2, etc.?
Thanks.
Why don’t you go through these links and see if they answer your question?
- Read Bi-Directional Recurrent Neural Network section of the medium article.
- Keras source code
Other mentors have been following this conversation. I don’t personally know the answer right at the moment. The reason is that we never actually build a bidirectional network by hand, in the way of the “Step by Step” exercise in Week 1. When we get around to using them, we just use the TF/Keras Bidirectional module, which is a “black box” and our choices are to read either the documentation that Balaji already gave us or (gulp) the source code he also linked for us.
But my next step is to watch the lecture that Prof Ng gives specifically on this topic. That will be our best hope of getting to a better level of understanding. I watched it back in 2019 when I first took this course, but the memory is not current any more. In my written notes, it just shows the diagram, but doesn’t have enough detail. I’m assuming you’ve watched the lecture already, but it might be worth watching again. My personal life is quite busy in the next 6 to 8 hours and then it’ll be bedtime where I am. So it’s not likely I will have time to watch the lecture in the next 18 hours or so.
But there may well be other folks who see this and have more information. Maybe we get lucky and one or more of them will chime in before I get time to watch the lecture again.
@hungng777 Ha. I don’t want to oversimplify things, and I’m maybe not the smartest one here to help.
But to go all the way back to your original question, the answer is simply: ‘Yes’.
Where I think you might be getting (understandably) confused:
Your data point (X) order never changes, nor do your time steps (T); i.e., they are always sequential (X^{<1>}, X^{<2>}, etc., and your timesteps T^{<1>}, T^{<2>}, etc.). Yet in your reverse traversal, it is only your network that runs in reverse order; i.e., you feed it X^{<4>}, X^{<3>}, etc.
Like in “I love mango juice”, it gets too confusing to think of ‘juice’ as now being X^{<1>}. Don’t do that. It is still X^{<4>}; you’ve only changed the order in which the data is fed to the network.
Hope that helps.
*Also, obviously, you have to complete the entire back and forth path before you can concatenate.
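If it helps, here is a rough NumPy sketch of that idea for a plain (vanilla) RNN cell. The names and sizes are made up; the point is that the backward cell simply visits the same x^{<t>} values in reverse order, and concatenation only happens once both passes are finished:

```python
import numpy as np

T, n_x, n_a = 4, 8, 16                             # 4 tokens, toy sizes
x = [np.random.randn(n_x, 1) for _ in range(T)]    # x<1> ... x<4>, order never changes

# Separate (made-up) weights for the forward and backward cells
Waf, Wxf = np.random.randn(n_a, n_a), np.random.randn(n_a, n_x)
Wab, Wxb = np.random.randn(n_a, n_a), np.random.randn(n_a, n_x)

# Forward pass: visit t = 1 ... T
a_fwd, a = [None] * T, np.zeros((n_a, 1))
for t in range(T):
    a = np.tanh(Waf @ a + Wxf @ x[t])
    a_fwd[t] = a

# Backward pass: visit t = T ... 1, but store each state at its original index
a_bwd, a = [None] * T, np.zeros((n_a, 1))
for t in reversed(range(T)):
    a = np.tanh(Wab @ a + Wxb @ x[t])
    a_bwd[t] = a

# Only after both passes are complete can we concatenate per timestep
outputs = [np.concatenate([a_fwd[t], a_bwd[t]], axis=0) for t in range(T)]
print(outputs[0].shape)   # (32, 1): forward and backward states for x<1>
```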
Thank you for the explanation. I just want to know more about bidirectional RNN and its variants (e.g. bidirectional LSTM) since they are the basic building blocks of more advanced constructs (e.g. attention model).
I just watched this lecture again. Sorry it took me a few days. I think Anthony’s answer covers the fundamental question, but if you look at this screenshot from slightly later in the video:
It gives a more complete picture of how the \hat{y}^{<n>} values are computed. You can see that the W_y weight matrices are used to process both the forward and backward states. At each timestep, you have two separate “cell state” values: one for the forward direction and one for the reverse direction. These are separate from each other. What will be contained in those states is determined by whether you select a plain RNN architecture, a GRU or an LSTM architecture. But with whichever architecture you have chosen, you do the “forward prop” in both time directions. Then you’d do back prop to learn the weight values based on the comparison of the \hat{y}^{<n>} and y^{<n>} values at all the timesteps. Of course that will also drive updates to the other weight and bias values that are used to compute the updated cell states in both directions. Those weights will be separate between the directions, because the states are distinct.
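To tie that back to the notation above (writing t for the timestep index, \overrightarrow{a}^{<t>} and \overleftarrow{a}^{<t>} for the forward and backward states, and g for whatever output activation is used), the prediction at each timestep combines both states through W_y. This is the formula as I remember it from the lecture slide, so treat it as a paraphrase rather than a quote:

\hat{y}^{<t>} = g\left(W_y \left[\overrightarrow{a}^{<t>}; \overleftarrow{a}^{<t>}\right] + b_y\right)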