The input of bi-directional layer

Meir · June 24, 2021, 2:53pm

The example code given in Assignment 1 of Week 3 is:
sequence_of_hidden_states = Bidirectional(LSTM(units=..., return_sequences=...))(the_input_X)

It seems to suggest that only X is given as input to Bidirectional. Why shouldn't we also pass it the initial hidden and cell states?
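For context: Keras recurrent layers start from all-zero hidden and cell states when no initial state is given, so leaving them out is equivalent to passing zeros explicitly. (As far as I know, the `Bidirectional` wrapper can also be called with an `initial_state` list if non-zero starting states are ever needed.) The NumPy sketch below illustrates the zero-default behaviour for a single-direction LSTM. The function `lstm_forward`, the weight names `W`, `U`, `b`, and the gate ordering are hypothetical illustrations, not the assignment's or Keras's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(X, W, U, b, a0=None, c0=None):
    """Run a single-direction LSTM over X of shape (Tx, n_x).

    Hypothetical layout: W is (n_x, 4*units), U is (units, 4*units), b is (4*units,),
    gates ordered input, forget, candidate, output.
    If a0/c0 are omitted they default to zeros, mirroring Keras's default.
    """
    units = U.shape[0]
    a = np.zeros(units) if a0 is None else a0
    c = np.zeros(units) if c0 is None else c0
    outputs = []
    for x_t in X:
        z = x_t @ W + a @ U + b
        i = sigmoid(z[:units])                  # input gate
        f = sigmoid(z[units:2 * units])         # forget gate
        g = np.tanh(z[2 * units:3 * units])     # candidate cell state
        o = sigmoid(z[3 * units:])              # output gate
        c = f * c + i * g
        a = o * np.tanh(c)
        outputs.append(a)
    return np.stack(outputs)  # (Tx, units), like return_sequences=True

rng = np.random.default_rng(0)
Tx, n_x, units = 5, 3, 4
X = rng.normal(size=(Tx, n_x))
W = rng.normal(size=(n_x, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

default_states = lstm_forward(X, W, U, b)
explicit_zeros = lstm_forward(X, W, U, b, a0=np.zeros(units), c0=np.zeros(units))
print(default_states.shape)                         # (5, 4)
print(np.allclose(default_states, explicit_zeros))  # True
```

Only X (plus the layer's own weights) is required; the initial states matter only if you want them to be something other than zeros.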

TMosh · June 24, 2021, 3:23pm

The code for that is further down in the function.

Meir · June 24, 2021, 3:32pm

You probably mean that we pass those to the model, right? But why not to Bidirectional as well?

TMosh · June 24, 2021, 4:58pm

Frankly, I do not know. TF is large and mysterious, and the syntax isn’t always clear.

Meir · June 24, 2021, 7:45pm

Could you please send this question to other mentors, so we can hopefully get clarity on this issue?

TMosh · June 24, 2021, 8:26pm

Done. Hopefully someone will stop by.

laacdm · June 24, 2021, 8:55pm

The `units` argument of the LSTM sets the size of the hidden states. The hidden states, from the initial a<1> to the final a<Tx>, are outputs of the LSTM along with the output y. This holds whether the LSTM is uni-directional or bi-directional.

In short, to estimate an LSTM (uni- or bi-directional) you only need the inputs. The parameters are learned by the optimization algorithm (usually an improved version of gradient descent), and the states and outputs are computed from the inputs during the forward pass.

I hope this explanation helps. If there are still any doubts, don't hesitate to ask.

Meir · June 25, 2021, 6:28am

Is the following understanding of what you wrote correct? Bidirectional does not change the inputs of the underlying unidirectional layer. The underlying unidirectional layer in this case is the LSTM, and its input is X. Although we do supply a0 and c0 to the LSTM, these are considered part of the parameters learned by the LSTM, even though these parameters are fixed and are not updated by backpropagation.
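One way to check whether a0 and c0 count as learned parameters is to count an LSTM's trainable weights. In Keras an LSTM layer holds a `kernel`, a `recurrent_kernel` and a `bias`, and the standard count below has no term for the initial states. The function names and the sizes 30 and 32 are illustrative assumptions, not values from the assignment.

```python
def lstm_param_count(input_dim: int, units: int) -> int:
    # Four gates (input, forget, candidate, output), each with an input kernel
    # (input_dim x units), a recurrent kernel (units x units) and a bias (units).
    # Nothing here depends on a0 or c0: they are inputs to the computation, not weights.
    return 4 * (input_dim * units + units * units + units)

def bidirectional_lstm_param_count(input_dim: int, units: int) -> int:
    # The forward and backward LSTMs each have their own, separate weights.
    return 2 * lstm_param_count(input_dim, units)

uni = lstm_param_count(input_dim=30, units=32)
bi = bidirectional_lstm_param_count(input_dim=30, units=32)
print(uni)  # 8064
print(bi)   # 16128
```

Under this view the zero initial states are neither trained nor stored in the layer; they are simply the starting values of each forward pass.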

manifest · June 25, 2021, 7:28am

Hey @Meir,
just to simplify the explanation a bit:

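Both states of the LSTM (the hidden state and the cell state) get initialized with zeros.
A bidirectional LSTM is just two LSTMs running side by side over the same input: one reads the sequence forward, the other backward, and their outputs are combined (concatenated by default). Each LSTM has its own weights and its own hidden states.
Once we have initialized the Bidirectional LSTM, we pass the inputs and propagate them through the network, so the values of the hidden states get updated.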
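The two LSTMs inside `Bidirectional` each read the same input X, one front to back and one back to front, with separate weights and separate zero-initialized states. The NumPy sketch below illustrates this, assuming `return_sequences=True` and Keras's default `merge_mode="concat"`; the helper names (`lstm_forward`, `bidirectional_lstm`, `init_params`) and the weight layout are hypothetical, not Keras internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(X, W, U, b):
    # Single-direction LSTM; both states start at zero (Keras's default).
    units = U.shape[0]
    a, c = np.zeros(units), np.zeros(units)
    outputs = []
    for x_t in X:
        z = x_t @ W + a @ U + b
        i, f = sigmoid(z[:units]), sigmoid(z[units:2 * units])
        g, o = np.tanh(z[2 * units:3 * units]), sigmoid(z[3 * units:])
        c = f * c + i * g
        a = o * np.tanh(c)
        outputs.append(a)
    return np.stack(outputs)  # (Tx, units)

def bidirectional_lstm(X, fwd_params, bwd_params):
    forward = lstm_forward(X, *fwd_params)                # reads x<1> ... x<Tx>
    backward = lstm_forward(X[::-1], *bwd_params)[::-1]   # reads x<Tx> ... x<1>, re-aligned in time
    return np.concatenate([forward, backward], axis=-1)   # (Tx, 2 * units)

def init_params(rng, n_x, units):
    # Hypothetical random weights; each direction gets its own set.
    return (rng.normal(size=(n_x, 4 * units)),
            rng.normal(size=(units, 4 * units)),
            np.zeros(4 * units))

rng = np.random.default_rng(0)
Tx, n_x, units = 6, 3, 4
X = rng.normal(size=(Tx, n_x))
fwd, bwd = init_params(rng, n_x, units), init_params(rng, n_x, units)
H = bidirectional_lstm(X, fwd, bwd)
print(H.shape)  # (6, 8)
```

Each time step's output therefore carries a forward summary of x<1>...x<t> and a backward summary of x<t>...x<Tx>, which is why the output width is twice `units`.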
