The input of bi-directional layer

The example code given in Assignment 1 of Week 3 is:
sequence_of_hidden_states = Bidirectional(LSTM(units=..., return_sequences=...))(the_input_X)

It seems to suggest that only X is given as input to Bidirectional. Why shouldn’t we pass the initial hidden and cell states to it as well?

The code for that is further down in the function.

You probably mean that we pass those to the model, right? But why not to Bidirectional as well?

Frankly, I do not know. TF is large and mysterious, and the syntax isn’t always clear.

Could you please send this question to other mentors, so we can hopefully get clarity on this issue?

Done. Hopefully someone will stop by.

The size of each hidden state is set by the units argument of the LSTM. The hidden states a, from the initial a1 to the final aTx, are computed by the LSTM along with the outputs y. This is true whether the LSTM is uni-directional or bi-directional.

In short, to run an LSTM (uni- or bi-directional) you only need the inputs. The parameters are learned, and the states and outputs are computed, through the optimization algorithm (usually an improved version of gradient descent).
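To make "you only need the inputs" concrete, here is a minimal NumPy sketch of a single LSTM scanning a sequence. All names (lstm_forward, Wx, Wh, b) are illustrative, not from the course code; the point is that the initial hidden and cell states are simply zeros of size units, so the caller supplies nothing but X.

```python
import numpy as np

def lstm_forward(X, Wx, Wh, b, units):
    """Run one LSTM over a sequence X of shape (Tx, n_x).

    The initial hidden state h (a<0>) and cell state c (c<0>) are
    zero vectors of size `units` -- nothing besides X is supplied.
    """
    Tx, n_x = X.shape
    h = np.zeros(units)  # a<0>: zero-initialized hidden state
    c = np.zeros(units)  # c<0>: zero-initialized cell state
    hidden_states = []
    for t in range(Tx):
        z = Wx @ X[t] + Wh @ h + b          # all four gates at once
        i, f, o, g = np.split(z, 4)
        i = 1.0 / (1.0 + np.exp(-i))        # input gate (sigmoid)
        f = 1.0 / (1.0 + np.exp(-f))        # forget gate (sigmoid)
        o = 1.0 / (1.0 + np.exp(-o))        # output gate (sigmoid)
        g = np.tanh(g)                      # candidate cell value
        c = f * c + i * g                   # cell state update
        h = o * np.tanh(c)                  # hidden state a<t>
        hidden_states.append(h)
    return np.stack(hidden_states)          # shape (Tx, units)

# Toy run: in practice these weights would be learned by the optimizer.
rng = np.random.default_rng(0)
Tx, n_x, units = 5, 3, 4
X = rng.normal(size=(Tx, n_x))
Wx = rng.normal(size=(4 * units, n_x))
Wh = rng.normal(size=(4 * units, units))
b = np.zeros(4 * units)
seq = lstm_forward(X, Wx, Wh, b, units)
print(seq.shape)  # (5, 4): one hidden state of size `units` per step
```

Note how units fixes the size of every hidden and cell state, which is why the whole returned sequence has shape (Tx, units).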

I hope this explanation helps. If you still have any doubts, don’t hesitate to ask.


Is the following understanding of what you wrote correct? Bidirectional does not change the inputs of the underlying unidirectional layer. The underlying unidirectional layer in this case is LSTM, and its input is X. Although we do supply a0 and c0 to the LSTM, these are treated as fixed values rather than learnable parameters: they are not updated by backpropagation.
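One way to check this claim, as a quick sketch against TensorFlow's public API: the initial states never show up among the layer's trainable weights, so there is nothing for backpropagation to update.

```python
import tensorflow as tf

# Build a small LSTM layer and run a dummy batch through it so that
# its weight variables get created.
layer = tf.keras.layers.LSTM(units=4)
_ = layer(tf.zeros((1, 5, 3)))  # (batch, time steps, features)

# The only trainable variables are the input kernel, the recurrent
# kernel, and the bias. The initial states a0/c0 are not variables at
# all: Keras materializes them as zero tensors at call time unless you
# pass `initial_state` explicitly.
print(len(layer.trainable_weights))  # 3
```

So "fixed" is in a strong sense: by default the initial states are freshly created zero tensors on every call, not stored parameters.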

Hey @Meir,
just to simplify the explanation a bit:

  • Both states of the LSTM (the hidden state and the cell state) get initialized with zeros.
  • A Bidirectional LSTM is just two LSTMs combined: one processes the sequence forward, the other backward. Each LSTM has its own hidden states.
  • Once we have initialized the Bidirectional LSTM, we pass the inputs and propagate through the network, so the values of the hidden states get updated.
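The bullets above can be sketched in a few lines of NumPy. To keep it short this uses a simple tanh RNN cell instead of a full LSTM (the wrapping logic is identical): two independent recurrences over the same input, each starting from its own zero state, with outputs concatenated. The names rnn_pass and bidirectional are illustrative, not Keras internals.

```python
import numpy as np

def rnn_pass(X, Wx, Wh, b):
    """Simple tanh-RNN scan; the hidden state starts at zero."""
    h = np.zeros(Wh.shape[0])
    out = []
    for x_t in X:
        h = np.tanh(Wx @ x_t + Wh @ h + b)
        out.append(h)
    return np.stack(out)

def bidirectional(X, fwd_params, bwd_params):
    """Two independent recurrences over the same input sequence:
    one left-to-right, one right-to-left, outputs concatenated."""
    fwd = rnn_pass(X, *fwd_params)
    # Reverse the input for the backward pass, then reverse its
    # output so time step t of both halves lines up.
    bwd = rnn_pass(X[::-1], *bwd_params)[::-1]
    return np.concatenate([fwd, bwd], axis=-1)

rng = np.random.default_rng(1)
Tx, n_x, units = 6, 3, 4
X = rng.normal(size=(Tx, n_x))

def make_params():
    # Each direction gets its own, separately initialized weights.
    return (rng.normal(size=(units, n_x)),
            rng.normal(size=(units, units)),
            np.zeros(units))

Y = bidirectional(X, make_params(), make_params())
print(Y.shape)  # (6, 8): forward and backward states concatenated
```

This also explains the output size you see from Bidirectional(LSTM(units=n, return_sequences=True)): each time step carries 2n values, n from each direction.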