The size of each hidden state in the sequence is set by the units argument of the LSTM. The hidden states a, from the initial a1 to the final aTx, are predictions of the LSTM along with the output y. This holds whether the LSTM is uni-directional or bi-directional.

In short, to estimate an LSTM (uni- or bi-directional) you only need the inputs. The parameters are estimated through the optimization algorithm (usually an improved version of gradient descent), and the states and outputs are predicted from them.
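
To make the shapes concrete, here is a minimal NumPy sketch of an LSTM forward pass. It is not the Keras implementation: the helper names (lstm_forward, sigmoid), the random untrained weights, and the zero initial states are all assumptions made for illustration. It only shows that each hidden state has size units, and that a bi-directional wrapper that concatenates the two directions gives 2 * units per time step.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, units, rng, a0=None, c0=None):
    """Run one LSTM layer over x of shape (Tx, n_x); return hidden states of shape (Tx, units)."""
    Tx, n_x = x.shape
    # Hypothetical parameters: one stacked weight matrix and bias for the four gates.
    # They are random here; in practice the optimizer would estimate them.
    W = rng.normal(scale=0.1, size=(4 * units, units + n_x))
    b = np.zeros(4 * units)
    # Initial states are assumed to be zeros unless the caller supplies them.
    a = np.zeros(units) if a0 is None else a0
    c = np.zeros(units) if c0 is None else c0
    states = []
    for t in range(Tx):
        z = W @ np.concatenate([a, x[t]]) + b
        f = sigmoid(z[:units])                 # forget gate
        u = sigmoid(z[units:2 * units])        # update (input) gate
        g = np.tanh(z[2 * units:3 * units])    # candidate cell state
        o = sigmoid(z[3 * units:])             # output gate
        c = f * c + u * g
        a = o * np.tanh(c)
        states.append(a)
    return np.stack(states)

rng = np.random.default_rng(0)
Tx, n_x, units = 5, 3, 4
x = rng.normal(size=(Tx, n_x))

# Forward direction: a1 ... aTx, each of size units.
forward = lstm_forward(x, units, rng)
# Backward direction: same input X read in reverse, with its own weights,
# then flipped back so both directions are aligned in time.
backward = lstm_forward(x[::-1], units, rng)[::-1]
bidirectional = np.concatenate([forward, backward], axis=1)

print(forward.shape)        # (5, 4)
print(bidirectional.shape)  # (5, 8)
```

Both directions receive the same input X; only the reading order and the weights differ.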

I hope this explanation helps. If there are still any doubts, don’t hesitate to ask.

Is the following understanding of what you wrote correct? Bidirectional does not change the inputs of the underlying unidirectional layer. The underlying unidirectional layer in this case is an LSTM and its input is X. Although we do supply a0 and c0 to the LSTM, these are considered part of the parameters learned by the LSTM, despite the fact that they are fixed and are not updated by backpropagation.