The output of a BRNN is made available only after the entire sequence has been processed from both directions. As far as invoking the child RNNs is concerned, the Keras implementation does it sequentially; in theory, this could be done in parallel.
A BRNN uses one RNN to process the input from each end (two in total). When making predictions, the outputs of both child RNNs are combined to produce the output at every timestep. In TensorFlow, if you provide just one layer (i.e. no backward_layer) to a BRNN, a clone of the forward layer is created and used to learn from the backward direction. Do look at the merge_mode argument to understand how the two outputs are combined and returned.
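To make the mechanics concrete, here is a minimal NumPy sketch of the idea, not the Keras implementation: two independent child RNNs run over the sequence, one on the reversed input, and their per-timestep outputs are merged the way merge_mode='concat' or 'sum' would. All function and variable names here are illustrative.

```python
import numpy as np

def simple_rnn(inputs, Wx, Wh, b):
    """Run a basic tanh RNN over inputs of shape (T, input_dim);
    return all hidden states, shape (T, units)."""
    h = np.zeros(b.shape[0])
    outputs = []
    for x_t in inputs:
        h = np.tanh(x_t @ Wx + h @ Wh + b)
        outputs.append(h)
    return np.stack(outputs)

def bidirectional(inputs, fwd_params, bwd_params, merge_mode="concat"):
    """Process the sequence with two child RNNs, one per direction.
    The backward child runs on the time-reversed input, and its outputs
    are reversed back so both sequences align timestep-for-timestep."""
    fwd = simple_rnn(inputs, *fwd_params)
    bwd = simple_rnn(inputs[::-1], *bwd_params)[::-1]
    if merge_mode == "concat":
        return np.concatenate([fwd, bwd], axis=-1)
    if merge_mode == "sum":
        return fwd + bwd
    raise ValueError(f"unsupported merge_mode: {merge_mode}")

rng = np.random.default_rng(0)
T, input_dim, units = 5, 3, 4
make_params = lambda: (rng.normal(size=(input_dim, units)),
                       rng.normal(size=(units, units)),
                       np.zeros(units))
out = bidirectional(rng.normal(size=(T, input_dim)), make_params(), make_params())
print(out.shape)  # (5, 8): units from each direction, concatenated per timestep
```

Note how the full backward pass must finish before the merged output for even the first timestep is complete, which is why the BRNN output only becomes available after the whole sequence is consumed.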
As far as backpropagation is concerned, the child RNNs can, in theory, update in parallel, since the forward and backward layers have independent weights.