Hello,
so I was wondering whether the version of "many-to-many" introduced in the course (the one with separate encoder and decoder parts, where Tx ≠ Ty is allowed) can also serve as a bidirectional RNN,
because from the schematic representation we have Tx + Ty + 1 activations (considering only one layer, and counting a<0>), which is similar to what a bidirectional RNN has when Tx = Ty.
Also, even the first y_hat would have full information about the entire x sequence, since it comes after all Tx encoder steps.
But maybe there are drawbacks, such as vanishing gradients over the longer path?
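To make my question concrete, here is a minimal NumPy sketch of what I mean by the encoder-decoder many-to-many (all names, sizes, and the untrained random weights are just placeholders, and the decoder here gets no new input at each step, which is a simplification):

```python
import numpy as np

rng = np.random.default_rng(0)

Tx, Ty, nx, na, ny = 4, 3, 5, 8, 2  # hypothetical sequence lengths / sizes

# Random untrained parameters, only to show the wiring
Wax = rng.normal(size=(na, nx)) * 0.1
Waa = rng.normal(size=(na, na)) * 0.1
Wya = rng.normal(size=(ny, na)) * 0.1

x = rng.normal(size=(Tx, nx))

# Encoder: reads the whole input sequence, produces no outputs
a = np.zeros(na)  # this is a<0>
for t in range(Tx):
    a = np.tanh(Waa @ a + Wax @ x[t])

# Decoder: emits Ty outputs starting from the final encoder state,
# so even y_hat[0] is a function of the entire x[0..Tx-1]
y_hat = []
for t in range(Ty):
    a = np.tanh(Waa @ a)
    y_hat.append(Wya @ a)

y_hat = np.stack(y_hat)
print(y_hat.shape)  # (Ty, ny)
```

So the hidden state is updated Tx + Ty times in total (plus the initial a<0>), and the first prediction already depends on the whole input, which is what made me think of the bidirectional case.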