GRUs are explained as keeping track of things like singular vs. plural, e.g. “The cat, …, was …” vs. “The cats, …, were …”,
while BRNNs are explained as keeping track of how each word is associated with the next word.
I am a little confused and would like to know: aren’t GRUs also keeping track of how each word is associated with the next word? Since a GRU can differentiate between singular and plural, isn’t it keeping track of words too?
When one says a GRU layer, they’re referring to a layer consisting of a single GRU cell that is applied at every timestep. This layer's parameters are used to traverse the input from start to end, i.e. in one direction. The hidden state at a given timestep is determined by the words seen from the start of the sequence up to that timestep. This layer has no knowledge of future context.
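To make the "no future context" point concrete, here is a minimal pure-Python sketch of a scalar GRU cell run in one direction. The weights in `params`, the helper names `gru_step` and `run_gru`, and the toy input sequences are all made up for illustration, and the update rule follows one common GRU convention rather than any particular library's implementation.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gru_step(x, h, p):
    """One step of a scalar GRU cell; p holds hypothetical toy weights."""
    z = sigmoid(p["wz"] * x + p["uz"] * h)               # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h)               # reset gate
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h))  # candidate state
    return z * h_cand + (1.0 - z) * h                    # blend new and old state

def run_gru(xs, p):
    """Apply the same cell at every timestep, from start to end."""
    h = 0.0
    states = []
    for x in xs:
        h = gru_step(x, h, p)
        states.append(h)
    return states

# Made-up weights and inputs, for illustration only.
params = {"wz": 0.5, "uz": -0.3, "wr": 0.8, "ur": 0.1, "wh": 1.2, "uh": 0.7}
seq_a = [1.0, -0.5, 0.3, 2.0]
seq_b = [1.0, -0.5, 0.3, -2.0]  # differs from seq_a only at the last timestep

states_a = run_gru(seq_a, params)
states_b = run_gru(seq_b, params)

# The states at the first three timesteps are unaffected by the later change.
print(states_a[:3] == states_b[:3])
print(states_a[3] != states_b[3])
```

Changing the last input leaves every earlier state untouched, which is exactly what "no idea about future context" means for a unidirectional layer. The state still carries information forward from earlier inputs, which is how a GRU can remember something like singular vs. plural.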
A Bidirectional layer consists of two RNN layers. The one we specify during construction is for the forward direction. If you don’t specify one for the backward direction, a clone of the forward layer is used instead.
This lets the bidirectional layer use both the forward and backward layers to capture context from either direction of the input sentence when making predictions (as explained in the lecture).
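The forward/backward idea can be sketched in a few lines of pure Python. This is a rough illustration, not how any framework implements it: a plain tanh recurrent step stands in for the GRU cell to keep the example short, and the names `rnn_step`, `run_direction`, `run_bidirectional` and all weights are hypothetical. Pairing the two states per timestep is assumed here as the way the directions are combined.

```python
import math

def rnn_step(x, h, w, u):
    """Simplified recurrent step (stand-in for a GRU cell)."""
    return math.tanh(w * x + u * h)

def run_direction(xs, w, u, reverse=False):
    """Run the cell over xs, optionally from the end back to the start."""
    order = reversed(xs) if reverse else xs
    h, states = 0.0, []
    for x in order:
        h = rnn_step(x, h, w, u)
        states.append(h)
    # Re-align backward states so index t refers to the original timestep t.
    return states[::-1] if reverse else states

def run_bidirectional(xs, fwd, bwd):
    """Pair the forward and backward state at every timestep."""
    f = run_direction(xs, *fwd)
    b = run_direction(xs, *bwd, reverse=True)
    return list(zip(f, b))

# Made-up weights: each direction has its own parameters.
fwd_weights = (0.9, 0.5)
bwd_weights = (-0.4, 0.8)

seq_a = [1.0, -0.5, 0.3, 2.0]
seq_b = [1.0, -0.5, 0.3, -2.0]  # differs only at the last timestep

out_a = run_bidirectional(seq_a, fwd_weights, bwd_weights)
out_b = run_bidirectional(seq_b, fwd_weights, bwd_weights)

# At timestep 0 the forward state ignores the future change...
print(out_a[0][0] == out_b[0][0])
# ...but the backward state at timestep 0 reflects it.
print(out_a[0][1] != out_b[0][1])
```

At the very first timestep, the forward half knows nothing about the changed last input, while the backward half does. That extra half is what gives a bidirectional layer access to future context.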
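Both unidirectional and bidirectional layers can be used to grasp the context across the input. The difference is that a unidirectional GRU layer only sees the words up to the current timestep, while a bidirectional layer also sees the words that come after it.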