BRNN Inputs

Hello Everyone, Merry Christmas!

I have the following questions regarding RNNs:

  1. I understand that when training a unidirectional RNN for speech recognition, the input x<t> to each hidden state is the correct word y<t-1> from the previous time step. But I don't think Andrew mentioned in the lecture how this is done for BRNNs. Do we take the correct words from both the previous and the following time steps as the input x?

  2. Is beam search applicable to speech recognition too?

  3. In machine translation, particularly in the encoder, is the same logic applied, i.e. giving the correct word from the previous state as input to the current state?

  1. See this link to notice that you'll need the speaker to complete the sentence before using a BRNN to transcribe the voice input: the backward pass starts from the last time step, so it cannot run until the whole utterance is available.
  2. You can use beam search here as well. Do look at other sources to see whether there's a standard evaluation metric for the speech-to-text task.
  3. Assuming we're talking about machine translation using RNNs: yes, we provide the previous hidden state and the correct current word as input to predict the next word, with the initial hidden state a<0> set to all zeros. This is the general mode of operation of an RNN. The encoder first builds a representation of the entire input sentence, and only then does the decoder emit the translated sentence.
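To make answer 1 concrete, here's a minimal sketch in pure Python of why a BRNN has to wait for the full utterance. The `step()` function is a hypothetical stand-in for a real RNN cell (it just accumulates its inputs); the point is that the backward pass is driven from the *end* of the sequence:

```python
# Hypothetical stand-in for an RNN cell: accumulate inputs into the "state".
def step(state, x):
    return state + [x]

words = ["he", "said", "teddy", "bears"]  # the complete utterance

forward = []
for x in words:              # left-to-right pass: can run as words arrive
    forward = step(forward, x)

backward = []
for x in reversed(words):    # right-to-left pass: needs the last word first
    backward = step(backward, x)

# At any time step t, the BRNN output combines the forward state (context
# up to t) with the backward state (context from t to the end) - which is
# why the sentence must be finished before transcription can start.
print(forward[-1], backward[-1])  # → bears he
```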
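For answer 2, a minimal beam search sketch may help. The per-step log-probabilities below are made-up numbers over a toy vocabulary; in a real speech model they would come from the decoder at each step:

```python
import math

# Hypothetical per-step log-probabilities over a tiny vocabulary.
log_probs = [
    {"the": math.log(0.6), "a": math.log(0.3), "cat": math.log(0.1)},
    {"the": math.log(0.1), "a": math.log(0.2), "cat": math.log(0.7)},
]

def beam_search(steps, beam_width=2):
    # Each beam entry is (partial sequence, cumulative log-probability).
    beams = [([], 0.0)]
    for dist in steps:
        candidates = [
            (seq + [w], score + lp)
            for seq, score in beams
            for w, lp in dist.items()
        ]
        # Keep only the beam_width highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

print(beam_search(log_probs))  # → ['the', 'cat']
```

With `beam_width=1` this reduces to greedy decoding; larger widths keep more hypotheses alive at each step.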
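And for answer 3, here's a toy sketch of the training-time input scheme described above (often called teacher forcing): at step t the model receives the *correct* previous word y<t-1>, not its own prediction. The `rnn_step` function is a hypothetical stand-in that just records what it was fed:

```python
# Hypothetical stand-in for an RNN cell: log each input it receives.
def rnn_step(state, word):
    return state + [word]

# Ground-truth target sentence, with a start-of-sequence token.
target = ["<s>", "le", "chat", "dort"]

state = []  # a<0> would be all zeros in a real model
for t in range(1, len(target)):
    x_t = target[t - 1]          # correct previous word as the input x<t>
    state = rnn_step(state, x_t)

print(state)  # → ['<s>', 'le', 'chat']
```

At inference time there is no ground truth, so the model's own sampled word from step t-1 is fed in instead.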