There are two ways of generating sequences. Both involve predicting \hat{y}^{<t>} from the input at the current time step and the activation from the previous time step. For the first token, you can set both x^{<1>} and a^{<0>} to zero vectors, just as is done when training the model. This is like passing a dummy START_TOKEN to generate the output for the first time step.
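Below is a minimal sketch of that first step. The parameter names (Wax, Waa, Wya, ba, by) and the sizes are assumptions for illustration; in practice they would come from the trained model.

```python
import numpy as np

# Assumed shapes and randomly initialized parameters, purely for illustration.
vocab_size, hidden_size = 27, 100
Wax = np.random.randn(hidden_size, vocab_size) * 0.01
Waa = np.random.randn(hidden_size, hidden_size) * 0.01
Wya = np.random.randn(vocab_size, hidden_size) * 0.01
ba = np.zeros((hidden_size, 1))
by = np.zeros((vocab_size, 1))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum(axis=0)

# x^<1> and a^<0> are zero vectors -- the "dummy START_TOKEN".
x = np.zeros((vocab_size, 1))
a_prev = np.zeros((hidden_size, 1))

a = np.tanh(Wax @ x + Waa @ a_prev + ba)   # a^<1>
y_hat = softmax(Wya @ a + by)              # \hat{y}^<1>: a distribution over the vocabulary
```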
Here’s how both types of sequence generators differ:
- If the most likely token (the argmax of \hat{y}^{<t>}) is always chosen as the output at each time step, generation is deterministic: the first output will simply be the token that most frequently starts sequences in the training data, and the same sequence is produced every time.
- If a token is instead sampled at random from \hat{y}^{<t>}, i.e. \hat{y}^{<t>} is treated as a probability distribution over all tokens and one token is drawn from it, then we can generate novel sequences.
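Here is a sketch of both options, continuing from the y_hat computed above (greedy argmax versus drawing from the distribution with np.random.choice):

```python
probs = y_hat.ravel()   # \hat{y}^{<t>} flattened to a 1-D probability vector

# Option 1: always take the argmax -> deterministic, repeats the most likely token path.
greedy_idx = int(np.argmax(probs))

# Option 2: sample one token according to the distribution -> novel sequences.
sampled_idx = np.random.choice(vocab_size, p=probs)

# Either way, the chosen token is one-hot encoded and fed back as x^{<t+1>},
# and the loop repeats until an end-of-sequence token or a length limit is reached.
x_next = np.zeros((vocab_size, 1))
x_next[sampled_idx] = 1
```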
As far as dimensions go, \hat{y}^{<t>} has the vocabulary size as its dimension (one probability per token in the vocabulary). Please read the topic on the dimension of the hidden state for more details.