RNN for speech recognition

Vahdet_Vural · March 15, 2023, 8:40am

Hello,
In the Week 1 video of C5, Language Model and Sequence Generation, as shown in the screenshot below, we input all-zero vectors, namely x<1> = 0 and a<0> = 0, but we still get probabilities for the words in the vocabulary. So my question is: how is it possible to get a result when the inputs are trivial, just zeros?
My second question is: what are the dimensions of the y's and a's? Andrew says we apply a softmax to get the probabilities. This would mean the y's and a's have the dimension of the vocabulary, which is 10k. Is this correct?

balaji.ambresh · March 15, 2023, 9:22am

There are 2 ways of generating sequences. Both of them involve predicting \hat{y}^{<t>} based on the input at the current time step and the activation from the previous time step. As far as the 1st token is concerned, you can set both x^{<t>} and a^{<t-1>} to zeros when training the model. This is like passing a dummy START_TOKEN to generate the output for the 1st time step.

Here’s how both types of sequence generators differ:

1. If the output at the current time step is used directly as the prediction, i.e. the most probable token is taken, then the output at the 1st time step will correspond to the token that most frequently occurs at the start of the training inputs.
2. If a random token is sampled based on the output of the current time step, i.e. \hat{y}^{<t>} is used as the distribution for sampling 1 token from all tokens, then we can generate novel sequences.
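The difference between the two can be sketched on a single time step. This is only an illustration: the 5-word vocabulary and the probabilities in `y_hat` are made up, standing in for the softmax output of a trained model.

```python
import numpy as np

# Hypothetical toy vocabulary and a made-up softmax output for one time step.
vocab = ["the", "a", "cats", "dogs", "<EOS>"]
y_hat = np.array([0.40, 0.30, 0.15, 0.10, 0.05])

# Way 1: take the most probable token. Same y_hat, same token, every time.
greedy_idx = int(np.argmax(y_hat))
print("greedy:", vocab[greedy_idx])

# Way 2: draw a token at random, with y_hat as the distribution.
# Repeating the draw shows that less likely tokens are picked too.
rng = np.random.default_rng(0)
sampled_idx = rng.choice(len(vocab), size=1000, p=y_hat)
counts = np.bincount(sampled_idx, minlength=len(vocab))
for word, count in zip(vocab, counts):
    print(f"{word:6s} sampled {count} times out of 1000")
```

With the first way every generated sentence starts with the same word; with the second, the starting word varies roughly in proportion to the probabilities in `y_hat`.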

As far as the dimensions go, \hat{y}^{<t>} has the vocabulary size as its dimension. Please read this topic on the dimension of the hidden state.
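To make the shapes concrete, and to show why zero inputs still produce probabilities, here is a minimal sketch of the 1st time step using the course-style names Waa, Wax, Wya, ba and by. The hidden size n_a = 100 and the random parameter values are assumptions for illustration only; in practice the parameters come from training.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a single column vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

n_x = n_y = 10_000   # vocabulary size (one-hot input, softmax output)
n_a = 100            # hidden size: an assumed value, chosen independently of the vocabulary

# Random stand-ins for trained parameters (assumption for this sketch).
rng = np.random.default_rng(1)
Waa = rng.normal(scale=0.01, size=(n_a, n_a))
Wax = rng.normal(scale=0.01, size=(n_a, n_x))
Wya = rng.normal(scale=0.01, size=(n_y, n_a))
ba = rng.normal(scale=0.01, size=(n_a, 1))
by = rng.normal(scale=0.01, size=(n_y, 1))

x1 = np.zeros((n_x, 1))   # x<1> = 0
a0 = np.zeros((n_a, 1))   # a<0> = 0

# With zero inputs the weight terms vanish, but the biases remain.
a1 = np.tanh(Waa @ a0 + Wax @ x1 + ba)   # reduces to tanh(ba)
y1 = softmax(Wya @ a1 + by)

print(a1.shape)          # (100, 1)   -> a<t> has dimension n_a
print(y1.shape)          # (10000, 1) -> y_hat<t> has the vocabulary dimension
print(float(y1.sum()))   # sums to 1 up to floating-point error
```

So only \hat{y}^{<t>} has the vocabulary size as its dimension; a^{<t>} has size n_a. The 1st output depends only on the learned biases (and Wya), which is how training can fit it to the distribution of sentence-starting words. Even with all biases at zero, the softmax would still return a valid, uniform distribution.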

Vahdet_Vural · March 15, 2023, 10:23am

Well, I am kind of more confused now. So in the video the training is done using your 1st type of sequence generation, which takes the correct words or tokens of a sentence as input, and as output we get a probability vector, y hat.
But why do we need the sampling process at all? What does np.random.choice do? Is the sampling process a continuation of training the model?
I mean, let's say we get the output y hat at time step t. This output already gives us the probabilities of all tokens in the vocabulary, and therefore we can obtain the most common or probable token from that output. In this case, why do we need sampling?
Sorry for possibly bothering you with my questions but I just want to get it right.
Thanks for your patience!

balaji.ambresh · March 15, 2023, 11:10am

No worries.

As far as training is concerned, the 1st method is used.
The 2nd method is used only after training. Please see this lecture.
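As a rough sketch of what the 2nd method looks like after training: start from zeros, sample a token from \hat{y}^{<t>} with a random choice, then feed that token back in as the next input until an end-of-sentence token or a length limit is reached. The helper `sample_sequence`, the tiny vocabulary size, and the random parameters are all hypothetical stand-ins; no weights are updated anywhere in this loop.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a single column vector."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sample_sequence(params, n_x, n_a, eos_idx, max_len, rng):
    """Hypothetical helper: generate token indices by repeated sampling."""
    Waa, Wax, Wya, ba, by = params
    x = np.zeros((n_x, 1))   # x<1> = 0
    a = np.zeros((n_a, 1))   # a<0> = 0
    indices = []
    for _ in range(max_len):
        a = np.tanh(Waa @ a + Wax @ x + ba)
        y_hat = softmax(Wya @ a + by)
        # Sample one token index, using y_hat as the distribution.
        idx = int(rng.choice(n_x, p=y_hat.ravel()))
        indices.append(idx)
        if idx == eos_idx:
            break
        # The sampled token becomes the next input, as a one-hot vector.
        x = np.zeros((n_x, 1))
        x[idx] = 1.0
    return indices

# Random parameters stand in for a trained model (assumption for this sketch).
rng = np.random.default_rng(2)
n_x, n_a = 6, 4
params = (
    rng.normal(size=(n_a, n_a)),  # Waa
    rng.normal(size=(n_a, n_x)),  # Wax
    rng.normal(size=(n_x, n_a)),  # Wya
    rng.normal(size=(n_a, 1)),    # ba
    rng.normal(size=(n_x, 1)),    # by
)
seq = sample_sequence(params, n_x, n_a, eos_idx=5, max_len=10, rng=rng)
print(seq)
```

Calling `sample_sequence` again with a different seed will generally give a different sequence, which is the point of sampling: it shows what kinds of sentences the trained language model considers likely.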

Vahdet_Vural · March 15, 2023, 11:40am

Yes, I did watch that video, but there is still no explanation of why we need sampling. What does sampling do at all?
In the sampling process we are getting only the most common words in the literature or training text, which has nothing to do with our input speech sentence that is supposed to be recognized by the machine.
OK, apparently I need to watch some other explainer videos.
Thanks for the answers.
