Question about the Lecture "language model and sequence generation"

Mohammad_Sakka · August 3, 2022, 2:24pm

In the lecture we saw the model unrolled into multiple units. The input to the first unit is x<1>, which is a vector of zeros, so how can this unit predict that the output is the word “cats” without any meaningful input? And for the other units, each one takes the true output of the previous unit as its input. How can the second unit predict that the second word is “averages” when its only two inputs are the activation value of the previous unit and the true output of the previous unit, with no input related to the word “averages”?
Can you please help me understand?

paulinpaloalto · August 3, 2022, 6:32pm

Getting meaningful and useful output only happens after you have trained the model on a large corpus of input data, right? So it has learned patterns of what sequences are more likely than others. That learning is expressed in the various weights that are used in the model.
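As a toy illustration of how training data ends up in the weights, here is a minimal sketch. It assumes a drastically simplified "model" whose only parameter is a bias vector `b` feeding a softmax, which is roughly the situation at the first time step when the inputs are all zeros. The three-word vocabulary and the list of first words are made up for the example; this is not the course's code.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical vocabulary and hypothetical first words of 8 training sentences.
vocab = ["cats", "dogs", "kangaroos"]
first_words = ["cats", "cats", "cats", "dogs", "cats", "kangaroos", "cats", "dogs"]

# Empirical frequency of each word at position 1 in the training data.
counts = np.array([first_words.count(w) for w in vocab], dtype=float)
target = counts / counts.sum()

# With an all-zero input, only the bias reaches the softmax.
b = np.zeros(len(vocab))
for _ in range(2000):
    p = softmax(b)
    grad = p - target      # gradient of the mean cross-entropy loss w.r.t. b
    b -= 0.5 * grad        # plain gradient descent step

learned = softmax(b)
print(dict(zip(vocab, np.round(learned, 3))))
```

After training, the predicted distribution for the first word matches how often each word starts a sentence in the (made-up) corpus, so "cats" wins only because the data says so.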

Also note that the “hidden state” is not fixed: it gets updated at every time step, based on what the model has seen so far.

Mohammad_Sakka · August 3, 2022, 6:50pm

Thank you for replying!
But my question is about this slide: how does the first unit predict that y_hat<1> is “cats” given a zeros vector x<1>?

paulinpaloalto · August 3, 2022, 6:56pm

It’s what I said above: the model is trained, so it has learned weights for transforming x^{<1>} and a^{<0>} into a^{<1>}, and also learned weights for transforming a^{<1>} into \hat{y}^{<1>}, which is a softmax output: a probability distribution over the words in the dictionary, from which one word is picked.

Note that since x^{<1>} and a^{<0>} are both all zeros in this example, only the learned bias values in the hidden state computation affect the value of a^{<1>}. It will still be non-zero, though, and it is then fed into the computation of \hat{y}^{<1>}.
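To make that concrete, here is a minimal NumPy sketch of the first time step. The weights are random stand-ins for trained parameters, and the sizes and the three-word vocabulary are assumptions chosen only for illustration; `x1` is the all-zero first input and `a0` the all-zero initial hidden state.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)
n_a, n_vocab = 4, 3                       # hypothetical sizes
vocab = ["cats", "dogs", "kangaroos"]     # hypothetical dictionary

# Random stand-ins for the parameters a trained model would have.
Waa = rng.normal(size=(n_a, n_a))
Wax = rng.normal(size=(n_a, n_vocab))
Wya = rng.normal(size=(n_vocab, n_a))
ba = rng.normal(size=n_a)
by = rng.normal(size=n_vocab)

a0 = np.zeros(n_a)        # initial hidden state
x1 = np.zeros(n_vocab)    # first input: all zeros

# Both matrix products vanish, so a1 reduces to tanh(ba).
a1 = np.tanh(Waa @ a0 + Wax @ x1 + ba)
y_hat1 = softmax(Wya @ a1 + by)

print("a1 equals tanh(ba):", np.allclose(a1, np.tanh(ba)))
print("first-word distribution:", dict(zip(vocab, np.round(y_hat1, 3))))
```

Even with zero inputs, the output is a proper probability distribution over the dictionary. Which word gets the highest probability depends entirely on the parameter values, which in a real model come from training.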

Mohammad_Sakka · August 3, 2022, 7:08pm

Thank you again, Mr. Paul, and sorry for bothering you, but I am still confused. Why should y^<1> be “cats”? Why shouldn’t it be “Dogs”, for example? Is the model trained with an example that has output = “cats” for an input of zeros?

paulinpaloalto · August 3, 2022, 7:14pm

Prof Ng is just giving this as an example. Sure, it could be “Dogs” or “Kangaroos” for all we know. It depends on the training data. Also note that there are techniques for adding some randomness to the outputs. You’ll see an example of how to build that in when you get to the Dinosaur Name assignment, which is Assignment 2 in Week 1 here.
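As a rough sketch of the difference between always taking the most likely word and adding randomness, the snippet below compares the two on a made-up softmax output. The probabilities and vocabulary are hypothetical, and sampling from the distribution is just one common way to add randomness; the assignment's exact code may differ.

```python
import numpy as np

vocab = ["cats", "dogs", "kangaroos"]     # hypothetical dictionary
y_hat = np.array([0.6, 0.3, 0.1])         # hypothetical softmax output for word 1

# Greedy choice: always the single most likely word.
greedy = vocab[int(np.argmax(y_hat))]

# Sampling: draw words in proportion to their predicted probabilities.
rng = np.random.default_rng(0)
samples = rng.choice(vocab, size=1000, p=y_hat)

words, counts = np.unique(samples, return_counts=True)
print("greedy:", greedy)
print("sampled counts:", dict(zip(words.tolist(), counts.tolist())))
```

The greedy pick is always "cats" here, while sampling sometimes starts the sentence with "dogs" or "kangaroos", roughly in proportion to the predicted probabilities.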

Maybe it would be good to just “hold that thought” and continue on through to the first two assignments and see examples of how this all works.

Mohammad_Sakka · August 3, 2022, 7:19pm

Thank you, Mr. Paul. I think I got it.
