In the lecture we saw the model with multiple units, and the input to the first unit, for example, is X<0>, which is a zero vector. So how can this unit predict that the output is the word “cats” without any meaningful input? And for the other units, each unit’s input is the true output from the previous unit. How will the second unit predict that the second word is “averages”, when its only two inputs are the previous unit’s activation value and the previous unit’s true output, with no input related to the word “averages”?

Can you please help me understand?

Getting meaningful and useful output only happens *after* you have trained the model on a large corpus of input data, right? So it has learned patterns of what sequences are more likely than others. That learning is expressed in the various weights that are used in the model.

Also note that the “hidden state” gets updated at every timestep: it is not fixed, but depends on what the model has seen recently.

Thank you for replying!

But my question is about this slide: how does the first unit predict that y_hat<1> is “cats”, given the zero vector X<1>?

It’s what I said above: the model is trained, so it has weights associated with both the transformation of x^{<0>} and a^{<0>} into a^{<1>} and also learned weights associated with transforming a^{<1>} into \hat{y}^{<1>}, which is a softmax output that maps to one word in the dictionary.

Note that since x^{<0>} and a^{<0>} are both all zeros in this example, it will only be the learned bias values for the hidden state computation that affect the a^{<1>} value, but it will still be non-zero and then that will be input into the computation of \hat{y}^{<1>}.
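To make that concrete, here is a minimal sketch of one forward step with zero inputs. The weight matrices and the tiny dimensions are hypothetical placeholders (a trained model would have learned values), but it shows the point above: with x and a both zero, the hidden state reduces to tanh of the bias, which is still non-zero and still produces a softmax distribution over the vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny dimensions, just for illustration
n_a, n_x, vocab_size = 4, 3, 5
Waa = rng.standard_normal((n_a, n_a))
Wax = rng.standard_normal((n_a, n_x))
ba = rng.standard_normal((n_a, 1))
Wya = rng.standard_normal((vocab_size, n_a))
by = rng.standard_normal((vocab_size, 1))

x1 = np.zeros((n_x, 1))  # x^{<1>} is the zero vector
a0 = np.zeros((n_a, 1))  # a^{<0>} is the zero vector

# Hidden-state update: with zero inputs, only the bias term survives
a1 = np.tanh(Waa @ a0 + Wax @ x1 + ba)
assert np.allclose(a1, np.tanh(ba))  # non-zero, driven entirely by ba

# Softmax output: a probability distribution over the vocabulary
z = Wya @ a1 + by
y_hat1 = np.exp(z) / np.exp(z).sum()
```

So even with all-zero inputs, `y_hat1` is a well-defined distribution, and which word it favors depends entirely on the learned weights.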

Thank you again, Mr. Paul, and sorry for bothering you, but I’m still confused. Why should y^<1> be “cats”? Why shouldn’t it be “Dogs”, for example? Was the model trained on an example whose output is “cats” for an input of zeros?

Prof Ng is just giving this as an example. Sure, it could be “Dogs” or “Kangaroos” for all we know. It depends on the training data. Also note that there are techniques for adding some randomness to the outputs. You’ll see an example of how to build that in when you get to the Dinosaur Name assignment, which is Assignment 2 in Week 1 here.
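One way that randomness is typically added, which you will see a version of in the Dinosaur Name assignment, is to sample the next word from the softmax distribution instead of always taking the most likely one. A minimal sketch, with a made-up four-word vocabulary and made-up probabilities:

```python
import numpy as np

rng = np.random.default_rng()

# Hypothetical softmax output over a tiny vocabulary
vocab = ["cats", "Dogs", "Kangaroos", "<EOS>"]
y_hat = np.array([0.5, 0.3, 0.15, 0.05])

# Instead of argmax, sample an index according to the probabilities;
# this is what makes generated sequences differ from run to run
idx = rng.choice(len(vocab), p=y_hat)
word = vocab[idx]
```

With sampling, “cats” is merely the most likely first word here, not a guaranteed one.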

Maybe it would be good to just “hold that thought” and continue on through to the first two assignments and see examples of how this all works.

Thank you, Mr. Paul. I think I’ve got it.