Let’s say I have a Serial model like this:
Serial[
Embedding_35181_50
LSTM_50
Dense_17
LogSoftmax
]
We have weight matrices in layers such as Embedding, LSTM, and Dense. How do training and evaluation work in this case? We only have one loss function that compares the prediction and the target. Do we use it to update the weights in all layers? What about the weights in the Embedding layer: are they updated using the same loss as the LSTM?
Hi @Yuncheng_Liao
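That is correct: the Embedding, LSTM, and Dense layers all have weight matrices.
It’s not only that we compare the prediction with the target: when we calculate the loss, we also track how much each weight (in all layers) contributed to it, that is, which weights pushed the loss up or down and by how much.
Yes, that single loss is used to update the weights in all layers.
Yes, the Embedding weights are updated from the same loss as the LSTM: the loss is “tracked” all the way back to the parameters of the first layer.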
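To make the “one loss updates every layer” point concrete, here is a minimal sketch in plain Python. It assumes a toy Embedding, Dense, LogSoftmax stack (the LSTM is left out to keep it short, and no trax is used). One cross-entropy loss is computed, and the chain rule turns that single number into gradients for the Dense weights, the Dense bias, and the embedding row that was looked up. The names `forward`, `backward` and `numerical_grad` are hypothetical helpers, not trax functions; the finite-difference check just confirms the hand-written gradients.

```python
import copy
import math


def log_softmax(z):
    # Numerically stable log-softmax: z - logsumexp(z)
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]


def forward(params, token, target):
    E, W, b = params["E"], params["W"], params["b"]
    x = E[token]                                     # Embedding: a row lookup
    z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
         for j in range(len(b))]                     # Dense
    logp = log_softmax(z)                            # LogSoftmax
    return -logp[target], x, z                       # cross-entropy loss


def backward(params, token, target):
    """Chain rule from the single scalar loss back to every parameter."""
    E, W = params["E"], params["W"]
    loss, x, z = forward(params, token, target)
    p = [math.exp(v) for v in log_softmax(z)]
    dz = [p[j] - (1.0 if j == target else 0.0) for j in range(len(z))]
    grads = {
        "W": [[x[i] * dz[j] for j in range(len(dz))] for i in range(len(x))],
        "b": dz[:],
        "E": [[0.0] * len(row) for row in E],        # unused rows get no gradient
    }
    # The gradient flows through the Dense weights into the embedding row used.
    grads["E"][token] = [sum(W[i][j] * dz[j] for j in range(len(dz)))
                         for i in range(len(x))]
    return loss, grads


def numerical_grad(params, token, target, name, i, j=None, eps=1e-6):
    # Central finite difference on one parameter, for checking backward().
    def loss_with(delta):
        p = copy.deepcopy(params)
        if j is None:
            p[name][i] += delta
        else:
            p[name][i][j] += delta
        return forward(p, token, target)[0]
    return (loss_with(eps) - loss_with(-eps)) / (2 * eps)


params = {
    "E": [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],     # vocab 3, embedding dim 2
    "W": [[0.2, -0.1], [0.3, 0.5]],                  # Dense: 2 -> 2 classes
    "b": [0.0, 0.1],
}
loss, grads = backward(params, token=1, target=0)
print("loss:", loss)
print("embedding grad (row 1):", grads["E"][1])
print("check:", numerical_grad(params, 1, 0, "E", 1, 0))
```

Only the embedding row for the token actually seen receives a non-zero gradient, but it receives it from the same loss that drives the Dense layer.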
Cheers
Thanks for your answer. Could you provide more information on how training works for the layers in a Serial? I’d like to see some documentation explaining how it works, because there is no such material in this week’s content. I hope this will also help other students learn the course material better.
No problem 
Serial is just a “combinator”: in other words, it combines all the layers into “one” (the layers inside it are run in the order you specify).
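The combinator idea can be sketched in a few lines of plain Python. This is a hypothetical `serial` function, not trax's `Serial` (which also manages each layer's weights and state); it only shows that the combined layer feeds each sublayer's output into the next, in the order given.

```python
def serial(*layers):
    """Combine layers into one: each layer's output is the next layer's input."""
    def combined(x):
        for layer in layers:   # run the sublayers in the order they were given
            x = layer(x)
        return x
    return combined


# Hypothetical toy "layers" standing in for Embedding, LSTM, Dense, ...
def add_one(x):
    return x + 1


def double(x):
    return x * 2


model = serial(add_one, double)
print(model(3))                     # (3 + 1) * 2 = 8
print(serial(double, add_one)(3))   # 3 * 2 + 1 = 7, so the order matters
```

Because the result is itself just a function of its input, a Serial can be nested inside another Serial like any other layer.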
I believe there is a link somewhere in the course, but in any case here it is again: training. Training code in deep learning libraries is pretty complicated (trax code is actually among the most readable) because of everything it has to handle (different models, different hardware, different anything…), so it’s not very straightforward to understand (unless you are into software development).
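Stripped of everything a library has to handle, the core of a training loop is: run the layers forward, compute the loss, get the gradient of that loss with respect to every layer's weights, and take a small step against it. Below is a hedged sketch of that loop in plain Python on a hypothetical two-layer scalar model, `y_hat = w2 * (w1 * x)`; the `train` function, the data and the learning rate are illustrative assumptions, not trax's training API.

```python
def train(data, w1=0.5, w2=0.5, lr=0.01, steps=200):
    """Plain gradient descent on the toy model y_hat = w2 * (w1 * x)."""
    losses = []
    n = len(data)
    for _ in range(steps):
        loss, g1, g2 = 0.0, 0.0, 0.0
        for x, y in data:
            h = w1 * x              # "layer 1"
            y_hat = w2 * h          # "layer 2"
            err = y_hat - y
            loss += err ** 2 / n    # mean squared error
            # Chain rule: the same loss yields a gradient for BOTH layers.
            g2 += 2 * err * h / n
            g1 += 2 * err * w2 * x / n
        w1 -= lr * g1               # update every layer's weight
        w2 -= lr * g2
        losses.append(loss)
    return w1, w2, losses


data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]   # target relation: y = 3x
w1, w2, losses = train(data)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.2e}, w1*w2 = {w1 * w2:.4f}")
```

The learning rate is kept small on purpose: on this toy problem a much larger step (for example 0.05) makes the updates overshoot instead of settling at `w1 * w2 = 3`.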
This course mainly focuses on NLP aspects and does not get very much into software/hardware details.
Cheers