How does training for a Serial model work?

Yuncheng_Liao · May 28, 2023, 5:30am

Let’s say I have a Serial model like this:

Serial[
  Embedding_35181_50
  LSTM_50
  Dense_17
  LogSoftmax
]

We have weight matrices in layers such as Embedding, LSTM, and Dense. How do training and evaluation work in this case? We only have one loss function, which compares the prediction with the target. Do we use it to update the weights in all layers? What about the weights in the Embedding layer: are they updated from the same loss as the LSTM weights?

arvyzukai · May 28, 2023, 6:29am

Hi @Yuncheng_Liao

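That is correct: there is only one loss, and it compares the prediction with the target.

It’s not only that - when we calculate the loss, we track how much each weight (in all layers) contributed to it (which weights contributed to the loss positively or negatively, and by how much).

Yes, that single loss is used to update the weights in all layers.

Yes, that includes the Embedding weights: the loss is “tracked” all the way back to the first parameters.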
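To make the “tracking” a bit more concrete, here is a minimal sketch in plain Python (not trax code). It assumes a made-up toy model, Embedding -> Dense -> LogSoftmax, with the LSTM left out to keep it short; the sizes, the initial weights, and the names `forward`, `loss_and_grads` and `sgd_step` are all hypothetical. The gradients are written out by hand here, whereas a real library derives them automatically.

```python
import math

# Hypothetical toy model: Embedding(3 tokens, 2 dims) -> Dense(2 classes) -> LogSoftmax
embedding = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]  # one row of weights per token id
dense_w = [[0.2, -0.1], [0.05, 0.3]]                # shape: (2 inputs, 2 classes)


def forward(token_id):
    x = embedding[token_id]                          # Embedding: look up one row
    logits = [sum(x[i] * dense_w[i][k] for i in range(2)) for k in range(2)]  # Dense
    log_z = math.log(sum(math.exp(v) for v in logits))
    log_probs = [v - log_z for v in logits]          # LogSoftmax
    return x, log_probs


def loss_and_grads(token_id, target):
    x, log_probs = forward(token_id)
    loss = -log_probs[target]                        # the single loss, at the very end
    # Backward pass: the same loss signal flows from the last layer to the first.
    d_logits = [math.exp(lp) - (1.0 if k == target else 0.0)
                for k, lp in enumerate(log_probs)]
    d_dense_w = [[x[i] * d_logits[k] for k in range(2)] for i in range(2)]
    d_x = [sum(dense_w[i][k] * d_logits[k] for k in range(2)) for i in range(2)]
    return loss, d_dense_w, d_x


def sgd_step(token_id, target, lr=0.5):
    loss, d_dense_w, d_x = loss_and_grads(token_id, target)
    for i in range(2):
        for k in range(2):
            dense_w[i][k] -= lr * d_dense_w[i][k]    # update the Dense weights
        embedding[token_id][i] -= lr * d_x[i]        # update the Embedding row that was used
    return loss


loss_before = sgd_step(token_id=1, target=0)
loss_after = loss_and_grads(1, 0)[0]
print(loss_before, loss_after)
```

Both the Dense weights and the Embedding row for the token that was fed in change after one step, and both updates come from the same loss. Embedding rows for tokens that did not appear in the input stay as they were, because they did not contribute to that loss.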

Cheers

Yuncheng_Liao · May 28, 2023, 10:22am

Thanks for your answer. Could you provide more information on how training works for the Serial layers? I’d like to see documentation explaining how it works, because there is no such material in this week’s content. I hope this will also help other students learn the course better.

arvyzukai · May 28, 2023, 11:41am

No problem

Serial is just a “combinator” - in other words, it combines all the layers into “one” (all the layers “in it” are run in the order you specify).
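As a rough illustration of the idea (a hypothetical stand-in written in plain Python, not the actual trax implementation), a combinator like this can be sketched as a class that stores the layers and calls them one after another. The names `TinySerial`, `double` and `add_one` are made up for this example.

```python
class TinySerial:
    """Hypothetical stand-in for a Serial combinator (not trax code)."""

    def __init__(self, *layers):
        self.layers = layers

    def __call__(self, x):
        # Feed the output of each layer into the next one, in the given order.
        for layer in self.layers:
            x = layer(x)
        return x


def double(x):
    return 2 * x


def add_one(x):
    return x + 1


model = TinySerial(double, add_one)   # computes add_one(double(x))
nested = TinySerial(model, double)    # a combined model can be used as a layer itself
print(model(3), nested(3))
```

Because the combined object behaves like a single layer, it can be nested inside another one, and the order in which you list the layers is the order in which they run.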

I believe there is a link somewhere in the course, but in any case here it is again - training. Training code in deep learning libraries is pretty complicated (trax code is actually one of the most readable) because of everything it has to handle (different models, different hardware, different anything…), so it’s not very straightforward to understand (unless you are into software development).

This course mainly focuses on NLP aspects and does not get very much into software/hardware details.

Cheers
