model expects a tuple containing two padded tensors (with batch)
I don’t understand: what parameters or properties of the model tell us that the model expects a tuple containing two padded tensors with batch? Can someone please explain? In general, beyond the model structure, is there a way to investigate and debug the inner workings of the model? I wish the course spent some time on that.
Also, is my assumption correct that the symbols corresponding to the output sequence preceding the next symbol should mirror the input sequence?
That is a good question. Generally, trax training expects a tuple of (input, output) or (input, output, weights). The details are a lot to explain in a single post. Unfortunately the trax TrainTask documentation is not great on the details; the best way is to look inside the code, which can be liberating or challenging depending on one's programming knowledge.
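For reference, here is a minimal sketch of what such a tuple looks like by the time it reaches a TrainTask. The toy generator, the vocabulary size and the optimizer settings are mine for illustration, not the assignment's actual pipeline:

```python
# Sketch: how a trax TrainTask consumes (input, output, weights) batches.
# The generator below is illustrative, not the assignment's data pipeline.
import numpy as np
import trax
from trax import layers as tl
from trax.supervised import training

def toy_batches(batch_size=8, seq_len=16, vocab_size=32):
    """Yield (inputs, targets, weights) tuples forever, as TrainTask expects."""
    while True:
        inputs = np.random.randint(1, vocab_size, size=(batch_size, seq_len))
        targets = inputs                              # language-model style: predict the same sequence
        weights = (inputs != 0).astype(np.float32)    # mask out padding (id 0) in the loss
        yield inputs, targets, weights

train_task = training.TrainTask(
    labeled_data=toy_batches(),
    loss_layer=tl.WeightedCategoryCrossEntropy(),  # uses the third element as per-token weights
    optimizer=trax.optimizers.Adam(0.01),
)
```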
You can debug notebooks by adding %%debug at the top of the cell, but here you also need a bit of knowledge. Mostly you use n for the next line, s when you want to step into a function (for example, trax layers use the .forward() and pure_fn() methods, which is where you would want to "step in"), and exit when you want to leave the debugger.
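For example, a cell like this drops you into the debugger (the model call is just a placeholder, use whatever call you want to trace):

```python
%%debug
# The cell is run under pdb. Typical commands at the prompt:
#   n        -> execute the next line
#   s        -> step into the call on the current line (e.g. a layer's .forward() or pure_fn())
#   p expr   -> print the value of an expression
#   q / exit -> leave the debugger
output = model(example_batch)   # example_batch is whatever input you want to trace
```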
The model outputs probabilities for the whole sequence (including the symbols up to the "next_symbol", the "next_symbol" itself, and the symbols after it), and it should assign the highest, or at least a very high, probability to the "up to the next_symbol" symbols. In other words, the loss is calculated over the whole sequence, and the probabilities had better be highest (or very high) where the targets are.
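A tiny numpy sketch of that idea (not trax code, just an illustration that the loss is taken at every position, with made-up numbers):

```python
# Cross-entropy is computed at every position of the sequence,
# not only at the "next symbol" position.
import numpy as np

seq_len, vocab_size = 4, 6
log_probs = np.log(np.full((seq_len, vocab_size), 1.0 / vocab_size))  # model output per position
targets = np.array([2, 5, 1, 3])                                      # the whole target sequence

# log-probability the model assigned to the correct token at each position
per_position_loss = -log_probs[np.arange(seq_len), targets]
total_loss = per_position_loss.mean()
print(per_position_loss, total_loss)
```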
Thanks for sharing the tips about using %%debug. I strongly recommend adding a short video demo on its use at the beginning of the course.
Secondly, given this:
… why does the hint in UNQ_C9 suggest that to obtain the log probability for the next symbol, we should look for it at the token_length position of the log_probs array? Shouldn’t it be at least at token_length+1 (but probably further away given the EOS and the SEP characters)?
I’m inclined to think that since this course is part of the Natural Language Processing Specialization, it should not cover many extra topics, including this one. But I would agree with you that most learners using notebooks should be familiar with magic commands and also have some basic debugging knowledge (especially what to do after receiving an error).
Because of the way zero-based indexing works, the token_length value is exactly the index we want. For example, if we have generated “I love learning”, then token_length is 3 (3 words), and the index for the next_token is also 3. In other words, the value at index 3 of [7, 8, 9, 10] is 10.
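Or in plain Python (stand-in numbers, just to show the zero-based indexing):

```python
# With 3 tokens already generated, position 3 holds the entry for the *next* token.
log_probs = [7, 8, 9, 10]        # stand-in values, one entry per position
token_length = 3                 # "I love learning" -> 3 tokens so far
print(log_probs[token_length])   # -> 10, i.e. the slot for the next symbol
```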
Thanks for discussing this problem! I recently also had this problem with determining the parameters for the input. In this case, I believe that during training the model actually takes three parameters: input, output, and mask, as previously defined in the stream generator. Why is the mask neglected at the later stage?
It’s been a while since I looked at the code more deeply, but I don’t think training.Loop neglects the training mask during training in this case.
If I remember correctly, the training model in this assignment takes two identical input streams of data (each a tuple of (input, output, mask)).
Digging deep into the training.Loop code is a bit of a challenge in trax, and you reminded me that someone asked a similar question before and I had no time to look into it thoroughly. So if you’re doing a deep dive into this, please share your conclusions.