I don’t understand why the model is called with a tuple. In previous assignments this was usually done when tl.Parallel is used, but in this model’s architecture I don’t see why a tuple is necessary. Maybe I am missing something basic. Can someone please help me understand this?
I believe (although I could not trace exactly where it starts) it is because there are many functions linked to the final model: the tokenized input is passed along as padded tensors from the beginning of the lab all the way to the creation of the TransformerLM.
Sorry, I did not understand. Why is it necessary to provide the same input twice to the model, as a tuple?
It should be because of the QKV multiplication: one copy is masked and the other is not, so two copies of the input are passed to the model.
I don’t think you are correct on this one @gent.spah - the elements in the tuple are identical and have nothing to do with QKV (unless I don’t understand something?).
And in general, @Ritu_Pande, it is a good question why the instructions ask for a tuple as input, because the model would work just fine with:
output = model(padded_with_batch)
Maybe it is a remnant of some code that used the second output as a target… I don’t know.
In any case, during inference (the next_symbol function) the model never “touches” the second input: it could be changed to np.zeros_like(padded_with_batch) and the output of the model would not change (the _ variable would simply be zeros).
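To make the pass-through behavior concrete, here is a minimal sketch in plain NumPy. This is a toy stand-in, not the real Trax TransformerLM: it only mimics the calling convention under discussion, where the model takes a tuple, computes its output from the first element alone, and passes the second element through unchanged. The function name and the fake log-probabilities are invented for illustration.

```python
import numpy as np

def toy_transformer_lm(inputs):
    """Toy stand-in (NOT the real Trax TransformerLM): takes a tuple
    (tokens, passthrough), computes fake log-probabilities from the
    first element only, and returns the second element untouched."""
    tokens, passthrough = inputs
    vocab_size = 8
    # Fake "log-probabilities": a one-hot lookup that depends only on `tokens`.
    logits = np.eye(vocab_size)[tokens % vocab_size]
    return logits, passthrough

padded_with_batch = np.array([[5, 3, 7, 0]])

# Call with the input duplicated, as the assignment instructs.
out1, _ = toy_transformer_lm((padded_with_batch, padded_with_batch))

# Replace the second element with zeros: the first output is unchanged,
# which is the point made above about next_symbol during inference.
out2, second = toy_transformer_lm((padded_with_batch,
                                   np.zeros_like(padded_with_batch)))

assert np.array_equal(out1, out2)   # second input never affects the logits
assert np.array_equal(second, np.zeros_like(padded_with_batch))
```

The same check can be run against the real model in the lab: swap the second tuple element for zeros and compare the first output before and after.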
I will report it for further investigation.
Sometimes I am wrong too, @arvyzukai, and that might be the case here; I was not fully convinced, which is why I said “I believe”. If anybody finds the right answer, let us know!
Of course, @gent.spah, respect for admitting you might be wrong. Everyone is wrong at some point, and I might be wrong here too; that is why I asked for further comment.
No worries @arvyzukai, it’s good to be wrong sometimes too.
Yep, @gent.spah, and the best outcome of this thread would be if we’re both wrong. That would mean we can learn something and correct our wrong model (no pun intended) of the world.