In the compute_total_loss function, I don’t really understand why we have to transpose the labels and logits.
Maybe reading this would help.
PS: Please move your thread to the relevant course category as described here.
This question comes up pretty frequently. Here’s another previous thread that discusses the point in a bit more detail, including the rationale for why it works out that way.
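For anyone who wants to see the shape issue directly, here is a minimal sketch. It assumes the labels and logits are stored as (classes, examples), which is my reading of the question. The toy values and variable names are made up for illustration and are not taken from the assignment. The relevant fact is that tf.keras.losses.categorical_crossentropy treats the last axis as the class axis by default (axis=-1), so it wants inputs shaped (examples, classes).

```python
import tensorflow as tf

# Hypothetical toy data: 3 classes, 4 examples, stored as (classes, examples).
logits = tf.constant([[2.0, 0.1, 0.3, 1.5],
                      [0.5, 1.9, 0.2, 0.4],
                      [0.1, 0.3, 2.2, 0.6]])
labels = tf.constant([[1.0, 0.0, 0.0, 1.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])

# After transposing, each row is one example and the last axis is the class axis,
# so we get one loss value per example.
per_example = tf.keras.losses.categorical_crossentropy(
    tf.transpose(labels), tf.transpose(logits), from_logits=True)
total_loss = tf.reduce_sum(per_example)

# Without the transpose, the softmax runs across the examples instead of the
# classes, and we get one (meaningless) value per class.
untransposed = tf.keras.losses.categorical_crossentropy(
    labels, logits, from_logits=True)

print(per_example.shape)   # one entry per example
print(untransposed.shape)  # one entry per class
print(float(total_loss))
```

The call runs without error either way, so the only visible symptom of forgetting the transpose is a wrong loss value and an output whose length matches the number of classes rather than the number of examples.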