Also note that this is an old thread, and it describes how this assignment used to work. The points about the transpose and from_logits
are still valid, but the assignment was changed a while ago so that we now sum the loss values instead of computing their mean. Here’s a thread which explains why they made that change.
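To make the three points concrete, here is a minimal pure-Python sketch rather than the assignment's actual code. It assumes the logits and one-hot labels arrive with shape (num_classes, num_examples), which is why they are transposed first, and it applies softmax inside the loss, which is what from_logits=True asks the loss function to do. The helper names (`softmax`, `cross_entropy_from_logits`, `transpose`) and the toy values are made up for illustration; in TensorFlow the corresponding calls would be `tf.transpose`, `tf.keras.losses.categorical_crossentropy(..., from_logits=True)` and `tf.reduce_sum`.

```python
import math


def softmax(logits):
    """Turn one example's raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_from_logits(labels, logits):
    """Categorical cross-entropy for one example, with softmax applied here."""
    probs = softmax(logits)
    return -sum(y * math.log(p) for y, p in zip(labels, probs))


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*matrix)]


# Hypothetical toy data, shape (num_classes=3, num_examples=2).
logits = [[2.0, 0.0],
          [0.0, 2.0],
          [0.0, 0.0]]
labels = [[1.0, 0.0],
          [0.0, 1.0],
          [0.0, 0.0]]

# Transpose so that each row is one example: shape (num_examples, num_classes).
per_example = [
    cross_entropy_from_logits(y, z)
    for y, z in zip(transpose(labels), transpose(logits))
]

total_loss = sum(per_example)              # current behaviour: sum the losses
mean_loss = total_loss / len(per_example)  # old behaviour: average them
```

The two results differ only by a factor of the number of examples, so code written against the old version of the assignment will report a smaller value than the current tests expect.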