so for the coding exercise where we have to compute total_loss with reduce_sum and categorical_crossentropy, why is that we have to transpose the labels and logits? I just wanted some insight.
It’s because of how that function works. No other reason, it’s an implementation detail.
1 Like
Here’s a thread which discusses that.