Right: the point is that the sigmoid (or softmax) is included, but it’s being handled internally by the loss function instead of explicitly in your code. Here’s a thread which explains why they do it that way.
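To make that concrete, here is a minimal pure-Python sketch for the binary (sigmoid) case. The function names are made up for illustration and are not any framework's API. It compares applying the sigmoid yourself and then taking cross-entropy against a single function that takes the raw logit and folds the sigmoid into the loss. The fused formula used here is the usual numerically stable rewrite, which is the reason commonly given for doing it this way.

```python
import math

def sigmoid(z):
    # Explicit activation: what you would apply yourself in the output layer.
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(y, p):
    # Binary cross-entropy on a probability p that has already been through sigmoid.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(y, z):
    # Same loss computed directly from the raw logit z (hypothetical helper).
    # Algebraically equal to bce_from_probability(y, sigmoid(z)), but written
    # so that it never takes the log of a value that has rounded to 0.
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

y, z = 1.0, 2.0
explicit = bce_from_probability(y, sigmoid(z))  # sigmoid in your code
fused = bce_from_logit(y, z)                    # sigmoid inside the loss
print(explicit, fused)
```

Both values agree up to floating-point rounding, so the sigmoid has not gone anywhere; it has just moved inside the loss computation.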