Just one further clarification here: the `from_logits` issue is independent of whether we are doing binary or multiclass classification. In either case it makes more sense to use `from_logits=True`, which just means that the final activation (sigmoid or softmax) is applied as a unified part of the cross-entropy loss computation. Here's a thread which explains why this mode is more advantageous. The "tl;dr" is that `from_logits=True` gives you answers closer to the mathematically correct values we would get if we could do the calculations over the real numbers \mathbb{R}, instead of just approximating things in floating point.
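Here is a small numpy sketch of why this matters numerically (the function names are my own, not a library API; Keras's `BinaryCrossentropy(from_logits=True)` folds the sigmoid into the loss in a similar spirit). If you first apply the sigmoid and then take the log, an extreme logit underflows to 0 in float32 and the loss blows up to infinity; computing the loss directly from the logit via the softplus identity stays exact:

```python
import numpy as np

def bce_from_probs(y_true, p):
    # Naive version: loss computed from an already-activated probability p.
    # If p has underflowed to 0, log(p) is -inf and the loss is inf.
    return -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def bce_from_logits(y_true, z):
    # Stable version: sigmoid folded into the loss.
    # Uses the identity BCE = softplus(z) - y*z, with softplus
    # computed as logaddexp(0, z) to avoid overflow/underflow.
    return np.logaddexp(0.0, z) - y_true * z

z = np.float32(-100.0)  # an extreme but perfectly valid logit
with np.errstate(over="ignore", divide="ignore"):
    p = np.float32(1.0) / (np.float32(1.0) + np.exp(-z))  # sigmoid underflows to 0.0
    naive = bce_from_probs(1.0, p)
stable = bce_from_logits(1.0, z)

print(naive)   # -> inf (log of an underflowed 0)
print(stable)  # -> 100.0 (the mathematically correct loss, -log(sigmoid(-100)))
```

The two formulas are algebraically identical over the reals; the difference is purely which intermediate quantities get rounded in floating point, which is exactly the point of passing logits to the loss.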