Maybe see also this link: tf.keras.losses.BinaryCrossentropy | TensorFlow Core v2.8.0, which contains this key passage:
- y_pred (predicted value): This is the model's prediction, i.e., a single floating-point value which either represents a logit (i.e., value in [-inf, inf] when from_logits=True) or a probability (i.e., value in [0., 1.] when from_logits=False).
The network can output either a value in the range [-inf, inf] or a value in [0., 1.], depending on the output activation used (as @balaji.ambresh correctly shows above). The loss function needs to know which it is in order to properly interpret the forward prop outputs. from_logits is used to keep the network and the loss function in sync.
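
To make the point concrete, here is a minimal sketch in plain Python of what the flag changes. It is a hand-rolled, single-example binary cross-entropy, not the Keras implementation; the helper names `sigmoid` and `binary_crossentropy` and the example logit of 2.0 are my own choices for illustration.

```python
import math


def sigmoid(z):
    # Squash a raw score (logit) into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_pred, from_logits=False):
    # Illustrative single-example loss, not the Keras source.
    # With from_logits=True, y_pred is treated as a raw score and is
    # passed through the sigmoid before the log terms are computed.
    p = sigmoid(y_pred) if from_logits else y_pred
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


logit = 2.0            # what a linear (no activation) output layer would emit
prob = sigmoid(logit)  # what a sigmoid output layer would emit for the same input

# Network and loss agree: both pairings give the same loss.
loss_from_logit = binary_crossentropy(1.0, logit, from_logits=True)
loss_from_prob = binary_crossentropy(1.0, prob, from_logits=False)

# Network and loss disagree: a probability is fed in but flagged as a logit,
# so the sigmoid is applied twice and the loss comes out different.
loss_mismatched = binary_crossentropy(1.0, prob, from_logits=True)

print(loss_from_logit, loss_from_prob, loss_mismatched)
```

The first two values match, while the third does not, which is the "out of sync" case: the code still runs, but the loss is computed on the wrong quantity.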