Hello @Khalid_A.W,
We don’t just prefer to use a linear activation instead of sigmoid. We prefer to use a linear activation AND to set `from_logits=True` in the loss function that’s passed to the model for training. If you follow both steps, you will see that sigmoid is never out of the game.

Again, the sigmoid is there if we set `from_logits=True`. It is NOT in the output layer, because we use `linear` as that layer’s activation; however, it IS applied inside the loss function when we set `from_logits=True`.
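Here is a minimal sketch of the idea in TensorFlow/Keras (the layer sizes and the toy data are placeholders, just for illustration):

```python
import numpy as np
import tensorflow as tf

# Toy binary-classification data (placeholder values only).
X = np.random.rand(100, 4).astype("float32")
y = np.random.randint(0, 2, size=(100, 1)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    # Output layer uses a LINEAR activation, so it emits raw logits.
    tf.keras.layers.Dense(1, activation="linear"),
])

# from_logits=True tells the loss to apply the sigmoid itself,
# in a numerically stable way, before computing the cross-entropy.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)
model.fit(X, y, epochs=5, verbose=0)

# At prediction time the model outputs logits, so we apply the
# sigmoid ourselves to recover probabilities.
probs = tf.math.sigmoid(model(X))
```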
Do the experiment yourself!
Cheers,
Raymond