Hello @Nhat_Minh, welcome to our community!
The motivation behind the improvement is that TensorFlow works more accurately with the 3rd equation on the left than with the 2nd equation on the left. We can see that in the 3rd equation, the term $a_{out}$ never shows up, which means that by adopting the 3rd equation, TensorFlow does not need to calculate $a_{out}$. This is good because calculating $a_{out}$ can introduce numerical inaccuracy, which is not favourable.
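For readers without the slide at hand, the two forms being compared are presumably the following (this is the standard binary cross-entropy case, with $g$ the sigmoid, so treat it as my reconstruction rather than the exact slide):

$$\text{loss} = -y\log(a_{out}) - (1-y)\log(1-a_{out}), \qquad a_{out} = g(z) = \frac{1}{1+e^{-z}}$$

$$\text{loss} = -y\log\left(\frac{1}{1+e^{-z}}\right) - (1-y)\log\left(1-\frac{1}{1+e^{-z}}\right)$$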
To make TensorFlow work without calculating $a_{out}$, we need to change the activation in the output layer from `sigmoid` to `linear`, because having `sigmoid` there is the reason for TensorFlow to compute $a_{out}$. Using `linear` actually means that we do not apply any activation function at all, since the linear "activation" is just the identity. Therefore, changing from `sigmoid` to `linear` means that we switch from passing $a = g(z)$ into the loss to passing $z$ into the loss.
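As a quick illustration of this first change, here is a minimal sketch (the layer sizes are made up for the example):

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Before: the output layer applies the sigmoid, so the model outputs a_out
model_before = Sequential([
    Dense(25, activation='relu'),
    Dense(1, activation='sigmoid'),  # computes a_out = g(z)
])

# After: a 'linear' activation is the identity, so the model outputs z itself
model_after = Sequential([
    Dense(25, activation='relu'),
    Dense(1, activation='linear'),   # outputs the logit z
])
```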
Moreover, we need to let TensorFlow know that we are passing $z$ into the loss instead of $a$, because TensorFlow cannot detect the change by itself. We notify TensorFlow by adding `from_logits=True` to the loss. By the way, "logit" is just another name for $z$.
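Putting the two changes together, here is a minimal end-to-end sketch (the data and layer sizes are hypothetical, just to make it runnable):

```python
import numpy as np
import tensorflow as tf

# Toy data for the example only
X = np.random.rand(200, 2).astype('float32')
y = (X.sum(axis=1) > 1.0).astype('float32')

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation='relu'),
    tf.keras.layers.Dense(1, activation='linear'),  # change 1: output z, not a_out
])

model.compile(
    # change 2: tell the loss it receives logits (z), not probabilities (a_out)
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    optimizer='adam',
)
model.fit(X, y, epochs=5, verbose=0)

# The trained model outputs z; apply the sigmoid yourself when you need probabilities
probs = tf.nn.sigmoid(model(X[:5]))
```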
Now, with these two code changes, we enjoy a numerically more accurate training process.
Cheers,
Raymond