One other point, just to be accurate: the network you have implemented is not Logistic Regression. It is a Fully Connected network with 3 layers that performs binary classification. Logistic Regression is essentially a trivial Neural Network consisting of only the "output" layer, which also performs binary classification.
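Just to make that distinction concrete, here's a minimal sketch in TensorFlow/Keras. The input size and hidden layer widths are placeholders I made up, not values from your notebook:

```python
import tensorflow as tf

n_features = 10  # hypothetical input dimension

# Logistic Regression: nothing but the "output" layer.
logistic_regression = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# A Fully Connected network with 3 layers for the same binary task.
fully_connected = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(25, activation="relu"),
    tf.keras.layers.Dense(15, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
```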
I would also state the case differently: you're using "sigmoid" at the output layer either way. It's just a question of whether you explicitly include the "sigmoid" activation in the model, or whether you let it be handled internally within the cross entropy loss function (the from_logits = True mode). Of course, if you don't explicitly add the "sigmoid" in the output layer, then you also have to apply it explicitly in your "predict" logic.
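In code, the two options look something like this (a sketch assuming a Keras Sequential model; the input size is a placeholder):

```python
import tensorflow as tf

n_features = 10  # hypothetical input dimension

# Option A: explicit sigmoid in the output layer.
model_a = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model_a.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                optimizer="adam")
# predict() already returns probabilities:
# probs = model_a.predict(X)

# Option B: linear output; the loss applies sigmoid internally.
model_b = tf.keras.Sequential([
    tf.keras.Input(shape=(n_features,)),
    tf.keras.layers.Dense(1, activation="linear"),
])
model_b.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                optimizer="adam")
# predict() returns raw logits, so apply sigmoid explicitly:
# probs = tf.math.sigmoid(model_b.predict(X))
```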
So maybe you could argue that explicitly adding "sigmoid" in the output layer has one advantage: it makes your predict logic simpler. That matters only if that simplicity is more important to you than the improved numerical accuracy gained by the other method. You could also try it both ways in a given case to see whether the predictions are actually affected by the accuracy difference. It's possible that in any given case it ends up not mattering that much to the results of training.
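If you want to see the numerical difference directly, you can compare the loss computed both ways on an extreme logit. This is a toy example of my own, not from the course materials:

```python
import tensorflow as tf

y_true = tf.constant([[1.0]])
logit = tf.constant([[-20.0]])  # a confident (and wrong) raw output

# Route 1: squash first, then compute the loss on the probability.
prob = tf.math.sigmoid(logit)  # ~2e-9
loss_from_prob = tf.keras.losses.BinaryCrossentropy(from_logits=False)(y_true, prob)

# Route 2: hand the raw logit to the loss, which combines
# sigmoid + log in one numerically stable expression.
loss_from_logit = tf.keras.losses.BinaryCrossentropy(from_logits=True)(y_true, logit)

print(float(loss_from_prob), float(loss_from_logit))
# Prints roughly 16.1 vs 20.0: the probability route clips the tiny
# sigmoid output to a small epsilon to avoid log(0), distorting the
# loss, while the logits route computes log(1 + exp(20)) ~ 20 exactly.
```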
Here’s a thread which discusses why the from_logits = True method is preferred. Here’s a thread from Raymond that goes into some depth showing why that method is more accurate.