Why Activation function in last layer - linear - C4W2

For a binary classification problem it is logical (and also consistent with the previous courses) to use a sigmoid activation function in the output layer. But the programming assignment "Transfer Learning with MobileNet" uses a linear activation function. I tried different activation functions - sigmoid, relu - and only with the linear function did the network actually learn. Could somebody explain why? What is the intuition?

They are still effectively using sigmoid, but they choose the from_logits = True mode of the binary cross-entropy loss function, which computes the sigmoid and the loss together in one step. It is both more efficient and more numerically stable to do it that way. Please read the documentation of the TF binary cross-entropy loss for more information.
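To see why the fused computation is more stable, here is a small NumPy sketch. It is an illustration of the algebraic trick (the same one TF uses internally for from_logits=True), not TF's actual source: applying sigmoid first and then taking logs blows up for large-magnitude logits, while the fused form max(z, 0) - z*y + log(1 + exp(-|z|)) stays finite.

```python
import numpy as np

def bce_from_probs(y, p):
    # Naive two-step version: sigmoid was applied first, then cross-entropy.
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_from_logits(y, z):
    # Fused, numerically stable form of sigmoid + binary cross-entropy:
    # max(z, 0) - z*y + log(1 + exp(-|z|))
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

z = np.array([-800.0, 0.0, 800.0])  # extreme logits
y = np.array([1.0, 1.0, 0.0])       # true labels

with np.errstate(over="ignore", divide="ignore"):
    p = 1 / (1 + np.exp(-z))     # sigmoid saturates to exactly 0.0 and 1.0
    naive = bce_from_probs(y, p)  # first and last entries are inf

stable = bce_from_logits(y, z)    # all entries finite (800, log 2, 800)
```

So with a linear output layer the network hands the raw logit z to the loss, and the loss function does the sigmoid internally in this safe form; that is why the linear activation is the one that trains properly in the assignment.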

If you look back over the past assignments since TF was first introduced in Course 2 Week 3, this is always the way Prof. Ng does it.