Transfer Learning Assignment - Binary Classification Question

In Exercise 2 of the Transfer Learning assignment, we are asked to work on a function called “alpaca_model”. The last layer in that model is a Dense layer for binary classification. However, why does it use a “linear” activation function instead of a “sigmoid” activation function?

That is the way Prof Ng has been doing things since we were first introduced to TF in DLS C2 W3. When the loss function is one of the forms of cross entropy loss (either binary or multiclass), it is more efficient and more numerically stable to use from_logits=True mode and let the loss function include the calculation of the output layer activation function (sigmoid for binary, softmax for multiclass). The output layer then stays linear and emits raw logits. Here’s a thread which discusses why that is done.
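To make the stability point concrete, here is a minimal pure-Python sketch (not the course code and not TensorFlow's implementation). The function names are made up for illustration. The logits-based formula, max(z, 0) - z*y + log(1 + exp(-|z|)), is to my knowledge the rearrangement described in the TensorFlow docs for sigmoid cross entropy with logits. It compares that formula with computing the sigmoid first and then taking logs.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_from_probability(y, p):
    # Hypothetical helper: binary cross entropy applied to a sigmoid output p.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logits(y, z):
    # Hypothetical helper: same loss computed directly from the logit z,
    # using max(z, 0) - z*y + log(1 + exp(-|z|)).
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# For a moderate logit the two versions agree.
print(bce_from_probability(1, sigmoid(2.0)), bce_from_logits(1, 2.0))

# For a large logit with the "wrong" label, sigmoid(z) rounds to exactly 1.0,
# so the naive version tries to take log(0) and fails.
try:
    print(bce_from_probability(0, sigmoid(40.0)))
except ValueError as err:
    print("naive version failed:", err)

# The logits-based version still returns a finite loss.
print(bce_from_logits(0, 40.0))
```

In float64 the sigmoid saturates to exactly 1.0 well before z = 40, which is why folding the activation into the loss is the safer choice.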

Note that this means you have to apply the sigmoid manually when you use the trained model in “predict” mode, because the model’s output is then a logit rather than a probability.
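A rough sketch of that post-processing step, again in plain Python so it runs anywhere. The logit values below are made up and stand in for whatever the trained model returns. In TensorFlow you would typically apply something like tf.nn.sigmoid to the output of model.predict instead of the hand-written helper.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical raw outputs (logits) from a model whose last layer is linear.
logits = [-3.2, 0.0, 1.5]

# Convert logits to probabilities, then threshold at 0.5 to get class labels.
probabilities = [sigmoid(z) for z in logits]
labels = [int(p > 0.5) for p in probabilities]

# Thresholding the probability at 0.5 is the same as thresholding the logit at 0.
labels_from_logits = [int(z > 0) for z in logits]

print(probabilities)
print(labels, labels_from_logits)
```

If you only need the predicted class and not the probability, checking the sign of the logit is enough, since the sigmoid is monotonic and maps 0 to 0.5.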

Thanks for the explanation! Makes sense now :slight_smile: