Since we are adding a binary classification layer, why is the test case checking for a linear dense layer as the output? Shouldn’t it be a sigmoid dense layer?

There are two ways to get a classifier output.

- Use sigmoid activation in the output layer
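- Use a linear activation in the output layer, so the model outputs logits, and tell the loss function to treat them as logits (*from_logits = True*). To get probabilities at prediction time, apply sigmoid to the model output yourself.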

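Both methods are mathematically equivalent, but the 2nd method is more numerically stable under TensorFlow, because the sigmoid and the loss are computed together rather than in two separate floating-point steps.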
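
To illustrate the numerical point, here is a minimal pure-Python sketch (not TensorFlow's actual implementation). The function names are made up for this example, and the stable formula is the standard rearrangement of binary cross-entropy in terms of the logit z: max(z, 0) - z*y + log(1 + exp(-|z|)).

```python
import math

def sigmoid(z):
    # Plain sigmoid; fine for the moderate z values used below.
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(y, p):
    # Method 1: the loss only sees the probability that sigmoid produced.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(y, z):
    # Method 2: the loss sees the raw logit and folds the sigmoid in.
    # Algebraically the same loss, rearranged so no log(0) can occur.
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

# For a moderate logit, the two methods agree.
print(bce_from_probability(1, sigmoid(2.0)), bce_from_logit(1, 2.0))

# For a large logit with the wrong label, sigmoid(40.0) rounds to exactly 1.0
# in float64, so method 1 ends up evaluating log(0).
try:
    print(bce_from_probability(0, sigmoid(40.0)))
except ValueError as err:
    print("method 1 failed:", err)

# Method 2 still returns a finite loss (approximately 40).
print(bce_from_logit(0, 40.0))
```

In Keras terms, method 2 corresponds to an output layer such as `Dense(1, activation='linear')` combined with `tf.keras.losses.BinaryCrossentropy(from_logits=True)`, with `tf.nn.sigmoid` applied to the model output when you want probabilities.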

Yes, ever since we switched to TF, we have used the 2nd method that Tom describes: we always define the output layer without an activation function and then use the *from_logits = True* mode of the loss functions, which integrates the output activation and the loss calculation. It’s been that way since we first saw TF in Course 2 Week 3.

Thanks Tom and Paul for the answers!