Hi guys,

In the homework “Transfer_learning_with_MobileNet_v1”, Exercise 2 (alpaca_model), the last layer is

```
# use a prediction layer with one neuron (as a binary classifier only needs one)
outputs = tf.keras.layers.Dense(1)(x)
```

I don’t quite follow this. So this time we don’t need a sigmoid? And why just one neuron? I checked tensorflow.org, which says:

> Apply a `tf.keras.layers.Dense` layer to convert these features into a single prediction per image. You don’t need an activation function here because this prediction will be treated as a `logit`, or a raw prediction value. Positive numbers predict class 1, negative numbers predict class 0.

but I don’t understand it. Maybe I forgot something. If there is a link that explains this, please share it and I will read up on it. Thank you!

Check the documentation for the loss functions and look at what the `from_logits` argument does. It turns out that it is both more efficient and more “numerically stable” to implement the sigmoid or softmax activation as an internal part of the cost computation, instead of doing it as a separate step. That is why Prof Ng has always done it this way, ever since we switched to TF in C2 W3. Remember that’s how it worked in the `compute_cost` function there.
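For instance, here is a minimal sketch of that pattern (the tiny architecture and input shape are made up for illustration, not the actual notebook code): the final `Dense(1)` has no activation, and the loss is told to expect raw logits.

```python
import tensorflow as tf

# Toy binary classifier: no sigmoid on the output layer.
inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)  # raw logit, no activation
model = tf.keras.Model(inputs, outputs)

model.compile(
    optimizer="adam",
    # from_logits=True tells the loss that the model outputs raw scores,
    # so it applies the sigmoid internally as part of the cost computation.
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
```

If you instead put a `sigmoid` activation on the last layer, you would use `from_logits=False` (the default), but then you lose the fused, more stable computation.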

The term “numerically stable” is not just handwaving. That is actually a well-defined “term of art” in the field of Numerical Analysis. It turns out that when you are working with finite representations like floating point, instead of the abstract beauty of $\mathbb{R}$, there can be different ways to express the same mathematical formula which have better or worse properties in terms of rounding errors. “Numerical stability” is a precise way to talk about that phenomenon.
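To make that concrete, here is a small pure-Python toy example (my own illustration, not anything from the assignment). Computing sigmoid first and then taking the log blows up for large logits, because the sigmoid rounds to exactly 1.0 and `log(1 - 1.0)` is `log(0)`. The fused “from logits” form of binary cross-entropy never computes that intermediate probability, so it stays finite:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def naive_bce(z, y):
    # Two-step version: sigmoid first, then cross-entropy on the probability.
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def stable_bce(z, y):
    # Fused "from_logits" form: max(z, 0) - z*y + log(1 + exp(-|z|)).
    # Algebraically identical to naive_bce, but never forms sigmoid(z) explicitly.
    return max(z, 0) - z * y + math.log1p(math.exp(-abs(z)))
```

For a moderate logit like `z = 2` the two agree to machine precision, but for `z = 40, y = 0` the naive version raises a domain error on `log(0)` while the stable version returns approximately 40, the mathematically correct loss.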

The reason you only need a single output is precisely that we’re back to doing binary classification instead of multi-class. The answer is “yes/no”, so in principle one bit is enough to express it, even though the network actually emits it as a single floating-point value (the logit).
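As a quick illustration (the logit value below is made up), thresholding the raw logit at 0 gives exactly the same decision as applying sigmoid and thresholding the probability at 0.5, because sigmoid is monotonic and sigmoid(0) = 0.5:

```python
import math

def predict_class(logit):
    # Positive logit -> class 1, non-positive -> class 0,
    # matching the tensorflow.org description quoted above.
    return 1 if logit > 0 else 0

logit = 2.3  # a made-up raw output from the Dense(1) layer
prob = 1.0 / (1.0 + math.exp(-logit))  # apply sigmoid only if you want a probability
```

So the sigmoid is only needed when you want an actual probability to report; for the hard yes/no decision, the sign of the logit already tells you everything.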
