Alpaca_model: why not sigmoid at prediction?

Assignment Transfer_learning_with_MobileNet_v1:

For binary classification, we use the sigmoid (or softmax for 3 or more outputs). Why do we use the linear activation in the prediction layer of the model?


The sigmoid is there: it’s just hidden inside the loss function that the assignment uses. Here’s another recent thread about the same question.
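To make "hidden inside the loss" concrete, here is a minimal sketch in plain Python. It assumes the loss in question is a binary cross-entropy computed directly from logits (in Keras that would correspond to `tf.keras.losses.BinaryCrossentropy(from_logits=True)`; whether the assignment uses exactly that call is an assumption based on this thread). The helper names and the sample values are made up for illustration.

```python
import math

def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(y_true, p):
    """Binary cross-entropy when the model already outputs a probability."""
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))

def bce_from_logit(y_true, z):
    """Binary cross-entropy computed straight from the logit.

    The sigmoid is folded into the formula, written in the usual
    numerically stable form: max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    return max(z, 0.0) - z * y_true + math.log1p(math.exp(-abs(z)))

# Hypothetical values: one logit from a linear output layer and its label.
logit = 1.7
label = 1.0

loss_a = bce_from_probability(label, sigmoid(logit))  # sigmoid in the model
loss_b = bce_from_logit(label, logit)                 # sigmoid inside the loss

# At prediction time the sigmoid is not applied for you, so either apply it
# yourself or threshold the logit at 0 (equivalent to probability 0.5).
predicted_class = 1 if logit > 0 else 0
```

Both losses agree up to floating-point rounding, which is why the output layer can stay linear during training. The from-logits form is generally preferred because it avoids taking the log of a probability that has saturated to 0 or 1.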

Interesting. The search engine in Discourse seems to work pretty well. I was able to find that thread by searching for “alpaca”. Of course BinaryCrossentropy also works as a more specific search term, but you have to already know the answer to the question in order for that to be useful :nerd_face:.


You are right. I should have used the Search function.

It’s fine either way, but it can save you time if you can just find a pre-existing answer. You never know how long it’s going to take someone to respond to your question. Of course these courses are still pretty new, so there’s no guarantee that any given question has been asked before. It just turns out that this one had been …

For binary classification, we can use either softmax or sigmoid, but sigmoid is the recommended one.

There is a good explanation by Jeremy.
Just read the notebook below from the section ‘Softmax’ onwards.

fastbook/05_pet_breeds.ipynb at master · fastai/fastbook
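As a quick check of why the two options are interchangeable for two classes, here is a small plain-Python sketch. It relies on the standard identity that a two-way softmax depends only on the difference between the two logits, so it reduces to a sigmoid of that difference. The logit values are arbitrary and only for illustration.

```python
import math

def sigmoid(z):
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Softmax over a list of logits, shifted by the max for stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the "negative" and "positive" class.
z_neg, z_pos = 0.4, 2.1

p_softmax = softmax([z_neg, z_pos])[1]  # two output units + softmax
p_sigmoid = sigmoid(z_pos - z_neg)      # one output unit + sigmoid
```

The two probabilities match up to rounding, so the two-unit softmax carries one redundant output. That redundancy is one common argument for using a single sigmoid unit in the binary case.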
