Question about is_logit

Hello Eakanath @eix_rap,

Here are the steps.

I want you to look at how we have avoided computing e^{-z} when z is so large and negative that e^{-z} would overflow the computed result. We end up with e^{-z} because we use sigmoid for binary classification, or with e^{z} because we use softmax for multi-class classification. Note that the trick depends on which activation is used, which is why I felt uneasy when you said “any activation”.
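To make the idea concrete, here is a minimal sketch (my own illustration with NumPy, not the exact code from the course, and the function name `stable_sigmoid_bce` is mine) of the usual rearrangement for sigmoid cross-entropy computed from the logit: it only ever exponentiates a non-positive number, so e^{-z} can never overflow.

```python
import numpy as np

def stable_sigmoid_bce(z, y):
    """Binary cross-entropy computed directly from the logit z.

    Naively, loss = -y*log(sigmoid(z)) - (1-y)*log(1-sigmoid(z)),
    and sigmoid(z) = 1 / (1 + e^{-z}) needs e^{-z}, which overflows
    when z is a large negative number. The algebraically equivalent
    form below exponentiates only -|z| <= 0, so it stays finite.
    """
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

# A very negative logit that would overflow the naive formula:
z = np.array([-1000.0, 0.0, 1000.0])
y = np.array([1.0, 1.0, 1.0])
print(stable_sigmoid_bce(z, y))  # finite values, no overflow warning
```

The same kind of rearrangement exists for softmax (subtracting the maximum logit before exponentiating), but the exact form is different, which is the point about the choice of activation mattering.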

Of course, I might be overreacting, because you may have been thinking only about sigmoid and softmax all along, but please don’t mind me making it a bit clearer :wink:

Cheers,
Raymond