The question is “You are building a binary classifier for recognizing cucumbers (y=1) vs. watermelons (y=0). Which one of these activation functions would you recommend using for the output layer?”

The answer is “sigmoid”

‘tanh’ was also among the choices.

I am wondering why it is sigmoid, not tanh. I have heard that tanh is better than sigmoid in many cases, and I thought I could treat tanh outputs in (-1, 0) as the classified label 0 and outputs in (0, 1) as the classified label 1.

But the point is that the loss function (log loss) is tied to the *sigmoid* activation function in that it requires output values between 0 and 1. In other words, you can’t just arbitrarily change the output activation by itself: you need to adjust the loss function as well. So what loss function would you use if *tanh* is your output activation with a range of (-1,1)?
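To make that coupling concrete, here is a minimal sketch (function names are just illustrative) of why log loss needs the sigmoid's (0, 1) range: the prediction is interpreted as a probability, and `log` of a non-positive number is undefined.

```python
import math

def sigmoid(z):
    # squashes any real logit z into (0, 1), so it reads as P(y=1)
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(y_true, y_pred, eps=1e-12):
    # binary cross-entropy; only defined when y_pred is strictly in (0, 1)
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

p = sigmoid(2.0)        # a confident "cucumber" logit
loss = log_loss(1, p)   # small loss, since p is close to 1
```

A tanh output of, say, -0.5 would make `math.log(y_pred)` blow up, which is why swapping in tanh would also force a different loss.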

BTW it turns out you could scale and shift *tanh* so that it is the same as *sigmoid*. They are very closely related mathematically.
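Specifically, sigmoid(x) = (1 + tanh(x/2)) / 2, which you can verify numerically:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# sigmoid is tanh shifted/scaled from (-1, 1) into (0, 1),
# with the input compressed by a factor of 2
for z in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    assert abs(sigmoid(z) - (1 + math.tanh(z / 2)) / 2) < 1e-12
```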


Thank you for your answer.