Using tanh vs. sigmoid for output layer

paulinpaloalto · October 19, 2022, 10:25pm

I don’t know if there are any “simple words” that will suffice here, but “cross entropy loss” (also sometimes called “log loss”) is derived from the statistical concept of maximum likelihood estimation. That idea has been around at least since the days of Leonhard Euler, so it’s not something invented for machine learning or something that just popped into somebody’s mind. Prof Ng explains it in the Week 2 lectures, and here’s a thread from Mentor Raymond that gives a really nice explanation. Sorry, but as warned above, neither of those probably qualifies as “simple words”.
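
For anyone who prefers code to prose, here is a minimal sketch of the maximum likelihood connection. It assumes a single training example with label y in {0, 1} and a prediction a that is read as the probability that y = 1. The function names are hypothetical and are not taken from the course notebooks. It shows that the cross entropy loss is just the negative log of the Bernoulli likelihood, and that this only works when the output lies strictly between 0 and 1, which sigmoid guarantees and tanh does not.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bernoulli_likelihood(y, a):
    """Probability of observing label y when a is read as P(y = 1)."""
    return (a ** y) * ((1.0 - a) ** (1 - y))


def cross_entropy(y, a):
    """Cross entropy (log loss) for one example; requires 0 < a < 1."""
    return -(y * math.log(a) + (1 - y) * math.log(1.0 - a))


z = -0.5  # an arbitrary pre-activation value chosen for illustration
y = 1     # the true label

# Sigmoid output is a valid probability, so the loss is well defined
# and equals the negative log likelihood.
a_sig = sigmoid(z)
loss = cross_entropy(y, a_sig)
neg_log_likelihood = -math.log(bernoulli_likelihood(y, a_sig))
print("sigmoid output:", a_sig)
print("cross entropy:", loss)
print("negative log likelihood:", neg_log_likelihood)

# tanh output lies in (-1, 1), so it can be negative. A negative value
# cannot be read as a probability, and log() is undefined for it.
a_tanh = math.tanh(z)
try:
    cross_entropy(y, a_tanh)
    tanh_failed = False
except ValueError:
    tanh_failed = True
print("tanh output:", a_tanh, "-> loss undefined:", tanh_failed)

# Shifting and scaling tanh into (0, 1) recovers a sigmoid:
# (tanh(z) + 1) / 2 == sigmoid(2z)
rescaled = (math.tanh(z) + 1.0) / 2.0
print("rescaled tanh:", rescaled, "sigmoid(2z):", sigmoid(2.0 * z))
```

The last two lines are why tanh at the output layer buys nothing for binary classification: once you rescale it so it can be used with log loss, you are back to a sigmoid with a doubled input.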

Here’s another thread that discusses this and shows some graphs.
