In the Activation Functions video, at 4:19, Andrew talks about how the small slope of the Sigmoid function can slow down gradient descent.
However, the derivative we are looking for, dJ/dz, involves the log of the activation function, not the activation function itself. Please recall the logistic regression cost function J.
As the following plots show, the two functions (the activation function and its log) have different shapes, and therefore different slopes.
For logistic regression, the derivative at the output node does involve the log of the activation function. But consider any activation function in the hidden layers in between: I am sure you would agree the gradient there depends on the activation function itself. And even at the output, the log term still pulls in the activation function's derivative through the chain rule:
J = log(f(x))
Then, by the chain rule, dJ/dx = (1/f(x)) · d f(x)/dx = (1/f(x)) × (derivative of the activation function).
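A quick numerical sketch of this point, assuming a sigmoid output unit with the binary cross-entropy cost from the course (variable names here are my own, not from the lectures). It checks that the chain rule through the log cost still multiplies by the sigmoid's derivative, and that the product collapses to the familiar a - y:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 2.0   # example pre-activation value
y = 1.0   # example label
a = sigmoid(z)

# Binary cross-entropy for one example: J = -(y*log(a) + (1-y)*log(1-a))
# Chain rule: dJ/dz = dJ/da * da/dz
dJ_da = -(y / a) + (1 - y) / (1 - a)
da_dz = a * (1 - a)          # derivative of the sigmoid itself
dJ_dz = dJ_da * da_dz

# Well-known simplification for sigmoid + cross-entropy: dJ/dz = a - y
print(dJ_dz, a - y)
```

So the log cost does not remove the activation's derivative from the chain rule; it just combines with it into a simpler expression at the output node.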
Hope this clarifies!
I believe the reason for preferring tanh was just the scaling of the activations A passed to the next layers: with tanh they are centered around zero, which is supposedly an advantage.
The argument about the slope was for using the ReLU function in favor of the Sigmoid or Tanh.