W4 - Shouldn't the activation A be also cached?

paulinpaloalto · November 6, 2024, 7:45pm

The point of the cache is that it needs to cover the general case. The general formula involves g'(Z), right? You just happen to get lucky in the sigmoid case: g'(Z) can be computed more cheaply directly from A than from Z. The problem is that this is not true for all activation functions, although it also happens to be true for tanh.
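To make the "lucky" case concrete, here is a minimal sketch in plain Python. It works on scalars for readability, whereas the assignment works on arrays, and the function names are made up for illustration rather than taken from the course code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime_from_z(z):
    # General route: start from Z, so the exponential is evaluated again.
    a = sigmoid(z)
    return a * (1.0 - a)

def sigmoid_prime_from_a(a):
    # Shortcut: if A = sigmoid(Z) is already known, g'(Z) = A * (1 - A).
    return a * (1.0 - a)

def tanh_prime_from_a(a):
    # Same kind of shortcut for tanh: if A = tanh(Z), g'(Z) = 1 - A^2.
    return 1.0 - a * a

z = 0.7
# Both routes agree for sigmoid.
assert math.isclose(sigmoid_prime_from_z(z), sigmoid_prime_from_a(sigmoid(z)))
# The tanh shortcut matches the derivative written in terms of Z.
assert math.isclose(1.0 - math.tanh(z) ** 2, tanh_prime_from_a(math.tanh(z)))
```

The shortcut versions need only a multiply and a subtract, while the version that starts from Z has to evaluate the exponential again.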

Maybe that is because tanh and sigmoid are very closely related mathematically.
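For reference, the relationship can be written out explicitly. With $\sigma$ denoting sigmoid (notation chosen here for the sketch), tanh is a scaled and shifted sigmoid, and the two "derivative from A" formulas are consistent with each other:

$$
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad
\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} = \frac{2}{1 + e^{-2z}} - 1 = 2\,\sigma(2z) - 1
$$

$$
\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr), \qquad
\tanh'(z) = 4\,\sigma(2z)\,\bigl(1 - \sigma(2z)\bigr) = 1 - \tanh^{2}(z)
$$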

One way to get a general solution and also avoid recomputing in some cases would be to include both Z and A in the cache. That would cost a little more memory, but save compute in some cases. If we were implementing this on our own, we could add that feature.
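A rough sketch of what caching both values might look like, again on scalars in plain Python. The dictionary layout and the function names are assumptions made for this example, not the interface the assignment uses.

```python
import math

def sigmoid_forward(z):
    a = 1.0 / (1.0 + math.exp(-z))
    # Hypothetical cache layout: keep both Z and A for the backward pass.
    return a, {"Z": z, "A": a}

def sigmoid_backward(dA, cache):
    # Uses the cached A, so no exponential is recomputed.
    a = cache["A"]
    return dA * a * (1.0 - a)

def relu_forward(z):
    a = max(0.0, z)
    return a, {"Z": z, "A": a}

def relu_backward(dA, cache):
    # Uses the cached Z, as in the general g'(Z) formulation.
    return dA if cache["Z"] > 0 else 0.0

a, cache = sigmoid_forward(0.0)
dZ = sigmoid_backward(2.0, cache)  # 2.0 * 0.5 * (1 - 0.5) = 0.5
```

Each backward function picks whichever cached value is cheaper for its activation, at the cost of storing one extra value per unit.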
