What do relu_backward and sigmoid_backward do?

In week 4, assignment 1,

there are two functions called inside the linear_activation_backward() function,

and I don’t understand exactly what they do.

Also, what is the activation_cache that is passed to them, and what does it store?

The activation cache is the value that was returned during forward propagation by the corresponding activation function. If you check that code, you’ll see that it contains just the Z value.

You can examine the source for relu_backward and sigmoid_backward by clicking “File → Open” and then opening the python file that has the utility functions. You can figure out the name of that file by examining the “import” cell, which is the first code cell in the notebook.

Note that the code for relu and sigmoid is also in that same python utility file.
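
For reference, here is a minimal sketch of what those two helpers typically look like, assuming the activation cache holds just the Z from forward propagation (treat this as an illustration rather than the assignment's exact code):

```python
import numpy as np

def relu_backward(dA, activation_cache):
    # activation_cache is the Z saved during forward propagation
    Z = activation_cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0               # ReLU derivative: 0 where Z <= 0, 1 elsewhere
    return dZ

def sigmoid_backward(dA, activation_cache):
    # activation_cache is the Z saved during forward propagation
    Z = activation_cache
    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)        # chain rule: dZ = dA * sigmoid'(Z)
    return dZ
```

In both cases the function applies the chain rule: it takes dA and the cached Z, and returns dZ = dA * g'(Z), where g is the activation function.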


Hi @Osama_Saad_Farouk,

I would like to add a few notes to @paulinpaloalto's explanation, based on how I came to understand this:

To understand backward propagation, I think it is important to first understand forward propagation in detail.

  1. Remember that in the Forward Prop, we go from the inputs, layer by layer, to the output. So we traverse all hidden layers, one by one.

  2. As you know, each layer is made up of one or more units. While doing Forward Prop, we do two operations in each unit: a LINEAR operation (W * A + b) and a NON-LINEAR operation (in this case, ReLU or Sigmoid). See the short sketch after this list.

So keep this in mind: 2 operations per unit in each layer, from start to end.
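
To make those two operations concrete, here is a minimal sketch of one layer's forward step (the name layer_forward_sketch is just illustrative, not the assignment's actual function), showing the LINEAR part, the NON-LINEAR part, and the Z that becomes the activation cache:

```python
import numpy as np

def layer_forward_sketch(A_prev, W, b, activation="relu"):
    # Operation 1: LINEAR part
    Z = W @ A_prev + b
    # Operation 2: NON-LINEAR part (the activation)
    if activation == "relu":
        A = np.maximum(0, Z)
    else:  # "sigmoid"
        A = 1 / (1 + np.exp(-Z))
    # Z is kept as the "activation cache" so the backward pass can reuse it
    return A, Z
```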

And now we arrive at your question, which relates to the Back Prop:

While in the Forward Prop we went from the first layer to the last, in the Back Prop we go in reverse, from the last layer to the first, layer by layer, and inside each layer we go unit by unit (in reality the units are processed in parallel, but bear with me). In other words, we have to go through exactly the same steps as in Forward Prop, but in reverse.

Remember how we had 2 operations per unit in the Forward Prop, Linear and Non-Linear? Well, for the Back Prop we have to do the exact same thing: perform 2 operations per unit, but in reverse order: first an operation related to the Non-Linear part, and then another operation for the Linear part.

And this is exactly where your question gets answered: you need a gradient calculation for the Non-Linear part and then a gradient calculation for the Linear part.

In the exercise, the operations related to the "Non-Linear" part are provided for you, and you can explore their code exactly as @paulinpaloalto indicated above. These operations are relu_backward and sigmoid_backward. Each one takes dA (the gradient of the cost with respect to the activation) together with the cached Z, and returns dZ, the gradient of the cost with respect to the linear output Z, which is exactly what the Linear part of the backward step needs next.
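
To illustrate how the two backward operations fit together, here is a rough sketch of a linear_activation_backward-style function (simplified, and assuming relu_backward and sigmoid_backward behave as in the earlier sketch; the assignment's actual code may differ in details):

```python
import numpy as np

def linear_activation_backward_sketch(dA, cache, activation):
    # cache = (linear_cache, activation_cache) saved during forward prop
    linear_cache, activation_cache = cache
    A_prev, W, b = linear_cache
    m = A_prev.shape[1]

    # Step 1: Non-Linear part -- turn dA into dZ using the cached Z
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
    else:  # "sigmoid"
        dZ = sigmoid_backward(dA, activation_cache)

    # Step 2: Linear part -- gradients for W, b and the previous layer's activation
    dW = (1 / m) * dZ @ A_prev.T
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = W.T @ dZ
    return dA_prev, dW, db
```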

Well, at least that’s how I explained this to myself :slight_smile: - I hope this helps.

Juan


@paulinpaloalto @Juan_Olano,
thank you so much, those were great explanations.