In week 4, assignment 1,
there are two functions called inside the linear_activation_backward() function,
I don’t get what exactly they do.
And what is the activation_cache that is passed to them, and what does it store?
The activation cache is the value that was returned during forward propagation by the corresponding activation function. If you check that code, you’ll see that it contains just the Z value.
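To make that concrete, here is a minimal sketch of what forward activation functions of this kind typically look like. It is written from the description above, not copied from the course utility file, so treat the exact bodies as an assumption; the point is only that each function returns the activation A together with a cache that is just Z.

```python
import numpy as np

def sigmoid(Z):
    """Sigmoid activation. Returns A and a cache holding Z for the backward pass."""
    A = 1 / (1 + np.exp(-Z))
    cache = Z  # the "activation cache" is just the pre-activation value Z
    return A, cache

def relu(Z):
    """ReLU activation. Returns A and a cache holding Z for the backward pass."""
    A = np.maximum(0, Z)
    cache = Z  # same idea: keep Z so the gradient can be computed later
    return A, cache

# Example: the cache that comes back is the same Z that went in.
Z = np.array([[-1.0, 0.0, 2.0]])
A, activation_cache = relu(Z)
```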
You can examine the source for relu_backward and sigmoid_backward by clicking “File → Open” and then opening the python file that has the utility functions. You can figure out the name of that file by examining the “import” cell, which is the first code cell in the notebook.
Note that the code for relu and sigmoid is also in that same python utility file.
Hi @Osama_Saad_Farouk ,
I would like to add some additional notes to @paulinpaloalto’s explanation, based on the way I was able to understand this:
To understand the backward propagation, I think that it is important to first understand the forward propagation in detail.
Remember that in the Forward Prop, we go from the inputs, layer by layer, to the output. So we traverse all hidden layers, one by one.
As you know, each layer is made out of 1 or more units. While doing Forward Prop, we do two operations in each unit: a LINEAR operation (W * A + b) and a NON-LINEAR operation (in this case, ReLU or Sigmoid).
So keep this in mind: 2 operations per unit in each layer, from start to end.
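The two forward operations can be sketched for a whole layer at once, as below. This is my own illustration rather than the assignment code: the function name linear_activation_forward_sketch is hypothetical, and the layout of the caches (a linear cache with A_prev, W, b and an activation cache with Z) is an assumption based on the discussion above.

```python
import numpy as np

def linear_activation_forward_sketch(A_prev, W, b, activation):
    """One layer of forward prop: LINEAR step, then NON-LINEAR step."""
    # Operation 1 (LINEAR): matrix product plus bias, for all units at once.
    Z = W @ A_prev + b

    # Operation 2 (NON-LINEAR): apply the activation element-wise.
    if activation == "relu":
        A = np.maximum(0, Z)
    elif activation == "sigmoid":
        A = 1 / (1 + np.exp(-Z))
    else:
        raise ValueError(f"unknown activation: {activation}")

    linear_cache = (A_prev, W, b)  # needed later by the linear backward step
    activation_cache = Z           # needed later by relu/sigmoid backward
    return A, (linear_cache, activation_cache)

# Example: 3 input features, 4 units in the layer, 2 training examples.
rng = np.random.default_rng(0)
A_prev = rng.standard_normal((3, 2))
W = rng.standard_normal((4, 3))
b = np.zeros((4, 1))
A, (linear_cache, activation_cache) = linear_activation_forward_sketch(A_prev, W, b, "relu")
```

Each row of W corresponds to one unit, so the single matrix product covers every unit in the layer in one go.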
And now we arrive at your question, which is related to the Back Prop:
While in the Forward Prop we were going from the first layer to the last, in the BACK PROP we go in reverse, from the last layer to the first, layer by layer, and inside each layer we go unit by unit (in reality units are processed in parallel, but just bear with me). In other words, we have to go through the very same steps as in the Forward Prop, but in reverse.
Remember how we had 2 operations per unit in the Forward Prop, Linear and Non-Linear? Well, for the Back Prop we have to do the same thing: perform 2 operations per unit, but in reverse order. First an operation related to the Non-Linear part, and then another operation for the Linear part.
And this is exactly the place where your question gets answered: You need to do a Gradient calculation for the NonLinear and then a calculation of Gradient for the Linear part.
In the exercise, the operations for the “Non-Linear” part are provided, and you can explore their code exactly as @paulinpaloalto indicated above. These operations are relu_backward and sigmoid_backward. Each one takes dA, the gradient of the cost with respect to the activation output, together with the activation_cache (the Z value), and returns dZ, the gradient of the cost with respect to Z, be it for the sigmoid activation or the ReLU activation. That dZ is then the input to the gradient calculation for the Linear part.
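Here is a rough sketch of the two backward steps, written from the standard derivative formulas and not copied from the course files. The names ending in _sketch are hypothetical, and the cache layout (Z for the activation, (A_prev, W, b) for the linear part) is an assumption.

```python
import numpy as np

def relu_backward_sketch(dA, activation_cache):
    """NON-LINEAR step for ReLU: dZ = dA where Z > 0, else 0."""
    Z = activation_cache
    dZ = np.array(dA, copy=True)  # copy so the caller's dA is not modified
    dZ[Z <= 0] = 0
    return dZ

def sigmoid_backward_sketch(dA, activation_cache):
    """NON-LINEAR step for sigmoid: dZ = dA * s * (1 - s), with s = sigmoid(Z)."""
    Z = activation_cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

def linear_backward_sketch(dZ, linear_cache):
    """LINEAR step: gradients for W, b and the previous layer's activations."""
    A_prev, W, b = linear_cache
    m = A_prev.shape[1]  # number of training examples
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

# Example: non-linear step first, then the linear step, for a ReLU layer.
Z = np.array([[-1.0, 2.0]])
dA = np.array([[3.0, 4.0]])
dZ = relu_backward_sketch(dA, Z)  # gradient is blocked where Z <= 0
```

The order mirrors the forward pass in reverse: the activation's backward function turns dA into dZ, and the linear backward step turns dZ into dW, db and dA_prev for the layer before.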
Well, at least that’s how I explained this to myself - I hope this helps.
Juan