Here I would like to know how I can get dZ, which is the input of conv_backward. There are no specific steps or calculations for dZ in the programming exercise. I only found this sentence: "dZ: the gradient of the cost with respect to the output of the conv layer Z at the hth row and wth column (corresponding to the dot product taken at the ith stride left and jth stride down)." I would like a more specific calculation or explanation. Thanks a lot.

dZ is the "Z" output of conv_forward.

So you mean that for def conv_backward(dZ, cache), dZ can be obtained from

def conv_forward(A_prev, W, b, hparameters): … return Z, cache

and dZ = Z?

Yes, that appears to be how the assignment is designed.

The reason it's confusing is that conv_forward() never really gives a definition of "Z".
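To make the shape relationship concrete, here is a minimal numpy sketch. Note this is a hypothetical simplification of the assignment's conv_forward (single example, stride 1, no padding; the function name conv_forward_single is my own), not the actual course code:

```python
import numpy as np

def conv_forward_single(A_prev, W, b):
    """Minimal single-example convolution, stride 1, no padding.
    A_prev: (n_H_prev, n_W_prev, n_C_prev), W: (f, f, n_C_prev, n_C), b: (n_C,)."""
    n_H_prev, n_W_prev, n_C_prev = A_prev.shape
    f, _, _, n_C = W.shape
    n_H, n_W = n_H_prev - f + 1, n_W_prev - f + 1
    Z = np.zeros((n_H, n_W, n_C))
    for h in range(n_H):
        for w in range(n_W):
            for c in range(n_C):
                # Dot product of one f x f x n_C_prev slice with filter c
                a_slice = A_prev[h:h + f, w:w + f, :]
                Z[h, w, c] = np.sum(a_slice * W[:, :, :, c]) + b[c]
    return Z

np.random.seed(1)
A_prev = np.random.randn(5, 5, 3)
W = np.random.randn(3, 3, 3, 2)
b = np.random.randn(2)
Z = conv_forward_single(A_prev, W, b)
dZ = Z  # the exercise reuses Z as a placeholder gradient of the right shape
```

The only thing the exercise needs from this placeholder dZ is that it has the same shape as Z, so conv_backward's loops line up.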

Thanks so much! Got it

Btw, I would like to know: if dZ = Z in the conv layer, how does the difference between the fully connected layer outputs and Y_train back-propagate to the conv layer to optimize the weights and biases? It seems that conv_forward(A_prev, W, b, hparameters) and conv_backward(dZ, cache) become a closed loop, and W and b can update without any parameters from the max-pooling or fully connected layers if dZ = Z.

I think this is a good question. The answer is that dZ is not equal to Z.

What is the starting point of back-propagation? It's a loss function that evaluates the difference between the expected values (in supervised learning) and the calculated values.

And, if you think about the forward propagation step, typically an activation function \sigma is applied to Z. Then we get "a" and pass it to the next layer.

Starting from the cost function, we first calculate \frac{\partial L}{\partial a}, then \frac{\partial L}{\partial z}, and so on. So dZ comes from an upper layer.
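That chain-rule step can be shown in a few lines of numpy. This is an illustration with a made-up incoming gradient dA, using a sigmoid activation as an example (not the course's specific network):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
Z = np.random.randn(4, 3)      # pre-activation of this layer
A = sigmoid(Z)                  # a = sigma(Z), passed to the next layer
dA = np.random.randn(4, 3)     # dL/dA, arriving from the upper layer

# Chain rule: dL/dZ = dL/dA * sigma'(Z), where sigma'(Z) = A * (1 - A)
dZ = dA * A * (1 - A)
```

The key point is that dZ is built from dA, which flows down from the layers above; it is never just Z itself.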

The important thing is that, when dZ is given from the upper layer (or even from the activation function of the same neuron), you need to calculate dW and db, which are the "weights" of this convolution layer. A "weights" update for a convolution layer is actually a "filter" update. So it is quite important to update those filters based on the losses back-propagated from higher layers.
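Here is a minimal sketch of the dW/db part of that computation, assuming a single example, stride 1, and no padding (the function name conv_backward_filters is hypothetical; the actual assignment also computes dA_prev and handles batches, stride, and padding):

```python
import numpy as np

def conv_backward_filters(dZ, A_prev, f):
    """Gradients of the filters and biases for one example, stride 1, no padding.
    dZ: (n_H, n_W, n_C) gradient arriving from the upper layer.
    A_prev: (n_H_prev, n_W_prev, n_C_prev) input to the conv layer."""
    n_H, n_W, n_C = dZ.shape
    n_C_prev = A_prev.shape[2]
    dW = np.zeros((f, f, n_C_prev, n_C))
    db = np.zeros(n_C)
    for h in range(n_H):
        for w in range(n_W):
            for c in range(n_C):
                # Each output position contributes its input slice, scaled by dZ
                a_slice = A_prev[h:h + f, w:w + f, :]
                dW[:, :, :, c] += a_slice * dZ[h, w, c]
                db[c] += dZ[h, w, c]
    return dW, db

np.random.seed(3)
A_prev = np.random.randn(5, 5, 3)
dZ = np.random.randn(3, 3, 2)
dW, db = conv_backward_filters(dZ, A_prev, f=3)
```

Note that dZ enters only as a given input here: whatever the layers above produce is simply accumulated into the filter gradients.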

For this exercise, we do not have a loss function. We also do not have any further definition of the network.

As the shape of dZ is quite complex and depends on the whole network structure up to this layer, what we can do is reuse the "forward propagation" function to get Z, and reuse Z as an example of dZ, since the shape is the same.

So, dZ is not equal to Z. We just borrow Z as an initial value for dZ to calculate dW and db.

It would be a nice enhancement if this was explained in the notebook.

Thanks so much! That's how I understand it too. I was just stuck on dZ in the conv layer. As you said, it depends on the whole network structure up to the conv layer. In a fully connected neural network, we can simply use dZ[i] = (W[i+1].T @ dZ[i+1]) * g'(Z[i]) to derive dZ, and dZ[i+1] carries the information from the upper layer. If we just reuse Z in the conv layer as an example of dZ, the information from the upper layers (like the fully connected layer) cannot back-propagate to the conv layer. In Python we can use TensorFlow to handle the complex calculation inside this part, but I still want to work it out manually. Do you have any resource or link that illustrates the detailed CNN dZ derivation? Thank you so much!