C4W1 - Optional assignment 1 - back propagation

Are there any hints or discussion related to the Exercise 5 - conv_backward() optional programming section in the first assignment of Course 4 Week 1? I couldn’t find any. When I try to implement the equations, I get dimension mismatch errors.

Has anyone implemented dA, dW and db who could share some hints on the vector multiplication steps?

Thanks

They do actually write out all the formulas for you in the instructions. Maybe the one thing that could be a stumbling block would be to make sure you understand the notational conventions that Professor Ng uses for matrix multiplication. There are two types: dot product style and “elementwise”. Those are very different operations. If he uses * as the operator, then it means “elementwise”. If he means full matrix multiply, he just writes the operands adjacent with no explicit operator. Here’s a thread which discusses this point in a bit more detail.
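To make the distinction concrete, here is a minimal NumPy sketch of the two kinds of multiply. The arrays A and B are made up for illustration and have nothing to do with the assignment's data.

```python
import numpy as np

# Two small, made-up matrices (illustrative only).
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Elementwise product: what "*" means in the course notation.
# Equivalent to np.multiply(A, B); shapes must match or broadcast.
elementwise = A * B

# Full matrix multiply: operands written adjacent in the notation.
# For 2-D arrays np.dot(A, B) is the same as A @ B.
matmul = np.dot(A, B)

print(elementwise)  # [[ 5 12]
                    #  [21 32]]
print(matmul)       # [[19 22]
                    #  [43 50]]
```

The two results differ even for same-shaped square inputs, so picking the wrong operator can produce wrong numbers without raising any error.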

If that’s not enough to help, then please show us the exception trace that you are getting.

Thanks for the reply. This is the code I currently have. I’m not very clear on how to implement those derivatives.

{moderator edit - solution code removed}

Output:

ValueError Traceback (most recent call last)
in
10
11 # Test conv_backward
---> 12 dA, dW, db = conv_backward(Z, cache_conv)
13
14 print("dA_mean =", np.mean(dA))

in conv_backward(dZ, cache)
111 # Update gradients for the window and the filter's parameters using the code formulas given above
112 da_prev_pad[h,w, c] += np.dot(W[i,h,w,c], dZ[i, h, w, c])
--> 113 dW[i,:,:,c] += np.dot(a_slice[:,:,c], dZ[i, :,:, c])
114 db[i, h, w,c] += dZ[i, h, w, c]
115

<array_function internals> in dot(*args, **kwargs)

ValueError: shapes (2,2) and (4,4) not aligned: 2 (dim 1) != 4 (dim 0)

Please note that we’re not supposed to share solution code in a public way. I’ll reply here and then edit your post to remove the code. If we need to continue this code examination, we should switch to a private DM thread.

There are a number of mistakes:

You initialize the shapes of dA_prev, dW and db to all be the same, but those objects are not the same shapes. For example, the dimensions of dW and W are:

f x f x nC_{in} x nC_{out}

Where f is the filter size, nC_{in} is the number of channels in A_prev and nC_{out} is the number of output channels at the current layer.
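As a rough sketch of the shape point: each gradient has the shape of the thing it is the gradient of. The sizes below are hypothetical, and the layouts of A_prev and b are my assumption of the usual conventions for this kind of layer. Only the shape of W comes from the dimensions given above.

```python
import numpy as np

# Hypothetical sizes, chosen only so the shapes are easy to tell apart.
m = 10                       # number of samples
n_H_prev, n_W_prev = 4, 4    # height and width of the previous layer
n_C_prev, n_C = 3, 8         # input channels, output channels
f = 2                        # filter size

# Assumed layouts: A_prev is (m, n_H_prev, n_W_prev, n_C_prev) and
# b is (1, 1, 1, n_C). W is f x f x nC_in x nC_out as stated above.
A_prev = np.random.randn(m, n_H_prev, n_W_prev, n_C_prev)
W = np.random.randn(f, f, n_C_prev, n_C)
b = np.random.randn(1, 1, 1, n_C)

# Each gradient is initialized with the shape of its own counterpart.
dA_prev = np.zeros(A_prev.shape)
dW = np.zeros(W.shape)
db = np.zeros(b.shape)

print(dA_prev.shape, dW.shape, db.shape)
```

With these sizes the three gradients come out as (10, 4, 4, 3), (2, 2, 3, 8) and (1, 1, 1, 8), which are clearly not interchangeable.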

It looks like you missed my point about the type of multiply that is used there. You have two invocations of np.dot in your logic, but there are literally no dot product multiplies in this logic. All the multiplies are “elementwise”. Did you read that thread that I linked above?
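For what it's worth, the exception in the trace can be reproduced in isolation. This is a sketch with made-up arrays of the same shapes as in the error message: np.dot requires the inner dimensions to agree, while an elementwise multiply of a small window by a single number just broadcasts.

```python
import numpy as np

# Made-up arrays with the shapes from the error message.
window = np.ones((2, 2))
grid = np.ones((4, 4))

caught = False
try:
    # Matrix multiply needs window.shape[1] == grid.shape[0]; 2 != 4.
    np.dot(window, grid)
except ValueError as err:
    caught = True
    print("np.dot failed:", err)

# Elementwise multiply by one scalar entry broadcasts without complaint.
scalar = 3.0  # stands in for a single entry of a gradient array
scaled = window * scalar
print(scaled.shape)  # (2, 2)
```

So a "not aligned" error is a sign that a matrix multiply is being attempted where the shapes only make sense for an elementwise operation.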

When you update da_prev_pad, you are slicing it incorrectly: the slicing should be the same as the slice you took of a_prev_pad a couple of lines above.
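Here is a toy, single-channel sketch of what "the same slice" means. The index names vert_start, vert_end, horiz_start and horiz_end are hypothetical, and the update value is a placeholder, not the formula from the instructions.

```python
import numpy as np

f = 2  # window size (illustrative)

# One padded "image" with a single channel, plus a gradient of the same shape.
a_prev_pad = np.arange(16, dtype=float).reshape(4, 4)
da_prev_pad = np.zeros_like(a_prev_pad)

# Hypothetical corner of the current window.
vert_start, horiz_start = 1, 2
vert_end, horiz_end = vert_start + f, horiz_start + f

# Read the window from the padded input...
a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end]

# ...and accumulate into the gradient using exactly the same index ranges.
# np.ones((f, f)) is a placeholder for whatever the formula contributes.
da_prev_pad[vert_start:vert_end, horiz_start:horiz_end] += np.ones((f, f))

print(a_slice)
print(da_prev_pad)
```

Only the f x f block that was read gets updated, and everything outside it stays zero. Indexing the gradient with single positions instead of the window ranges would touch one element rather than the whole window.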

When you update dW, you slice dZ differently than you did the line before and on the line after. Why is that different?

Note that when you slice dW and db as you are updating them, you are using i as the first index, but the W and b values do not have a “samples” dimension (see the dimensions of W that I gave above).

So the overall point here is that you really need to study the instructions in that section again carefully, this time with the “*” notation in mind.
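A quick shape check illustrates this. The sizes are hypothetical, and the (1, 1, 1, nC_{out}) layout for b is my assumption. With W laid out as given above, the first index is a filter row, not a sample.

```python
import numpy as np

# Hypothetical sizes: f = 2, 3 input channels, 8 output channels.
dW = np.zeros((2, 2, 3, 8))
db = np.zeros((1, 1, 1, 8))   # assumed layout for the bias gradient

c = 5  # one output channel (illustrative)

# Selecting everything belonging to output channel c gives one whole filter.
one_filter = dW[:, :, :, c]

# Using a "sample" index i in the first position silently picks a filter row.
i = 0
not_a_sample = dW[i, :, :, c]

print(one_filter.shape)       # (2, 2, 3)
print(not_a_sample.shape)     # (2, 3)
print(db[:, :, :, c].shape)   # (1, 1, 1)
```

NumPy raises no error for the second indexing as long as i stays below f, so the mistake only shows up later as a shape mismatch or an out-of-bounds index.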