Conv_backward() problem

I get this error:

dA_mean = -0.9370722652151867
dW_mean = -0.8767934077414234
db_mean = -1.2483155349054407
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-16-3575597a0654> in <module>
     22 assert dW.shape == (2, 2, 3, 8), f"Wrong shape for dW {dW.shape} != (2, 2, 3, 8)"
     23 assert db.shape == (1, 1, 1, 8), f"Wrong shape for db {db.shape} != (1, 1, 1, 8)"
---> 24 assert np.isclose(np.mean(dA), 1.4524377), "Wrong values for dA"
     25 assert np.isclose(np.mean(dW), 1.7269914), "Wrong values for dW"
     26 assert np.isclose(np.mean(db), 7.8392325), "Wrong values for db"

AssertionError: Wrong values for dA

My code is the following (since it is not a graded exercise, it should be fine to post it here):

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function
    
    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()
    
    Returns:
    dA_prev -- gradient of the cost with respect to the input of the conv layer (A_prev),
               numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    dW -- gradient of the cost with respect to the weights of the conv layer (W)
          numpy array of shape (f, f, n_C_prev, n_C)
    db -- gradient of the cost with respect to the biases of the conv layer (b)
          numpy array of shape (1, 1, 1, n_C)
    """    
    
    # YOUR CODE STARTS HERE
    
    # Retrieve information from "cache"
    (A_prev, W, b, hparameters) = cache
    # Retrieve dimensions from A_prev's shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape
    
    # Retrieve information from "hparameters"
    stride = hparameters["stride"]
    pad = hparameters["pad"]
    
    # Retrieve dimensions from dZ's shape
    (m, n_H, n_W, n_C) = dZ.shape
    
    # Initialize dA_prev, dW, db with the correct shapes
    dA_prev = np.zeros(A_prev.shape)                          
    dW = np.zeros(W.shape)  
    db = np.zeros((1, 1, 1, n_C))  
    
    # Pad A_prev and dA_prev
    A_prev_pad = zero_pad(A_prev, pad)
    dA_prev_pad = zero_pad(dA_prev, pad)
    
    for i in range(m):                       # loop over the training examples
        
        # select ith training example from A_prev_pad and dA_prev_pad
        a_prev_pad = A_prev_pad[i]
        da_prev_pad = dA_prev_pad[i]
        
        for h in range(n_H):                   # loop over vertical axis of the output volume
            for w in range(n_W):               # loop over horizontal axis of the output volume
                for c in range(n_C):           # loop over the channels of the output volume
                    
                    # Find the corners of the current "slice"
                    vert_start = h*stride
                    vert_end = h*stride + f
                    horiz_start = w*stride
                    horiz_end = w*stride + f

                    # Use the corners to define the slice from a_prev_pad
                    a_slice = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]

                    # Update gradients for the window and the filter's parameters using the code formulas given above
                    da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
                    dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
                    db[:,:,:,c] += dZ[i, h, w, c]
                    
        # Set the ith training example's dA_prev to the unpadded da_prev_pad (Hint: use X[pad:-pad, pad:-pad, :])
        dA_prev[i, :, :, :] = da_prev_pad[pad:-pad, pad:-pad, :]
    
    # YOUR CODE ENDS HERE
    
    # Making sure your output shape is correct
    assert(dA_prev.shape == (m, n_H_prev, n_W_prev, n_C_prev))
    
    return dA_prev, dW, db

What am I doing wrong?

Thanks for your reply @ai_curious

I have edited the code to include the ‘stride’:

                    vert_start = h*stride
                    vert_end = h*stride + f
                    horiz_start = w*stride
                    horiz_end = w*stride + f

The results for dA, dW and db are still wrong though, any other ideas as to what could be done to fix this?


Hey, I deleted my response because I wasn’t completely confident it was the problem, and I don’t have access to the code anymore (subscription expired) to run experiments. Since the computations depend on values originally determined in conv_forward() (via cache_conv), you should confirm those are exactly correct before debugging back prop any further. Then take a hard look at the shapes and the slicing that produce them. Hope this helps.
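
One concrete way to check that consistency is a small numerical gradient check, independent of the course’s expected values. A sketch, assuming the conv_forward()/conv_backward() signatures shown in this thread (untested, since I can’t run the notebook):

    import numpy as np

    def check_dW(conv_forward, conv_backward, eps=1e-6):
        np.random.seed(1)
        A_prev = np.random.randn(2, 4, 4, 3)
        W = np.random.randn(2, 2, 3, 4)
        b = np.random.randn(1, 1, 1, 4)
        hparameters = {"pad": 1, "stride": 2}

        Z, cache = conv_forward(A_prev, W, b, hparameters)
        # For the scalar loss L = np.sum(Z), dL/dZ is all ones
        _, dW, _ = conv_backward(np.ones_like(Z), cache)

        # Centered finite difference on one arbitrary weight entry
        idx = (0, 1, 2, 3)
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        Z_plus, _ = conv_forward(A_prev, W_plus, b, hparameters)
        Z_minus, _ = conv_forward(A_prev, W_minus, b, hparameters)
        dW_num = (np.sum(Z_plus) - np.sum(Z_minus)) / (2 * eps)

        print("analytic:", dW[idx], " numeric:", dW_num)

If the two numbers disagree, the slicing in one of the two functions is off.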


Thanks for your reply @ai_curious

This is the full code that should, in theory, be enough to run ‘conv_backward()’ and reproduce their results. Assume all functions are correct, as I pass the tests for every function except ‘conv_backward()’. I’ll keep trying things but feel very stuck.

{moderator edit - solution code removed}

np.random.seed(1)
A_prev = np.random.randn(10, 4, 4, 3)
W = np.random.randn(2, 2, 3, 8)
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad" : 2,
               "stride": 2}
Z, cache_conv = conv_forward(A_prev, W, b, hparameters)

# Test conv_backward
dA, dW, db = conv_backward(Z, cache_conv)

print("dA_mean =", np.mean(dA))
print("dW_mean =", np.mean(dW))
print("db_mean =", np.mean(db))

assert type(dA) == np.ndarray, "Output must be a np.ndarray"
assert type(dW) == np.ndarray, "Output must be a np.ndarray"
assert type(db) == np.ndarray, "Output must be a np.ndarray"
assert dA.shape == (10, 4, 4, 3), f"Wrong shape for dA  {dA.shape} != (10, 4, 4, 3)"
assert dW.shape == (2, 2, 3, 8), f"Wrong shape for dW {dW.shape} != (2, 2, 3, 8)"
assert db.shape == (1, 1, 1, 8), f"Wrong shape for db {db.shape} != (1, 1, 1, 8)"
assert np.isclose(np.mean(dA), 1.4524377), "Wrong values for dA"
assert np.isclose(np.mean(dW), 1.7269914), "Wrong values for dW"
assert np.isclose(np.mean(db), 7.8392325), "Wrong values for db"

print("\033[92m All tests passed.")

I have also noticed something that may be an error on their part. (Do tell me if what I’m saying is nonsense).

When the people who wrote the code call conv_backward(), they pass Z as the first argument (obtained from conv_forward(), as shown below), but shouldn’t they be passing dZ as the first argument instead?

Z, cache_conv = conv_forward(A_prev, W, b, hparameters)

# Test conv_backward
dA, dW, db = conv_backward(Z, cache_conv)

In other words, aren’t they missing a step in which they calculate dZ and then put dZ into conv_backward(dZ, cache_conv)?

I’ve also noticed a way to narrow the problem down.

Since my db value is wrong, and db only depends on the equation db[:,:,:,c] += dZ[i, h, w, c], we can infer that either:

a) the equation db[:,:,:,c] += dZ[i, h, w, c] is wrong
or
b) dZ is wrong
or
c) they’re both wrong

If I manage to find out which is wrong I can fix the issue, but how could I find this out…
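
One quick check, given that the loop update db[:,:,:,c] += dZ[i, h, w, c] is just a sum of dZ over the example and spatial axes (and that the test passes Z in as dZ):

    # Direct computation of db from the test's dZ (which is Z here);
    # shape (1, 1, 1, n_C), same as the loop version accumulates.
    db_direct = np.sum(Z, axis=(0, 1, 2), keepdims=True)
    print(np.mean(db_direct))   # compare against the expected 7.8392325

If this direct sum already fails to match 7.8392325, the db equation is fine and the Z coming out of conv_forward is what’s wrong.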

Just did a quick scan through the email, will extract and try to run later, but this looks wrong to me. Shouldn’t stride be used for vert_start and horiz_start?

Also,

db is initialized to zeros, so all the values come from dZ. If db is wrong, it means dZ is wrong. If dZ is wrong, it means conv_forward() is wrong. Are you sure you pass all the internal unit tests with the code the way it is right now? Again, I haven’t run it, but it looks like it should fail the isclose(np.mean(dA)) test.

But I am using stride, right here:

                    # Find the corners of the current "slice"
                    vert_start = h*stride
                    vert_end = h*stride + f
                    horiz_start = w*stride
                    horiz_end = w*stride + f

If you’re asking if my conv_forward() function passes all tests correctly, it does; here is proof:

[screenshot of the passing conv_forward test output]

If you’re asking whether I fail the isclose(np.mean(dA)) test: yes, I do; it was the first thing I mentioned in this thread and the reason I’m stuck:

dA_mean = -0.9370722652151867
dW_mean = -0.8767934077414234
db_mean = -1.2483155349054407
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-16-3575597a0654> in <module>
     22 assert dW.shape == (2, 2, 3, 8), f"Wrong shape for dW {dW.shape} != (2, 2, 3, 8)"
     23 assert db.shape == (1, 1, 1, 8), f"Wrong shape for db {db.shape} != (1, 1, 1, 8)"
---> 24 assert np.isclose(np.mean(dA), 1.4524377), "Wrong values for dA"
     25 assert np.isclose(np.mean(dW), 1.7269914), "Wrong values for dW"
     26 assert np.isclose(np.mean(db), 7.8392325), "Wrong values for db"

AssertionError: Wrong values for dA

Not in conv_forward. That code has:

a_prev_pad[h:h+f, w:w+f, :]
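
It needs the same stride-aware corners that the backward pass uses. A sketch, using the corner variables from the conv_backward code above:

    vert_start = h * stride
    vert_end = vert_start + f
    horiz_start = w * stride
    horiz_end = horiz_start + f
    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]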

PS: once this is resolved, you should probably delete the solution code, or a mentor/moderator will come and do it for you.


That was the solution, thanks a lot for your time @ai_curious !!


Weird that it could pass the unit tests like that. I would have thought a shape or mean check would have failed earlier. Glad it’s sorted.

The problem is that your conv_forward code is wrong, as ai_curious has pointed out. Why would you use the stride in conv_backward and not in conv_forward? The test cases do not catch that error. A bug has been filed about that for months, but it has not been fixed. It is always a bad idea to assume that the unit tests catch everything.

I had a very similar problem, and thanks to this thread I managed to fix it! However, one question still remains: did you ever get an answer to why we use Z instead of dZ?


Not sure I completely understand the question/confusion. Are you asking about the parameter passed in to conv_backward()? Maybe review the video Forward and Backward Propagation from Week 4 of Course 1: at about the 1:00 mark you see the notation for the output of forward propagation, and just after the 2:00 mark the notation for backward prop. Notice that the calculation of dZ is part of backward prop, not something that preexists it. You use the output of forward prop, Z, then calculate dZ and the other needed derivatives. There is a similar discussion at around 2:40 of Backpropagation intuition.
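
In the course’s notation, for a layer with activation g, backward prop computes dZ element-wise from the cached Z and the upstream dA:

    dZ^{[l]} = dA^{[l]} * g^{[l]'}(Z^{[l]})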

Hope this helps.

I simply assumed that yes, their code (below) is wrong: they should be calculating dZ (in a standard neural network they would calculate dZ for the output layer as dZ = activation(Z) - y; in a CNN, I’m not sure what the equation would be) and then putting dZ into conv_backward instead of Z. But since they’re only using Z to test conv_backward, it doesn’t really matter what you use, as long as it has the same dimensions as dZ (and, unsurprisingly, Z always has exactly the same dimensions as dZ).

In other words, I think they used Z instead of dZ to test conv_backward out of laziness, simply because it is unnecessary to use the right value to test whether your conv_backward works properly or not.

Z, cache_conv = conv_forward(A_prev, W, b, hparameters)

# Test conv_backward
dA, dW, db = conv_backward(Z, cache_conv)

I may be wrong, but this was a good enough explanation for me to move on.


What he means is, when conv_backward is defined, its arguments are dZ and cache:

def conv_backward(dZ, cache):
    """
    Implement the backward propagation for a convolution function
    
    Arguments:
    dZ -- gradient of the cost with respect to the output of the conv layer (Z), numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward(), output of conv_forward()

But when it is tested, it is tested with Z instead of dZ (last line below):

# We'll run conv_forward to initialize the 'Z' and 'cache_conv',
# which we'll use to test the conv_backward function
np.random.seed(1)
A_prev = np.random.randn(10, 4, 4, 3)
W = np.random.randn(2, 2, 3, 8)
b = np.random.randn(1, 1, 1, 8)
hparameters = {"pad" : 2,
               "stride": 2}
Z, cache_conv = conv_forward(A_prev, W, b, hparameters)

# Test conv_backward
dA, dW, db = conv_backward(Z, cache_conv)

Which, by the definition of conv_backward just pasted above, must be wrong.


Sorry I didn’t notice the discrepancy in the notebook code before. Clearly Z and dZ are not the same, though you correctly observe that they have the same shape. And if the expected outputs of the other derivatives were computed this way, you can still match them. But pedagogically it is wrong. My preference would be to see dZ computed inside conv_backward(), not in an intermediate step between forward prop and backward prop.
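
For example, if the conv layer fed a ReLU activation (the notebook never attaches one, so this is purely illustrative, and dA_conv is a hypothetical upstream gradient):

    dZ = dA_conv * (Z > 0)   # ReLU backward: g'(Z) is 1 where Z > 0, else 0
    dA_prev, dW, db = conv_backward(dZ, cache_conv)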


I had a very similar problem. Mine was in the initialization part.

======= Those three lines below are wrong ============
dA_prev = np.random.randn(m, n_H_prev, n_W_prev, n_C_prev)
dW = np.random.randn(f, f, n_C_prev, n_C)
db = np.random.randn(1, 1, 1, n_C)

======= Those three lines below are correct ============
dA_prev = np.zeros([m, n_H_prev, n_W_prev, n_C_prev])
dW = np.zeros([f, f, n_C_prev, n_C])
db = np.zeros([1, 1, 1, n_C])

When the instructions say “Initialize dA_prev, dW, db with the correct shapes”, they mean initialize them to zeros. Inside conv_backward, those three variables are updated many times by accumulation, so their initial values must be zero, not random.
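
A tiny self-contained illustration of why, with made-up numbers:

    import numpy as np

    np.random.seed(0)
    updates = [1.0, 2.0, 3.0]

    db_zero = np.zeros(1)
    db_rand = np.random.randn(1)   # wrong: starts from a random offset

    for u in updates:
        db_zero += u
        db_rand += u

    print(db_zero)   # [6.] -- exactly the sum of the updates
    print(db_rand)   # 6. plus the random offset, which never goes away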


Thanks a lot! I fixed the error in conv_forward, and that also corrected conv_backward.

Thank you! I had the dimensions wrong; this helped me fix it 🙂