DLS4 Week 1 Assignment 1, optional ungraded part: Ex 8 pool_backward. Grader says "wrong values"; I'd appreciate some help

I’m working through the optional, ungraded part of the assignment “Convolutional Model, Step by Step” and I'm stuck on the last part, Exercise 8, pool_backward. The test cell output tells me “AssertionError: Wrong values for mode max”. (I also get “wrong values” for the average case if I comment out the code for max.)

I’ve been through the code carefully to check for trivial errors where I wrote something different from what I intended to write, and I think the code is clear of such things. I’ve checked the code as carefully as I can to see that it’s doing what is intended, according to the instructions, and to my best understanding of the maths. I’ve added various debugging prints and assertions to check my expectations. At least the dimensions of my calculations seem to be right, as I don’t get complaints about invalid dimensions. But there is still that error, which strongly suggests to me that my understanding is wrong.

As this is an optional, ungraded part of the assignment, I hope it should be ok for me to post my code here (let me know if I should remove it). I would appreciate it if someone could point out my mistake!

Below is my code for the backprop (with my debug removed), followed by the grader output with the error.

def pool_backward(dA, cache, mode = "max"):
    """
    Implements the backward pass of the pooling layer

    Arguments:
    dA -- gradient of cost with respect to the output of the pooling layer, same shape as A
    cache -- cache output from the forward pass of the pooling layer, contains the layer's input and hparameters
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")

    Returns:
    dA_prev -- gradient of cost with respect to the input of the pooling layer, same shape as A_prev
    """

    # YOUR CODE STARTS HERE
    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = ...

    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = ...
    f = ...

    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    m, n_H_prev, n_W_prev, n_C_prev = ...
    m, n_H, n_W, n_C = ...

    # Initialize dA_prev with zeros (≈1 line)
    dA_prev = ...

    for i in range(m):                         # loop over the training examples

        # select training example from A_prev (≈1 line)
        a_prev = ...

        for h in range(n_H):                   # loop on the vertical axis
            for w in range(n_W):               # loop on the horizontal axis
                for c in range(n_C):           # loop over the channels (depth)

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = ...
                    vert_end = ...
                    horiz_start = ...
                    horiz_end = ...

                    # Compute the backward propagation in both modes.
                    if mode == "max":

                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        a_prev_slice = ...

                        # Create the mask from a_prev_slice (≈1 line)
                        mask = ...

                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += ...

                    # elif mode == "average":

                        # Get the value da from dA (≈1 line)
                        da = ...

                        # Define the shape of the filter as fxf (≈1 line)
                        shape = ...

                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += ...

    # YOUR CODE ENDS HERE

    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)

    return dA_prev

The output from the grader reads:

(5, 4, 2, 2)
(5, 5, 3, 2)
mode = max
mean of dA =  0.14571390272918056
dA_prev1[1,1] =  [[ 0.08485462  0.2787552 ]
 [ 6.32305492 -1.94032075]
 [ 1.17975636 -0.53624893]]

mode = average
mean of dA =  0.14571390272918056
dA_prev2[1,1] =  [[0. 0.]
 [0. 0.]
 [0. 0.]]
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-24-14e1d5abab7e> in <module>
     21 assert np.allclose(dA_prev1[1, 1], [[0, 0],
     22                                     [ 5.05844394, -1.68282702],
---> 23                                     [ 0, 0]]), "Wrong values for mode max"
     24 assert np.allclose(dA_prev2[1, 1], [[0.08485462,  0.2787552],
     25                                     [1.26461098, -0.25749373],

AssertionError: Wrong values for mode max

Hey! You seem to have forgotten to uncomment the elif statement for the average pooling case. Fixing that should fix your function.


And once you’ve solved your issue, can you please delete the code? 🙂

(Homer Simpson voice) Doh!

I knew it might be something simple and stupid, but I still didn’t spot it.

Many thanks @XpRienzo! All tests are now passing.

Will remove the code.

Hi @XpRienzo, I had the following error message in this exercise:

in pool_backward(dA, cache, mode)
     17
     18 # Retrieve hyperparameters from "hparameters" (≈2 lines)
---> 19 stride, f = hparameters['stride','f']
     20 # stride = int(hparameters["stride"])
     21 # f = int(hparameters["f"])

TypeError: tuple indices must be integers or slices, not tuple

with the following code:

# Retrieve hyperparameters from "hparameters" (≈2 lines)

stride = int(hparameters["stride"])
f = int(hparameters["f"])

or

in pool_backward(dA, cache, mode)
     17
     18 # Retrieve hyperparameters from "hparameters" (≈2 lines)
---> 19 stride = hparameters["stride"]
     20 f = hparameters["f"]

TypeError: tuple indices must be integers or slices, not str

with the following code:

# Retrieve hyperparameters from "hparameters" (≈2 lines)

stride = hparameters["stride"]
f = hparameters["f"]

or the same error with the code:

stride = hparameters['stride']
f = hparameters['f']

However, this is exactly how it was used throughout the previous exercises. I appreciate any help.

I think you might have assigned cache directly to hparameters instead of unpacking cache into A_prev and hparameters. Can you check what you’ve done with it?

I implemented it as follows at the beginning:

# Retrieve information from cache (≈1 line)
(A_prev, hparameters) = dA, cache

That’s why the error occurred: I should have received only the cache variable, which already contains A_prev.

Thank you for your help.
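To make the unpacking issue concrete, here is a minimal sketch, assuming cache is the (A_prev, hparameters) tuple returned by pool_forward (the shapes here are just illustrative):

```python
import numpy as np

# Assumed structure: pool_forward caches its input and the hyperparameters.
A_prev = np.zeros((5, 5, 3, 2))
hparameters = {"stride": 1, "f": 2}
cache = (A_prev, hparameters)

# Wrong: hparameters = cache would make "hparameters" the whole tuple,
# so hparameters["stride"] raises "TypeError: tuple indices must be
# integers or slices, not str".
# Right: unpack the tuple into its two components.
A_prev, hparameters = cache
stride = hparameters["stride"]
f = hparameters["f"]
```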

Hey @Bresan, apologies for the late reply. The issue here is the dA entry you are using in the last part. Remember that after max pooling, you get just a single maximum value as output for each subregion of the input matrix, so the indexing will be different. Referring back to your implementation of pool_forward for max pooling may help.
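For reference, the mask used in the max case can be sketched like this (a minimal version written from the exercise description, not necessarily identical to the assignment's create_mask_from_window):

```python
import numpy as np

def create_mask_from_window(x):
    # True exactly where x equals its maximum: only that input position
    # received any gradient through the max-pooling output.
    return x == np.max(x)

window = np.array([[1.0, 3.0],
                   [2.0, 0.5]])
mask = create_mask_from_window(window)
# mask marks the position of the maximum, 3.0
```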

Thank you very much for your help @XpRienzo , I managed to resolve the error. I already removed the code too. Best Regards.

Glad you were able to solve it. Good luck for the rest of the course!

Hey, I have the following error
TypeError                                 Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-200-14e1d5abab7e> in <module>
 12 print('dA_prev1[1,1] = ', dA_prev1[1, 1])
 13 print()
---> 14 dA_prev2 = pool_backward(dA, cache, mode = "average")
 15 print("mode = average")
 16 print('mean of dA = ', np.mean(dA))

<ipython-input-199-315d2a3dc54f> in pool_backward(dA, cache, mode)
 67 
 68                         # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
---> 69                         dA_prev[i, h, w, c] += da
 70 
 71     # YOUR CODE STARTS HERE

ValueError: setting an array element with a sequence.

The problem is caused in the "elif" statement for average pooling.
I pass the assertion assert(dA_prev.shape == A_prev.shape), but I don't really understand how to use the variable "shape".

Would appreciate some help. Thank you
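For reference, "shape" here is just the (f, f) window over which a scalar gradient is spread in the average case. A minimal sketch of a distribute_value-style helper, written from the exercise description rather than the assignment's actual code:

```python
import numpy as np

def distribute_value(dz, shape):
    # Spread the scalar gradient dz evenly over an (n_H, n_W) window:
    # every input contributed 1/(n_H * n_W) to the average-pooled output.
    n_H, n_W = shape
    return np.full(shape, dz / (n_H * n_W))

out = distribute_value(2.0, (2, 2))
# every entry of the 2x2 window receives 0.5
```

Note that dz must be a single scalar; passing an array here is what triggers "only size-1 arrays can be converted to Python scalars" style errors.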

I have the same problem. How did you fix it?
My code is here and I am getting the error below:

# YOUR CODE STARTS HERE

# Retrieve information from cache (≈1 line)
(A_prev, hparameters) = cache

# Retrieve hyperparameters from "hparameters" (≈2 lines)
stride = hparameters["stride"]
f = hparameters["f"]

# Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
m, n_H_prev, n_W_prev, n_C_prev = A_prev.shape
m, n_H, n_W, n_C = dA.shape

# Initialize dA_prev with zeros (≈1 line)
dA_prev = np.zeros((m, n_H_prev, n_W_prev, n_C_prev))

for i in range(m): # loop over the training examples
    
    # select training example from A_prev (≈1 line)
    a_prev = A_prev[i,:,:,:]
    
    for h in range(n_H):                   # loop on the vertical axis
        for w in range(n_W):               # loop on the horizontal axis
            for c in range(n_C):           # loop over the channels (depth)
    
                # Find the corners of the current "slice" (≈4 lines)
                vert_start = h*stride
                vert_end = vert_start + f
                horiz_start = w*stride
                horiz_end = horiz_start + f
                
                # Compute the backward propagation in both modes.
                if mode == "max":
                    
                    # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                    a_prev_slice = a_prev[vert_start:vert_end,horiz_start:horiz_end,c]
                    
                    # Create the mask from a_prev_slice (≈1 line)
                    mask = create_mask_from_window(a_prev_slice)

                    # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                    dA_prev[vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA_prev
                    
                elif mode == "average":
                    
                    # Get the value da from dA (≈1 line)
                    da = dA[i,:,:,:]
                    
                    # Define the shape of the filter as fxf (≈1 line)
                    shape = (f,f)

                    # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                    dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += distribute_value(dA_prev, shape)

# YOUR CODE ENDS HERE

(5, 4, 2, 2)
(5, 5, 3, 2)
mode = max
mean of dA = 0.14571390272918056
dA_prev1[1,1] = [[0. 0.]
[0. 0.]
[0. 0.]]

mode = average

mean of dA = 0.14571390272918056
dA_prev2[1,1] = [[0. 0.]
[0. 0.]
[0. 0.]]

AssertionError                            Traceback (most recent call last)
in
     21 assert np.allclose(dA_prev1[1, 1], [[0, 0],
     22                                     [ 5.05844394, -1.68282702],
---> 23                                     [ 0, 0]]), "Wrong values for mode max"
     24 assert np.allclose(dA_prev2[1, 1], [[0.08485462, 0.2787552],
     25                                     [1.26461098, -0.25749373],

AssertionError: Wrong values for mode max

Hey Maoumeh, for this section you’re supposed to select a single training example “i” in dA_prev. Also, it’s mask * dA, not mask * dA_prev. And you’re supposed to select individual values of dA to multiply by the mask.

The dA you use in the max section is the same one you use in the average section
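To see why the indexing matters, compare the shapes involved, using the test's dA shape of (5, 4, 2, 2):

```python
import numpy as np

dA = np.zeros((5, 4, 2, 2))

# dA[i, h, w, c] is one scalar: the gradient of a single pooled output,
# which broadcasts cleanly over an f x f mask or window.
print(dA[1, 1, 0, 1].shape)   # ()

# dA[i, :, :, :] is a whole training example, which cannot be added
# into a single f x f slice of dA_prev.
print(dA[1].shape)            # (4, 2, 2)
```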

I solved the problem. Thank you for help!

Thanks so much, this was the same problem for me and thank god I saw this comment!

Hello, I’m getting a value error. Can someone help me find my mistake?

For more clarification: my dA in both max and average mode is defined as dA[i, vert_start: vert_end, horiz_start: horiz_end, c].


(5, 4, 2, 2)
(5, 5, 3, 2)
mode = max
mean of dA = 0.14571390272918056
dA_prev1[1,1] = [[ 0. 0. ]
[10.11330283 -0.49726956]
[ 0. 0. ]]

mode = average
mean of dA = 0.14571390272918056
dA_prev2[1,1] = [[-0.32345834 0.45074345]
[ 2.52832571 -0.24863478]
[ 1.26416285 -0.12431739]]


AssertionError                            Traceback (most recent call last)
in
     21 assert np.allclose(dA_prev1[1, 1], [[0, 0],
     22                                     [ 5.05844394, -1.68282702],
---> 23                                     [ 0, 0]]), "Wrong values for mode max"
     24 assert np.allclose(dA_prev2[1, 1], [[0.08485462, 0.2787552],
     25                                     [1.26461098, -0.25749373],

AssertionError: Wrong values for mode max

At first I got the same error as you. But when I double-checked it, I found it should be dA[i, h, w, c].


Hi @XpRienzo, I’ve implemented dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += mask * dA[i, h, w, c] for max pooling, but I get the error "non-broadcastable output operand with shape (2,1) doesn’t match the broadcast shape (2,2)".

I tried printing out the shape of dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c], and it changes to (2,1) at the last iteration. I’d appreciate any help or explanation.


As you know, this exercise is to back-propagate dA to the previous layer. Here is an overview.

So, the dA side is quite simple: just pick one entry from dA. That’s all.
The challenge is, of course, the dA_prev side: where the window should start (vert_start/horiz_start), where it should end (vert_end/horiz_end), what the window (filter) size is, and what stride moves the filter.

If you calculate the starting point incorrectly, this type of error occurs.

The program wants a 2x2 slice, but there is no 2x2 region left, so Python returns a 2x1 slice instead. In this case, the line of code itself looks correct, but the result is not what we expect.

Of course, the above is only one possibility, but the first thing to check is the window position for dA_prev.

Hope this helps.
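The truncation described above can be reproduced in two lines (a 4x4 plane and a 2x2 window, hypothetical sizes chosen just for illustration):

```python
import numpy as np

plane = np.zeros((4, 4))
# Slicing past the edge does not raise an error: rows 3:5 silently
# yield only 1 row, so a 2x2 mask can no longer broadcast into it.
print(plane[3:5, 0:2].shape)  # (1, 2), not (2, 2)
```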


@anon57530071, hi, thanks a lot. I’ve gone through the dimensions and found that the problem came from a wrong initial dimension of dA_prev. Now it is solved. Cheers.
