Course 4, Week 1, optional exercise 8, pool_backward

Hey, I am struggling with the optional assignment. I get the wrong output for “max” mode and a runtime error for “average” mode.

def pool_backward(dA, cache, mode = "max"):

    # Retrieve information from cache (≈1 line)
    (A_prev, hparameters) = ...

    # Retrieve hyperparameters from "hparameters" (≈2 lines)
    stride = ...
    f = ...

    # Retrieve dimensions from A_prev's shape and dA's shape (≈2 lines)
    m, n_H_prev, n_W_prev, n_C_prev = ...
    print("A_prev.shape: ", A_prev.shape)
    m, n_H, n_W, n_C = ...

    # Initialize dA_prev with zeros (≈1 line)
    dA_prev = ...
    print("dA_prev.shape: ", dA_prev.shape)

    for i in range(...): # loop over the training examples

        # select training example from A_prev (≈1 line)
        a_prev = ...

        for h in range(n_H):                   # loop on the vertical axis
            for w in range(n_W):               # loop on the horizontal axis
                for c in range(n_C):           # loop over the channels (depth)

                    # Find the corners of the current "slice" (≈4 lines)
                    vert_start = ...
                    vert_end = ...
                    horiz_start = ...
                    horiz_end = ...

                    # Compute the backward propagation in both modes.
                    if mode == "max":

                        # Use the corners and "c" to define the current slice from a_prev (≈1 line)
                        a_prev_slice = ...

                        # Create the mask from a_prev_slice (≈1 line)
                        mask = ...

                        # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
                        dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += ...
                        #print("dA_prev.shape: ", dA_prev.shape)

                    elif mode == "average":

                        # Get the value da from dA (≈1 line)
                        da = ...
                        print("dA[i, h, w, c]: ", dA[i, h, w, c])
                        print("da: ", da)

                        # Define the shape of the filter as fxf (≈1 line)
                        shape = ...

                        # Distribute it to get the correct slice of dA_prev. i.e. Add the distributed value of da. (≈1 line)
                        dA_prev[i, h, w, c] += ...

    # YOUR CODE STARTS HERE


    # YOUR CODE ENDS HERE

    # Making sure your output shape is correct
    assert(dA_prev.shape == A_prev.shape)

    return dA_prev

My output is:

mode = max
mean of dA = 0.14571390272918056
dA_prev1[1,1] = [[ 0. 0. ]
[10.11330283 -0.49726956]
[ 0. 0. ]]

A_prev.shape: (5, 5, 3, 2)
dA_prev.shape: (5, 5, 3, 2)

I would appreciate some help! Thanks!


I think the Honor Code does not allow you to post your code on the Forums. Please remove it.

You can post questions, but not your code and ask someone to fix it for you.

OK, let me rephrase it. I have successfully passed the tests in the previous exercises of the assignment, so I guess distribute_value() and create_mask_from_window() work properly.
However, I get the wrong output in the “max” case.
Moreover, the “average” case gives me the error "ValueError: setting an array element with a sequence.", which I guess means that there is a problem with the dimensions. I also do not understand how to use the variable “shape” in the average case. I assume the “shape” argument passed to distribute_value has to be defined beforehand. Is that right?

Hello, I would appreciate any help. Since it is an optional exercise, I am just curious to find out the problem here.

Hello! I got the same output for mode == max. After giving it some thought, I realized what the problem was.

When we assign dA_prev in max mode, we need to understand how backprop works here. In forward prop, each output value is just the entry of our segment that the max mask selects. Going backwards we do the opposite: we build the segment of dA_prev from the corresponding single value of dA.

So we need to multiply the mask not by a segment of dA, but by one value.
That is, we add to the segment of dA_prev the mask multiplied by the element of dA that we are iterating over, dA[i, h, w, c].
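
To make the point above concrete, here is a minimal sketch of the max-mode step for a single window. It is a toy illustration, not the assignment solution: the 2x2 `window` values and the scalar `da` are made-up assumptions, and `da` only stands in for one entry such as dA[i, h, w, c].

```python
import numpy as np

# Hypothetical 2x2 window, standing in for one slice of a_prev.
window = np.array([[1.0, 3.0],
                   [4.0, 2.0]])

# Boolean mask that is True only where the window holds its maximum.
mask = (window == np.max(window))

# A single upstream gradient value (assumed), standing in for dA[i, h, w, c].
da = 5.0

# Scalar times mask: only the position that held the max receives the gradient.
window_grad = mask * da
print(window_grad)
```

Because `da` is a scalar, the product always has the same shape as the mask, so it can be added to a window of the same size without any broadcasting problem.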

Hope that helps someone! Have a good day!


@Ihme11 Thank you very much for this response. It really makes sense. I corrected it and got the right output for max mode.
Do you have any idea about the average mode?

Thanks a lot!

Glad that helped you! It’s hard to say where the error comes from, since the code has been removed from the post, but I can give my interpretation of the backprop for average mode.

First, we set da to the value of dA we are iterating over (we do this to feed it into our distribute_value function).

Second, we set shape = (f,f).

Third, we fill our segment of the matrix dA_prev with the distributed value of our element da. That is, we add to our segment of dA_prev[…] the matrix created by the distribute_value function, passing it the value we want to distribute and the shape of the matrix we want back.
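
The three steps above can be sketched on a toy example. This is only an illustration under stated assumptions: `spread_evenly` is a hypothetical stand-in for the course's distribute_value helper (whose exact implementation is not shown in this thread), and the 3x3 `grad_prev` array and the value of `da` are made up.

```python
import numpy as np

def spread_evenly(da, shape):
    """Hypothetical stand-in for distribute_value: split da equally over `shape`."""
    n_rows, n_cols = shape
    return np.ones(shape) * (da / (n_rows * n_cols))

f = 2
shape = (f, f)          # step 2: the filter shape
da = 8.0                # step 1: one scalar taken from the upstream gradient (assumed value)

# Step 3: add the evenly distributed value to an f x f window of the
# previous-layer gradient, not to a single element.
grad_prev = np.zeros((3, 3))
window_grad = spread_evenly(da, shape)
grad_prev[0:f, 0:f] += window_grad
print(grad_prev)
```

Note that the target of the `+=` is an f x f slice. Assigning an f x f matrix to a single element instead is one way to trigger "ValueError: setting an array element with a sequence."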

These backprop details were not explained in the lectures, so I hope that I got this right and made it more or less clear!


@Ihme11 Thank you again so much!
I was stuck. I had been using the distribute_value function to calculate da, which, after your explanation, I realized was wrong. Now I take da from dA as a single entry of dA, as I was supposed to do. Then I use the distribute_value function to calculate dA_prev.

Now all tests passed!
Right, the back propagation is not clear to me since no details are given in the lecture, so your insight really helped me, at least to complete this assignment!

Thanks again so much! Have a nice one!
Ellie

Thank you very much. I had the same issue and resolved it by carefully reading your comments.

Thanks a lot for the suggestion and the clarification of the topic. I had the same issue, and after reading your explanation several times I corrected it.

Sorry but my problem is slightly different:

dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += mask * “appropriate slice of dA, (a scalar)”

and I’m getting:
ValueError: non-broadcastable output operand with shape (2,1) doesn’t match the broadcast shape (2,2)

So I’m starting to suspect something else in the code is broken despite passing the tests? Can anyone help me troubleshoot this?

Found my mistake (a horrible typo). In case it helps anyone: be sure not to confuse A_prev with A when initialising.

Hello @MSvensson

It’s good that you have found it. This type of bug is a difficult one to spot.

Cheers,
Raymond

Hi there!

I am still confused. I have tried this:
dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += dA[i, vert_start: vert_end, horiz_start: horiz_end, c] * mask

and it throws this error, what could be the issue?


ValueError Traceback (most recent call last)
in
7 dA = np.random.randn(5, 4, 2, 2)
8
----> 9 dA_prev1 = pool_backward(dA, cache, mode = "max")
10 print("mode = max")
11 print('mean of dA = ', np.mean(dA))

in pool_backward(dA, cache, mode)
52 #print(mask.shape)
53 # Set dA_prev to be dA_prev + (the mask multiplied by the correct entry of dA) (≈1 line)
---> 54 dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c] += dA[i, vert_start: vert_end, horiz_start: horiz_end, c] * mask
55
56 elif mode == "average":

ValueError: non-broadcastable output operand with shape (2,1) doesn't match the broadcast shape (2,2)

I’ve tried printing the shape of mask, and at some point it becomes (2,1) and (2,0). Is this normal? I have derived vert_end and horiz_end in the same way as in the other exercises, with no apparent issue.
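
For anyone hitting the same message, here is a small sketch of one way shapes like (2,1) and (2,0) can arise. It is not a diagnosis of the code above, which is not fully shown: NumPy silently clips a slice that runs past the end of an array, so window indices applied to an array that is narrower than expected give truncated or empty windows. The (4, 2) `target` array and the index values are made-up assumptions.

```python
import numpy as np

f = 2
# Hypothetical array that is only 2 columns wide.
target = np.zeros((4, 2))

in_bounds = target[0:f, 0:f]      # fits entirely inside the array
clipped = target[0:f, 1:1 + f]    # runs one column past the edge
empty = target[0:f, 2:2 + f]      # starts at the edge
print(in_bounds.shape, clipped.shape, empty.shape)  # (2, 2) (2, 1) (2, 0)

# Adding a full f x f matrix into a clipped window fails in place.
mask = np.ones((f, f))
try:
    target[0:f, 1:1 + f] += mask * 3.0
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If the printed shapes shrink like this, it may be worth checking which array the window indices are being applied to. Both dA_prev and a_prev should have A_prev's spatial dimensions, while dA is smaller, so slicing dA with window corners (rather than reading the single entry dA[i, h, w, c]) can produce exactly this kind of truncation.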