Hi,
I can’t seem to get my head around this one.
dA and dA_prev have different shapes. The comments suggest running the loops over n_H, n_W, and n_C.
If I do that, I get a problem when I want to update the last slice:
dA_prev[i, vert_start: vert_end, horiz_start: horiz_end, c]
It has shape (2,1), and thus da[…] * mask cannot be broadcast into it (because that product has shape (2,2)).
Do you have any hints on what the correct way to solve this might be? I think I’m overlooking something crucial here.
I’d really love to understand how this works.
Bear in mind that you’re not dealing with the whole “image” there, just with the current filter “patch”. The dimensions you show on the LHS look correct. And the difference between vert_start and vert_end should be the filter size, right? So why is that not the same on the RHS? Are you sure you correctly included the stride in the vert_start and horiz_start calculations? That’s the most common error. If you got that correct in the forward case, it’s the same here, right?
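To make the stride point concrete, here is a minimal sketch of how the window bounds are usually derived from an output position. The helper name `window_bounds` and the parameter `f` (filter size) are made up for illustration and are not taken from the assignment.

```python
def window_bounds(h, w, stride, f):
    """Return (vert_start, vert_end, horiz_start, horiz_end) for output position (h, w).

    Assumes no padding: the window starts at the output index times the stride
    and always spans exactly f rows and f columns.
    """
    vert_start = h * stride
    vert_end = vert_start + f
    horiz_start = w * stride
    horiz_end = horiz_start + f
    return vert_start, vert_end, horiz_start, horiz_end


# Stride 1, filter size 2: output position (0, 1) covers rows 0:2 and columns 1:3.
print(window_bounds(0, 1, stride=1, f=2))
# Stride 2, filter size 2: output position (1, 1) covers rows 2:4 and columns 2:4.
print(window_bounds(1, 1, stride=2, f=2))
```

Whatever the stride, the end minus the start equals the filter size, so a window of any other shape means the slice was cut short by the array it was taken from.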
Thank you very much for your answer, @paulinpaloalto. I checked my calculations of vert_start, vert_end, horiz_start, and horiz_end again. They are correct as far as I can see (the same as in the forward case). The issue here is that dA has the shape (5,4,2,2), the filter size is 2, and the stride is 1. In that case the calculation of the start and end points does not work as expected: when h becomes 1, horiz_start would be 1 and horiz_end would be 3, in which case the filter patch would only “cover” one column and have nothing to overlap for the second column. I found no hints on how to handle that case. Do I need to pad dA?
The shape of the RHS of the assignment is determined by two different means depending on whether it is the “max” or “average” case, right? In the “max” case it is determined by a_slice_prev, which gets passed to create_mask_from_window. In the average case, it is defined as the shape of the filter and then passed to distribute_value, right? In which case are you seeing the failure?
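As a rough illustration of where the RHS shape comes from in each mode, the sketch below uses two hypothetical stand-ins, `mask_from_window` and `spread_evenly`. They are not the course's create_mask_from_window or distribute_value, only simple functions with the behaviour described above, and the gradient value 0.8 is arbitrary.

```python
import numpy as np


def mask_from_window(x):
    # Hypothetical stand-in: True where x equals its maximum, same shape as x.
    return x == np.max(x)


def spread_evenly(dz, shape):
    # Hypothetical stand-in: split the scalar dz equally over a window of `shape`.
    n_h, n_w = shape
    return np.full(shape, dz / (n_h * n_w))


f = 2
a_slice_prev = np.array([[1.0, 3.0],
                         [2.0, 0.5]])

# "max" case: the shape is inherited from a_slice_prev via the mask.
max_rhs = mask_from_window(a_slice_prev) * 0.8
# "average" case: the shape is given explicitly as (f, f).
avg_rhs = spread_evenly(0.8, (f, f))

print(max_rhs.shape, avg_rhs.shape)
```

In the max case a wrongly shaped a_slice_prev would propagate straight into the mask, which is why the shape of the slice is worth printing first.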
Thanks again for your response!
I’m seeing the failure in the max case.
I have this:
{moderator edit - solution code removed}
For h=0 and w=1 I get the error message:
non-broadcastable output operand with shape (2,1) doesn’t match the broadcast shape (2,2)
because in this case horiz_start is 1 and horiz_end is 3, but since the shape of dA_prev is (5,4,2,2), the slice will have shape (2,1) …
Is there something wrong with my h and w values, or do I need to handle the case when the filter window goes beyond the bounds of the data?
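The message can be reproduced in isolation. NumPy does not complain when a slice end lies past the end of an axis; it silently clips the slice, and the failure only shows up at the in-place update. The sketch below uses a zero array with the shape quoted above and an all-ones (2, 2) update as a placeholder for the real right-hand side.

```python
import numpy as np

buf = np.zeros((5, 4, 2, 2))      # same shape as quoted for dA_prev in the post

# Columns 1:3 are requested, but axis 2 only has length 2, so the slice is clipped.
window = buf[0, 0:2, 1:3, 0]
print(window.shape)

update = np.ones((2, 2))          # placeholder for the (2, 2) right-hand side
error_message = None
try:
    buf[0, 0:2, 1:3, 0] += update  # a (2, 2) value cannot be written into a (2, 1) view
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```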
I think the problem is simple: when you “slice” a_prev you are omitting the “samples” dimension, but maybe you did that as a precursor to this logic. What is the shape of the resulting a_prev_slice? Here’s what I get in code that works:
a_slice_prev.shape = (2, 2)
Or maybe there is something wrong with your create_mask_from_window logic. Here’s the shape of the mask that I get:
mask.shape = (2, 2)
Actually, in my previous reply I had only looked at your code, not the rest of your message. I think the point is that w = 1 is not a valid starting point, right? Precisely because it “steps” you off the end of the array. All the positions should allow a 2 x 2 shape, because that’s the filter size. So maybe the problem is that your loop limits are wrong. E.g. maybe you have switched h and w. h is height, not horizontal, right?
That’s what puzzles me. I think I have everything correct here. But maybe I’m just blind now …
{moderator edit - solution code removed}
Looks correct, doesn’t it?
Since stride is 1 and n_W is 2 in this example, in the for loop w will take on the values 0 and then 1. So the stride is smaller than the filter size here, which is a case I don’t handle in the code yet. But I also have no idea how it should be handled.
I agree your logic looks correct. You should not need to do anything more with stride than what you did. I am away from my computer for 3 more hours, so I can’t really compare code. Maybe we are back to the theory about the mask computation routine …
Sorry, I wasn’t thinking clearly in my previous responses. It is the same issue in backward prop that it is in forward prop: the input space and the output space are different shapes. Here (as in forward prop) you are looping over the output space, but you are applying the gradients to the input space where the shapes are different. Here are the shapes:
A_prev.shape = (5, 5, 3, 2)
dA.shape = (5, 4, 2, 2)
The w dimension of A_prev is 3. So w = 1 in the output space is legit, and it maps to w_Prev = 1 with length 2, so the last index is 2 and it fits and does not run off the end of anything.
So I think we’re down to your mask value being the wrong shape for some reason.
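The relation between the two shapes can be checked with a few lines of arithmetic. This sketch assumes the usual pooling formula without padding, floor((n_prev - f) / stride) + 1, and the filter size 2 and stride 1 mentioned earlier in the thread; `pooled_size` is a name made up for the example.

```python
def pooled_size(n_prev, f, stride):
    # Output length along one axis for pooling without padding (assumed formula).
    return (n_prev - f) // stride + 1


A_prev_shape = (5, 5, 3, 2)
f, stride = 2, 1

m, n_H_prev, n_W_prev, n_C_prev = A_prev_shape
n_H = pooled_size(n_H_prev, f, stride)
n_W = pooled_size(n_W_prev, f, stride)
dA_shape = (m, n_H, n_W, n_C_prev)
print(dA_shape)

# Every output column w maps to an input window that stays inside A_prev.
for w in range(n_W):
    horiz_start = w * stride
    horiz_end = horiz_start + f
    assert horiz_end <= n_W_prev
    print(w, horiz_start, horiz_end)
```

The loop runs over the output width, but the bound that matters for the window is the input width, here 3.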
Great, thank you. Your last comment about the shape of the mask pushed me in the right direction. In the end, I had initialized dA_prev to the wrong shape: I used dA.shape instead of A_prev.shape. As I assumed, I was blind on that spot. Now everything works as expected. Thanks again for your support!
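For anyone landing here with the same error, this is a minimal sketch of a single max-pooling update at h=0, w=1 with the gradient buffer allocated in the input shape. It uses random data with the shapes from this thread, assumes filter size 2 and stride 1, and is not the assignment solution, only one step of the idea.

```python
import numpy as np

rng = np.random.default_rng(0)
A_prev = rng.standard_normal((5, 5, 3, 2))   # input of the pooling layer
dA = rng.standard_normal((5, 4, 2, 2))       # gradient w.r.t. the pooling output
f, stride = 2, 1                             # assumed hyperparameters

# The gradient buffer lives in the input space, so it takes A_prev's shape.
dA_prev = np.zeros(A_prev.shape)

i, h, w, c = 0, 0, 1, 0                      # the position that failed in the thread
vert_start, horiz_start = h * stride, w * stride
vert_end, horiz_end = vert_start + f, horiz_start + f

a_slice_prev = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
mask = a_slice_prev == a_slice_prev.max()    # True at the position of the maximum

# Both sides are (2, 2) now, so the in-place update broadcasts cleanly.
dA_prev[i, vert_start:vert_end, horiz_start:horiz_end, c] += mask * dA[i, h, w, c]

print(dA_prev.shape, a_slice_prev.shape, mask.shape)
```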