DLS4 week 1 conv_backward

I'm receiving this error. I checked, and it seems like a_slice should be a 3-dimensional array, but dZ and dW are 4-dimensional arrays, so I can't put them together. a_slice comes from a_prev_pad, which comes from a slice of A_prev_pad, so it seems it should be 3-dimensional?

dW: (2, 2, 3, 8)
a_prev_pad: (8, 8, 3)
a_slice: (2, 2, 3)
dZ: (10, 4, 4, 8)


ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
     10
     11 # Test conv_backward
---> 12 dA, dW, db = conv_backward(Z, cache_conv)
     13
     14 print("dA_mean =", np.mean(dA))

<ipython-input> in conv_backward(dZ, cache)
     90                 da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
     91                 # dW[:,:,:,c] += None
---> 92                 dW[:,:,:c] += a_slice * dZ[i, h, w, c]
     93                 # db[:,:,:,c] += None
     94                 db[:,:,:,c] += dZ[i, h, w, c]

ValueError: operands could not be broadcast together with shapes (2,2,0,8) (2,2,3) (2,2,0,8)

There are a number of things wrong there. What is up with the shape 2 x 2 x 0 x 8? Why is the third dimension empty? And note that the loops are over one value of c at a time, right? Think about how forward propagation works and then note that backward propagation is exactly the mirror image of it. In forward prop, at each point in the output space (which is what the loops cover), we project from a shape of f x f x nC_{in} to one element of the output space. So in backward propagation, the projection works in the opposite direction: from one position of the output to that same shape of the input I listed above.
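To make that shape mirror concrete, here is a minimal NumPy sketch with toy sizes (f = 2, nC_{in} = 3); the variable names below are illustrative, not the assignment's:

import numpy as np

f, nC_in = 2, 3
a_slice = np.random.randn(f, f, nC_in)   # one (f, f, nC_in) window of the input
w_c = np.random.randn(f, f, nC_in)       # one filter, i.e. W[:, :, :, c]

# Forward prop: the (f, f, nC_in) slice collapses to ONE output element.
z_element = np.sum(a_slice * w_c)        # a scalar

# Backward prop mirrors it: ONE scalar gradient element (dZ[i, h, w, c])
# fans back out to the full (f, f, nC_in) shape.
dz_element = z_element                   # stands in for dZ[i, h, w, c]
print((w_c * dz_element).shape)          # (2, 2, 3) -- contribution to da_prev_pad
print((a_slice * dz_element).shape)      # (2, 2, 3) -- contribution to dW[:, :, :, c]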

So the third dimension is 'w'. Maybe it's an indentation problem again? Although I've been over those for loops a couple of times making sure the indentation was right. Yes, they loop over one value of c at a time and project that out as a 3D array of f x f x nC_{in}. How does that factor into this? Again, the loops, specifically for c?

I added some print statements to my conv_backward logic and here’s what I see when I run that test cell:

stride 2 pad 2
New dimensions = 4 by 4
Shape Z = (10, 4, 4, 8)
Shape A_prev_pad = (10, 8, 8, 3)
Z[0,0,0,0] = 0.3724568515114425
Z[9,3,3,7] = -0.06741002494685439
W.shape (2, 2, 3, 8)
dA_prev_pad.shape (10, 8, 8, 3)
a_slice.shape (2, 2, 3)
dW[:,:,:,c].shape = (2, 2, 3)
dA_mean = 1.4524377775388075
dW_mean = 1.7269914583139097
db_mean = 7.839232564616838
 All tests passed.

Remember that we are talking about dW there. What is the shape of W? The third dimension is the input channels, right? There is no “samples” dimension in W and dW.

Okay. My a_prev_pad is (8, 8, 3). Your A_prev_pad is (10, 8, 8, 3). a_prev_pad comes from slicing into A_prev_pad at the i-th training example. So going from 4D to 3D here makes sense, since we're selecting one of the m training examples. Right?

Yes, that sounds right.
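For reference, that slicing step can be checked in isolation (a toy array with the same shapes as the printouts below):

import numpy as np

A_prev_pad = np.zeros((10, 8, 8, 3))   # m = 10 padded training examples
i = 0
a_prev_pad = A_prev_pad[i]             # select the i-th training example
print(a_prev_pad.shape)                # (8, 8, 3) -- the samples dimension is gone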

Both dW and dZ, where the 2x2x0x8 comes from, are in dW[:,:,:c] += a_slice * dZ[i, h, w, c]. Here dZ obviously has w in the third dimension. dW, on the other hand, is f x f x nC_{prev} x nC.

dW is all zeros at that point, so that third dimension being zero must come from dZ at the w index.

I think you’ve got the same confusion here that you had earlier. A_prev[0] is not the same thing as A_prev.shape[0], right? We’ve been here before. How does that thought process apply to the current situation. :nerd_face:
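For anyone following along, the distinction the mentor keeps pointing at fits in two lines of NumPy (toy shapes assumed, matching the printouts above):

import numpy as np

A_prev = np.zeros((10, 8, 8, 3))
print(A_prev.shape[0])    # 10 -- a size: how many entries the first dimension has
print(A_prev[0].shape)    # (8, 8, 3) -- data: the whole first training example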

So I realized I had dZ listed as dZ[i, h, w, c] from the equation, but its actual dimensions are dZ[m, n_H, n_W, n_C], so I fixed those. And now my error is:

IndexError                                Traceback (most recent call last)
<ipython-input> in <module>
     10
     11 # Test conv_backward
---> 12 dA, dW, db = conv_backward(Z, cache_conv)
     13
     14 print("dA_mean =", np.mean(dA))

<ipython-input> in conv_backward(dZ, cache)
     88                 # da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += None
     89
---> 90                 da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[m, n_H, n_W, n_C]
     91                 # dW[:,:,:,c] += None
     92                 dW[:,:,:c] += a_slice * dZ[m, n_H, n_W, n_C]

IndexError: index 10 is out of bounds for axis 0 with size 10

Is what I've done below a fix for what you're referencing here, or do I still have the problem you're describing AND now the problem I've made for myself below?

Think about what that means. So you are using the same index values for dZ in every iteration of the loop? How does that make any sense? It’s just like before: you’re just ricocheting around. This is not simple material and it requires some work to really understand what is going on. Take a few deep breaths and slow down and give yourself time to think.
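A minimal reproduction of that IndexError, and the loop structure that avoids it (toy array with the assumed shapes from the printouts above):

import numpy as np

# With dZ of shape (10, 4, 4, 8), m = 10 is a size, and valid indices on
# axis 0 run 0..9, so dZ[m, n_H, n_W, n_C] raises exactly "index 10 is out
# of bounds for axis 0 with size 10". The indices must be the loop variables:
dZ = np.zeros((10, 4, 4, 8))
m, n_H, n_W, n_C = dZ.shape
for i in range(m):
    for h in range(n_H):
        for w in range(n_W):
            for c in range(n_C):
                _ = dZ[i, h, w, c]    # in bounds on every iteration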

Okay. You're right. I changed dZ back to [i, h, w, c] so that it gets iterated through the loop.

Okay, so my problem is that dW has a 0 at that position, not as in dW[3] = 0 but as in dW.shape[2] = 0: my third dimension has size 0, not that the value at dW[3] is 0. Although if the size is 0, I believe the value should also be 0?

Not sure I understand your point, but I would refer back to the first couple of posts on this thread: it does not make sense for any of the dimensions of W or dW to be zero, right? We know what the shape of W is and the shape of dW is by definition the same.

Here’s the relevant line from my printouts above:

W.shape (2, 2, 3, 8)

So that is f x f x nC_{in} x nC_{out}, right?
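Putting the pieces together, the zero-sized dimension in the original ValueError can be reproduced in isolation; the culprit in the traceback line dW[:,:,:c] += ... appears to be a missing comma (a sketch using the shapes from this thread):

import numpy as np

# With only three indices, ":c" is a slice 0:c on the third axis, not an
# index c on the fourth one, so when c = 0 the result has an empty third
# dimension -- the (2, 2, 0, 8) from the original ValueError.
dW = np.zeros((2, 2, 3, 8))
c = 0
print(dW[:, :, :c].shape)     # (2, 2, 0, 8) -- missing comma, empty dimension
print(dW[:, :, :, c].shape)   # (2, 2, 3)    -- intended: one filter's gradient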

Yes, and nC_{in} is in W, and W comes from the cache. Do I have a broken cache? Have I broken the cache!?!?

You tell me: did you modify the cache? I’m guessing not. So I think the more likely possibility is simply that you are misinterpreting the contents of the cache.

It’s probably the same continuing theme of mixing up the contents of an array with its shape. Repeat after me: what’s the difference between A_prev[0] and A_prev.shape[0]?

I’m tempted to just repeat after you there haha. I get it, A_prev[0] is the value at the first position of A_prev whereas A_prev.shape[0] would be the first entry of the shape.

Like, if A_prev were a 4x3x3 array, then A_prev.shape = (4, 3, 3) and A_prev.shape[0] = 4, whereas A_prev[0] would be the first slice along the first dimension of A_prev. Right?

W.shape = (2, 2, 3, 8)
W.shape[0] = 2
W[0] would be everything at the first position along the first dimension, the one of size 2 that represents the filter height: a (2, 3, 8) sub-array rather than a single value.
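And that reading checks out in a quick toy example (shape taken from this thread; just a sanity check, not assignment code):

import numpy as np

W = np.zeros((2, 2, 3, 8))   # f x f x nC_in x nC_out
print(W.shape[0])            # 2 -- a size: the filter height f
print(W[0].shape)            # (2, 3, 8) -- a sub-array, not a single value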