W3E7 deep copy, and gradient equation

→ 1. Is there a reason that b1 and b2 are not mentioned for copy.deepcopy?
Would we also need to perform a deep copy for b1 and b2?

“# Retrieve a copy of each parameter from the dictionary ‘parameters’. Use copy.deepcopy(…) for W1 and W2
# (≈ 4 lines of code)”

→ 2. Would the equation below be correct?
Would there be any role for the parameter theta in the function? (I don’t believe so.)

W1 = W1 - (learning_rate*dW1)

Please assist.

I believe you should be able to use deepcopy for b1 and b2 as well; it works either way. Based on the note, this is only needed because of the way the autograder works.

The theta variable is just another name for the weights/parameters. In this particular example, theta is the same as W (this is just the name used in the course).
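To make the correspondence concrete, here is a minimal sketch of the generic update θ := θ − α·dθ applied to every named parameter (W1, b1, W2, b2, …). The dictionary layout and the `"d" + name` gradient-naming convention mirror the course style, but the values here are toy assumptions, not the notebook's actual code:

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate=0.01):
    # Generic rule: theta := theta - learning_rate * d(theta),
    # applied to each parameter by name ("W1" pairs with "dW1", etc.).
    return {name: theta - learning_rate * grads["d" + name]
            for name, theta in parameters.items()}

# Toy parameters and gradients (assumed shapes, for illustration only)
parameters = {"W1": np.ones((2, 2)), "b1": np.zeros((2, 1))}
grads = {"dW1": np.full((2, 2), 0.5), "db1": np.full((2, 1), 0.5)}

new_params = update_parameters(parameters, grads, learning_rate=0.1)
```

Here "theta" is simply the placeholder for whichever parameter is being updated; the per-parameter lines like `W1 = W1 - learning_rate * dW1` are instances of the same rule.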

Yes, the comment is misleading: you should copy all the parameters, but it really only matters if you use the “in-place” operation for the updates, as in:

W1 -= .... formula for updating W1 ....

If you use the code:

W1 = W1 - ... update term ...

then copying does not matter. This is all explained on this thread.
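The difference is whether the NumPy array object is mutated in place or replaced by a new one. A small demonstration with toy values (not the course's actual parameters):

```python
import numpy as np

# In-place: W1 is the SAME array object as params["W1"],
# so -= mutates the dictionary's array too.
params = {"W1": np.array([1.0, 2.0])}
W1 = params["W1"]
W1 -= 0.1              # params["W1"] is now array([0.9, 1.9])

# Out-of-place: subtraction builds a brand-new array and rebinds
# the name W1 to it; the dictionary's array is untouched.
params = {"W1": np.array([1.0, 2.0])}
W1 = params["W1"]
W1 = W1 - 0.1          # params["W1"] is still array([1., 2.])
```

With the in-place form, an uncopied `parameters` dictionary gets silently modified by the update step, which is exactly what the copying is meant to prevent.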

But the comment is misleading in another way as well. The point of “deepcopy”, as opposed to a plain copy such as:

W1 = W1.copy()

is that you can do the deepcopy on the whole parameters dictionary in one shot like this:

parameters = copy.deepcopy(parameters)

and it solves the whole problem by duplicating all the memory objects, as explained on that thread I linked above.
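To see why one deepcopy of the whole dictionary suffices, compare it with a plain dict copy, which creates a new dictionary but still shares the underlying arrays (toy values for illustration):

```python
import copy
import numpy as np

parameters = {"W1": np.array([1.0]), "b1": np.array([0.0])}

shallow = dict(parameters)        # new dict, but the SAME array objects
deep = copy.deepcopy(parameters)  # new dict AND new arrays inside it

shallow["W1"] -= 1.0  # in-place update leaks back into `parameters`
deep["b1"] += 5.0     # only the deep copy's array changes
```

After this runs, `parameters["W1"]` has been mutated through the shallow copy, while `parameters["b1"]` is unaffected by the update made through the deep copy. That is why a single `copy.deepcopy(parameters)` covers W1, b1, W2, and b2 all at once.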

Note that the autograder here has actually been fixed so that copying is not required even with the “in-place” version of the code, but in Week 4 we do need to do this.