W3E7 deep copy, and gradient equation

Yes, the comment is misleading: you should copy all the parameters, but it only actually matters if you use an "in-place" operation to do the updates, as in:

W1 -= .... formula for updating W1 ....

If you use the code:

W1 = W1 - ... update term ...

then copying does not matter. This is all explained on this thread.
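Here is a minimal sketch of the difference (the dictionary and values are made up for illustration): `-=` mutates the array object that the parameters dictionary still points to, while `W1 = W1 - ...` binds the local name to a brand-new array and leaves the dictionary untouched.

```python
import numpy as np

# Hypothetical parameters dictionary, standing in for the assignment's.
parameters = {"W1": np.array([1.0, 2.0])}
W1 = parameters["W1"]

# In-place update: mutates the very array object the dict refers to.
W1 -= 0.1 * np.array([1.0, 1.0])
print(parameters["W1"])  # the dict's array has changed too: [0.9 1.9]

# Out-of-place update: binds W1 to a new array; the dict is untouched.
parameters = {"W1": np.array([1.0, 2.0])}
W1 = parameters["W1"]
W1 = W1 - 0.1 * np.array([1.0, 1.0])
print(parameters["W1"])  # still [1. 2.]
```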

The comment is also a bit misleading about the kind of copy needed. The point of `deepcopy`, as opposed to a plain copy like:

W1 = W1.copy()

done separately for each parameter, is that you can deep-copy the whole parameters dictionary in one shot like this:

parameters = copy.deepcopy(parameters)

and it solves the whole problem by duplicating not just the dictionary but every array object inside it, as explained on the thread I linked above.
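A small sketch of why the one-shot deep copy works (with made-up parameter shapes): a plain `dict.copy()` gives you a new dictionary that still shares the same underlying arrays, whereas `copy.deepcopy` duplicates the arrays as well.

```python
import copy
import numpy as np

# Hypothetical parameters dictionary for illustration.
original = {"W1": np.zeros(2), "b1": np.zeros(1)}

shallow = original.copy()        # new dict, but SAME array objects inside
deep = copy.deepcopy(original)   # new dict AND new array objects

shallow["W1"] += 1.0  # mutates the array shared with `original`
deep["b1"] += 1.0     # mutates only the deep copy's private array

print(original["W1"])  # changed through the shallow copy: [1. 1.]
print(original["b1"])  # unchanged: [0.]
```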

Note that the autograder here has actually been fixed so that it no longer requires the copy even with the "in-place" version of the code, but in Week 4 we do need to do this.