In the week 2 lab assignment, the hint in section 4.6 says: "Use copy.deepcopy(...) when copying lists or dictionaries that are passed as parameters to functions. It avoids input parameters being modified within the function. In some scenarios, this could be inefficient, but it is required for grading purposes." I think this explanation is misleading. In that function, deepcopy is recommended for W1 and W2, but the reason is not to keep the input parameters from being modified; in that function, W1 and W2 are meant to be updated. The reason for deepcopy is actually to make sure the "deeper" dimensions of the matrices are also copied from the cache into the intermediate W1 and W2, so that applying the grads updates the values in those deeper dimensions.
Please read this for more details.
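
For anyone who wants to see the shallow versus deep copy difference directly, here is a minimal sketch. It is not the lab's code: the `parameters` dictionary, its values, and the variable names are made up for illustration, and plain nested lists stand in for the matrices.

```python
import copy

# Hypothetical stand-in for a parameter dictionary; nested lists play the role of matrices.
parameters = {"W1": [[1.0, 2.0], [3.0, 4.0]]}

shallow = copy.copy(parameters)    # new outer dict, but the inner lists are shared
deep = copy.deepcopy(parameters)   # new outer dict and new inner lists at every level

shallow_shares_rows = shallow["W1"] is parameters["W1"]
deep_shares_rows = deep["W1"] is parameters["W1"]

# An in-place update through the deep copy leaves the original untouched.
deep["W1"][0][0] -= 0.5
original_after_deep_update = parameters["W1"][0][0]

# The same update through the shallow copy also changes the original,
# because both dictionaries point at the same inner lists.
shallow["W1"][0][0] -= 0.5
original_after_shallow_update = parameters["W1"][0][0]
```

In this sketch only the deep copy gets its own inner lists, so it is the only one that can be updated in place without touching the object it was copied from.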