The basic step in a neural network is updating the parameters via gradient descent, with the step size controlled by the learning rate.
My explanation of `a = a - b` was not about which value is higher or lower, but about why your two images differ from each other.
We update each parameter using the gradient, which the grad function computes as the sum of the gradients of the outputs. So you first take the parameter, then subtract the gradient scaled by the learning rate, a hyperparameter that controls how much the weights of the neural network change with respect to the loss gradient.
That is why the update must be written and defined as `a = a - b`: the order of the operands matters because subtraction is not commutative, not because of the particular values involved (2 - 1 vs. 1 - 2).
If you follow your image and write `a = b - a`, you will not get the expected output. In simpler terms, with the operands flipped the parameter update moves in the opposite direction, so it can have the opposite effect from what you expect.
See the image just above the grader cell you shared: it shows that when we update a parameter, we subtract the (learning-rate-scaled) derivative from the parameter. So you cannot start from the partial derivative and subtract the parameter; the update is the current parameter minus the partial derivative of the loss with respect to that parameter.
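A minimal sketch of the point above, using an illustrative scalar parameter `w` and a made-up gradient value (not the assignment's actual code): the correct order subtracts the scaled gradient from the parameter, while flipping the operands produces a completely different result.

```python
import numpy as np

w = np.array([4.0])       # current parameter value (illustrative)
grad = np.array([2.0])    # dL/dw at the current point (illustrative)
learning_rate = 0.1

# Correct order: parameter minus scaled gradient.
w_correct = w - learning_rate * grad   # -> [3.8], a small step downhill

# Flipped order: scaled gradient minus parameter.
w_flipped = learning_rate * grad - w   # -> [-3.8], not a valid update

print(w_correct)
print(w_flipped)
```

Note that the flipped version does not just change the sign of the step; it produces an entirely different number, which is why the grader output diverges.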
Yes, that’s correct. That is the meaning of “-=” from an arithmetic p.o.v. But what’s really going on here is more complicated than that. The numeric result is the same, but if the operands in question are “objects” (meaning pointers) then the way memory is managed is very different. The “-=” operator is “in place”, meaning that it directly modifies the object in memory. The plain assignment allocates a new memory object for the RHS, so the original variable is not modified in memory. If the variable in question is an object passed as a parameter to a python function and you didn’t first “copy” it, then you’ve modified global data with the “-=” approach. The test cases for this exercise are written in such a way that modifying the global data causes subsequent tests to fail.
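Here is a small sketch of that difference, using hypothetical `update` functions on a numpy array (the function names are illustrative, not from the assignment): `-=` mutates the caller's array in place, while `a = a - b` binds the local name to a new array and leaves the caller's data untouched.

```python
import numpy as np

def inplace_update(params, grad, lr):
    # "-=" modifies the array object in place,
    # so the caller's array changes too.
    params -= lr * grad
    return params

def plain_update(params, grad, lr):
    # Plain assignment allocates a NEW array and rebinds
    # the local name; the caller's array is not modified.
    params = params - lr * grad
    return params

original = np.array([1.0, 2.0])
grad = np.array([0.5, 0.5])

plain_update(original, grad, 0.1)
print(original)   # [1. 2.]  -- unchanged

inplace_update(original, grad, 0.1)
print(original)   # [0.95 1.95]  -- mutated in place
```

This is why an in-place update inside a function can silently corrupt the "global" test data unless the function copies its inputs first.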
That does not work, because parameters is a compound object. That creates a new copy of the whole dictionary, but the individual arrays in the dictionary are not duplicated. You need the deepcopy as I see you discovered in your later post on this thread.
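To illustrate why a shallow copy is not enough (with a made-up `parameters` dictionary, not the assignment's actual one): `copy.copy` duplicates the dict but keeps references to the same arrays inside, so an in-place edit still leaks through; `copy.deepcopy` duplicates the arrays as well.

```python
import copy
import numpy as np

parameters = {"W1": np.array([1.0, 2.0]), "b1": np.array([0.5])}

shallow = copy.copy(parameters)      # new dict, but the SAME arrays inside
deep = copy.deepcopy(parameters)     # new dict AND new arrays

shallow["W1"] -= 1.0                 # mutates the original array too
print(parameters["W1"])              # [0. 1.]

deep["b1"] -= 1.0                    # original is untouched
print(parameters["b1"])              # [0.5]
```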
That’s a good point: the template code is misleading. I will file a bug about this and hope that the course staff will act on it. It should be a simple fix, other than that they will also have to add the “import” for the copy package.