Clarification on the reason for a failed exercise

C1W4A1: Building your Deep Neural Network: Step by Step

In the last exercise (no. 10)

```
parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l + 1)]
parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l + 1)]
```

passes both tests while

```
parameters["W" + str(l+1)] -= learning_rate * grads["dW" + str(l + 1)]
parameters["b" + str(l+1)] -= learning_rate * grads["db" + str(l + 1)]
```

passes the first test but fails the second. Aren't they the same? Why does this happen?

```
In [1]: a = 1

In [2]: b = 2

In [3]: a -= b

In [4]: a
Out[4]: -1
```

If that were the case, then `a` would become 1 instead of -1.

I was waiting for a reply like this

Here the basic step in a neural network is updating the parameters: gradient descent improves them, with the step size scaled by the learning rate.

My explanation of a = a - b was not about which value is higher or lower, but about why the two snippets in your images give different results.

We update each parameter with gradient descent: the gradient computation gives the derivative of the loss with respect to that parameter. You first take the parameter and then apply the learning rate, a hyperparameter that controls how much the weights of the neural network change with respect to the loss gradient.

So that is why the update needs to be defined as a = a - b, and not because of the particular values (2 - 1 versus 1 - 2).

Sorry, I don't understand.

If you follow your image of a = b - a, you will not get the expected output. In simpler terms, your parameter update can have the opposite effect of the intended result.
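To make the sign issue concrete, here is a minimal sketch; the numeric values are made up purely for illustration:

```python
learning_rate = 0.5
W, dW = 3.0, 4.0  # a parameter and its gradient (illustrative values)

# Correct update: parameter minus (learning_rate * gradient)
W_correct = W - learning_rate * dW    # 3.0 - 2.0 = 1.0

# Reversed operands: the step moves in the opposite direction entirely
W_reversed = learning_rate * dW - W   # 2.0 - 3.0 = -1.0

print(W_correct, W_reversed)
```

With the operands reversed, every update pushes the parameter the wrong way, so training diverges instead of descending the loss.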

See the image just above the grader cell you shared: it mentions that when we update a parameter, we subtract the learning rate times the partial derivative of the cost with respect to that parameter. So the update must be the current parameter minus its derivative term, not the derivative term minus the parameter.

I believe `a -= b` is equivalent to `a = a - b`.

It's the same reason that `i += 1` is equivalent to `i = i + 1`.

Please read this thread to understand why that is. You need "copy" or "deepcopy" to break the connection to the global variables.

Yes, that's correct. That is the meaning of "-=" from an arithmetic p.o.v. But what's really going on here is more complicated than that. The numeric result is the same, but if the operands in question are "objects" (meaning pointers) then the way memory is managed is very different. The "-=" operator is "in place", meaning that it directly modifies the object in memory. The plain assignment allocates a new memory object for the RHS, so the original variable is not modified in memory. If the variable in question is an object passed as a parameter to a python function and you didn't first "copy" it, then you've modified global data with the "-=" approach. The test cases for this exercise are written in such a way that modifying the global data causes subsequent tests to fail.
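A small sketch of that difference, using NumPy arrays since that is what the parameter dictionaries in the assignment hold (the function names and values here are illustrative, not from the assignment):

```python
import numpy as np

def update_inplace(w, grad, lr):
    # "-=" calls ndarray.__isub__, which writes into the SAME buffer
    # the caller's array points to: the caller's data is mutated.
    w -= lr * grad
    return w

def update_rebind(w, grad, lr):
    # "w = w - ..." allocates a NEW array and rebinds the local name;
    # the caller's array is left untouched.
    w = w - lr * grad
    return w

w_global = np.array([1.0, 2.0])
grad = np.array([10.0, 10.0])

update_rebind(w_global, grad, 0.1)
print(w_global)   # unchanged: [1. 2.]

update_inplace(w_global, grad, 0.1)
print(w_global)   # mutated:   [0. 1.]
```

For plain Python ints like the `a -= b` example above, the two forms are indistinguishable, because ints are immutable and `-=` falls back to rebinding; the divergence only shows up with mutable objects such as NumPy arrays.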


When I explained this it was more related to updating parameters and not the integers

The `copy()` had actually happened before the "for loop" in which the updating happens, and that caused the confusion.

Here is the full code:

```
def update_parameters(params, grads, learning_rate):
    """
    Arguments:
    params -- python dictionary containing your parameters

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    parameters = params.copy()
    L = len(parameters) // 2  # number of layers in the neural network
```

{moderator edit - solution code removed}

I was able to fix it by deep copying the `params` using `parameters = copy.deepcopy(params)`

Here is the full code:

```
# GRADED FUNCTION: update_parameters
import copy

def update_parameters(params, grads, learning_rate):
    """
    Arguments:
    params -- python dictionary containing your parameters

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    parameters = copy.deepcopy(params)
    L = len(parameters) // 2  # number of layers in the neural network
```

*{moderator edit - solution code removed}*

Did you pass the grader? Any changes beyond or outside the `### YOUR CODE STARTS HERE` / `### YOUR CODE ENDS HERE` markers can cause grader assessment issues.

That does not work, because `parameters` is a compound object. That creates a new copy of the whole dictionary, but the individual arrays in the dictionary are not duplicated. You need the `deepcopy`, as I see you discovered in your later post on this thread.
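A minimal sketch of why the shallow copy is not enough for a dictionary of NumPy arrays (the key names and values here are illustrative):

```python
import copy
import numpy as np

params = {"W1": np.array([1.0, 2.0])}

shallow = params.copy()       # new dict, but the VALUES are shared arrays
shallow["W1"] -= 1.0          # in-place update leaks back into params
print(params["W1"])           # [0. 1.]  -- the original was mutated

params = {"W1": np.array([1.0, 2.0])}
deep = copy.deepcopy(params)  # the nested arrays are duplicated too
deep["W1"] -= 1.0             # params is untouched this time
print(params["W1"])           # [1. 2.]
```

`dict.copy()` only duplicates the outer mapping of keys to references; `copy.deepcopy` recursively duplicates the referenced objects as well, which is what breaks the connection to the grader's global data.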

Yes, so will this change make it to the assignment notebook?

Also thanks for the clarification, I didn't know the reason why `deepcopy()` works.

That's a good point: the template code is misleading. I will file a bug about this and hope that the course staff will act on it. It should be a simple fix, other than that they also have to add the "import" for the copy package.