Clarification on the reason for a failed exercise

C1W4A1: Building your Deep Neural Network: Step by Step

In the last exercise (no. 10)

```
parameters["W" + str(l+1)] = parameters["W" + str(l+1)] - learning_rate * grads["dW" + str(l + 1)]
parameters["b" + str(l+1)] = parameters["b" + str(l+1)] - learning_rate * grads["db" + str(l + 1)]
```

passes both tests while

```
parameters["W" + str(l+1)] -= learning_rate * grads["dW" + str(l + 1)]
parameters["b" + str(l+1)] -= learning_rate * grads["db" + str(l + 1)]
```

passes the first test but fails the second. Aren't they the same? Why does this happen?

```
In [1]: a = 1

In [2]: b = 2

In [3]: a -= b

In [4]: a
Out[4]: -1
```

If that were the case, then `a` would become 1 instead of -1.

I was waiting for a reply like this

Here the basic step in a neural network is updating the parameters: gradient descent improves them, with the step size scaled by the learning rate.

My explanation of a = a - b was not about which value is higher or lower, but about why the two snippets in your images give different results.

We update each parameter with gradient descent: the gradient computation gives the derivative of the loss with respect to that parameter. You first take the parameter and then apply the learning rate, a hyperparameter that controls how much the weights of the neural network change with respect to the loss gradient.

So that is why the update needs to be defined as a = a - b, and not because of the particular values (2 - 1 versus 1 - 2).

Sorry, I don't understand.

If you follow your image of a = b - a, you will not get the expected output. In simpler terms, your parameter update can have the opposite effect of the intended result.
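To make the sign issue concrete, here is a minimal sketch; the numeric values are made up purely for illustration:

```python
learning_rate = 0.5
W, dW = 3.0, 4.0  # a parameter and its gradient (illustrative values)

# Correct update: parameter minus (learning_rate * gradient)
W_correct = W - learning_rate * dW    # 3.0 - 2.0 = 1.0

# Reversed operands: the step moves in the opposite direction entirely
W_reversed = learning_rate * dW - W   # 2.0 - 3.0 = -1.0

print(W_correct, W_reversed)
```

With the operands reversed, every update pushes the parameter the wrong way, so training diverges instead of descending the loss.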

See the image just above the grader cell you shared: it mentions that when we update a parameter, we subtract the learning rate times the partial derivative of the cost with respect to that parameter. So the update must be the current parameter minus its derivative term, not the derivative term minus the parameter.

I believe `a -= b` is equivalent to `a = a - b`.

It's the same reason that `i += 1` is equivalent to `i = i + 1`.

Please read this thread to understand why that is. You need "copy" or "deepcopy" to break the connection to the global variables.

Yes, that's correct. That is the meaning of "-=" from an arithmetic p.o.v. But what's really going on here is more complicated than that. The numeric result is the same, but if the operands in question are "objects" (meaning pointers) then the way memory is managed is very different. The "-=" operator is "in place", meaning that it directly modifies the object in memory. The plain assignment allocates a new memory object for the RHS, so the original variable is not modified in memory. If the variable in question is an object passed as a parameter to a python function and you didn't first "copy" it, then you've modified global data with the "-=" approach. The test cases for this exercise are written in such a way that modifying the global data causes subsequent tests to fail.
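A small sketch of that difference, using NumPy arrays since that is what the parameter dictionaries in the assignment hold (the function names and values here are illustrative, not from the assignment):

```python
import numpy as np

def update_inplace(w, grad, lr):
    # "-=" calls ndarray.__isub__, which writes into the SAME buffer
    # the caller's array points to: the caller's data is mutated.
    w -= lr * grad
    return w

def update_rebind(w, grad, lr):
    # "w = w - ..." allocates a NEW array and rebinds the local name;
    # the caller's array is left untouched.
    w = w - lr * grad
    return w

w_global = np.array([1.0, 2.0])
grad = np.array([10.0, 10.0])

update_rebind(w_global, grad, 0.1)
print(w_global)   # unchanged: [1. 2.]

update_inplace(w_global, grad, 0.1)
print(w_global)   # mutated:   [0. 1.]
```

For plain Python ints like the `a -= b` example above, the two forms are indistinguishable, because ints are immutable and `-=` falls back to rebinding; the divergence only shows up with mutable objects such as NumPy arrays.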


When I explained this it was more related to updating parameters and not the integers

The `copy()` had actually happened before the "for loop" in which the updating happens, and that caused the confusion.

Here is the full code:

```
def update_parameters(params, grads, learning_rate):
    """
    Arguments:
    params -- python dictionary containing your parameters

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    parameters = params.copy()
    L = len(parameters) // 2  # number of layers in the neural network
```

{moderator edit - solution code removed}

I was able to fix it by deep copying the `params` using `parameters = copy.deepcopy(params)`

Here is the full code:

```
# GRADED FUNCTION: update_parameters
import copy

def update_parameters(params, grads, learning_rate):
    """
    Arguments:
    params -- python dictionary containing your parameters

    Returns:
    parameters -- python dictionary containing your updated parameters
                  parameters["W" + str(l)] = ...
                  parameters["b" + str(l)] = ...
    """
    parameters = copy.deepcopy(params)
    L = len(parameters) // 2  # number of layers in the neural network
```

*{moderator edit - solution code removed}*

Did you pass the grader? Any changes beyond or outside the `### YOUR CODE STARTS HERE` / `### YOUR CODE ENDS HERE` markers can cause grader assessment issues.

That does not work, because `parameters` is a compound object. That creates a new copy of the whole dictionary, but the individual arrays in the dictionary are not duplicated. You need the `deepcopy`, as I see you discovered in your later post on this thread.
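A minimal sketch of why the shallow copy is not enough for a dictionary of NumPy arrays (the key names and values here are illustrative):

```python
import copy
import numpy as np

params = {"W1": np.array([1.0, 2.0])}

shallow = params.copy()       # new dict, but the VALUES are shared arrays
shallow["W1"] -= 1.0          # in-place update leaks back into params
print(params["W1"])           # [0. 1.]  -- the original was mutated

params = {"W1": np.array([1.0, 2.0])}
deep = copy.deepcopy(params)  # the nested arrays are duplicated too
deep["W1"] -= 1.0             # params is untouched this time
print(params["W1"])           # [1. 2.]
```

`dict.copy()` only duplicates the outer mapping of keys to references; `copy.deepcopy` recursively duplicates the referenced objects as well, which is what breaks the connection to the grader's global data.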

Yes, so will this change make it to the assignment notebook?

Also thanks for the clarification, I didn't know the reason why `deepcopy()` works.

That's a good point: the template code is misleading. I will file a bug about this and hope that the course staff will act on it. It should be a simple fix, other than that they also have to add the "import" for the copy package.