W3_A1_Update_parameters_Test fails

Hello,

I am Yawen. I am working on the Week 3 assignment, in the ‘update_parameters’ cell. The tests fail even though my output is the same as the expected output. Does anyone know what the reason might be?
Thanks a lot in advance!

2 Likes

Update: In the time since this thread was first created, they have fixed the test cases for update_parameters in the Week 3 Planar Data assignment so that they no longer fail if you simply use -= without copying the parameters first. Note that this problem still exists in the Week 4 update parameters logic. It is also worth reading the thread just for the information about how things work in python with object references and procedure call semantics.

Did you use the “in place” operators for your update? That will fail the grader unless you are careful to copy the variables first to break the connection to the global definition of the input data. That is to say this fails:

W1 -= ... (update formula) ...

But this will pass the test cases (and the grader):

W1 = W1 - ... (update formula) ...

The reasons for this are pretty subtle. Those two statements produce the same output value, but how they manage memory is very different. In python, function arguments are passed as object references: the function receives a reference to the same object the caller holds, not a copy of it. That means that the variable named parameters inside the update_parameters function is actually an object reference to the global definition (the same one that was used to invoke the function). So if you write this:

W1 = parameters['W1']

Inside the function, that local variable W1 is actually an object reference to the global definition of the data in that dictionary. So if you then write this:

W1 -= <some formula>

You have actually modified the global data, because the definition of that operator “-=” is that it operates “in place”. That same global data is reused in several tests, so the subsequent tests fail because the inputs have been changed.

You can make “-=” work by first copying the elements of the dictionary internally to the function to break the connection between the local W1 and the global objects indexed by the dictionary:

W1 = parameters['W1']
W1 = W1.copy()

After that operation, the local variable W1 is now an object reference to a newly allocated memory object which is different from the original object referenced by parameters['W1'], but which contains the same values of course. There is a simpler solution using “deepcopy” that avoids having to individually copy every element of the dictionary: see this later post on this same thread for more information.

Just to complete the explanation here, the reason that this version works without the copy:

W1 = W1 - <update formula>

Is that the semantics of that assignment statement in python are that the RHS is evaluated first and its result is a freshly allocated object. That new object then gets assigned to W1 on the LHS, which thus becomes a reference to the new object. The reference to the previous object referenced by W1 is dropped and may be garbage collected (although it won’t be in this case because there’s still a live global reference).
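To make the difference concrete, here is a minimal sketch of the three patterns discussed above. The function names, the toy values, and the learning rate of 0.1 are made up for illustration; this is not the assignment's code, only the same idea applied to a small dictionary.

```python
import numpy as np

def update_in_place(parameters, grads, learning_rate):
    W1 = parameters["W1"]                # same object the caller holds
    W1 -= learning_rate * grads["dW1"]   # modifies that object in place
    return {"W1": W1}

def update_copy_then_in_place(parameters, grads, learning_rate):
    W1 = parameters["W1"].copy()         # new array, same values
    W1 -= learning_rate * grads["dW1"]   # in place, but only on the copy
    return {"W1": W1}

def update_rebinding(parameters, grads, learning_rate):
    W1 = parameters["W1"]
    W1 = W1 - learning_rate * grads["dW1"]  # RHS is a new array; W1 is rebound to it
    return {"W1": W1}

def make_inputs():
    # Hypothetical toy inputs, rebuilt for every run so the runs are independent.
    parameters = {"W1": np.array([[1.0, 2.0], [3.0, 4.0]])}
    grads = {"dW1": np.ones((2, 2))}
    return parameters, grads

results = {}
for update in (update_in_place, update_copy_then_in_place, update_rebinding):
    parameters, grads = make_inputs()
    before = parameters["W1"].copy()
    new_parameters = update(parameters, grads, learning_rate=0.1)
    unchanged = bool(np.array_equal(parameters["W1"], before))
    results[update.__name__] = {"input_unchanged": unchanged,
                                "output": new_parameters["W1"]}
    print(f"{update.__name__}: caller's W1 unchanged = {unchanged}")
```

All three functions return the same values. Only the first one also changes the array that the caller still holds, which is what breaks a test that reuses the same input data.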

As I said earlier, it’s subtle. You may think python is a nice clean and easy language, but there can be some pretty nasty surprises from code that looks relatively straightforward.

32 Likes

Thank you very much, that makes sense! I did use the ‘in place’ operator in the update_parameters function.

Many thanks!

1 Like

Glad I found this post, I had the same problem, thank you for the explanation!!!

4 Likes

Given that the Discourse search engine seems to work pretty well and people can find this thread, maybe it’s worth adding some more material about other “nasty surprises” you can get from the way object references work in python. Here’s another one that comes to mind:

Let’s suppose I have a numpy array called A:

import numpy as np
np.random.seed(42)
A = np.random.randint(0, 10, (3,4))
print(f"A =\n{A}")
A =
[[6 3 7 4]
 [6 9 2 6]
 [7 4 3 7]]

Now suppose I make a “copy” of it and execute the following statements:

B = A
B[1,3] = 42
print(f"B =\n{B}")
B = 
[[ 6  3  7  4]
 [ 6  9  2 42]
 [ 7  4  3  7]]

But now watch this:

print(f"A =\n{A}")
A = 
[[ 6  3  7  4]
 [ 6  9  2 42]
 [ 7  4  3  7]]

Eeeek! What just happened? Because numpy arrays are python “objects”, when you make an assignment statement B = A, you are not actually creating a new copy of the memory object referenced by A. A python “object” variable is really what you would call a “pointer” in C or C++: it is just a “name” that points to the actual object in memory. So what B = A does is just create another variable that is an object reference to the same object in memory. That’s why modifying the contents of B also modifies the contents of A. I’m sure I don’t need to emphasize that this can lead to very surprising side effects (surprising in a bad way). If your purpose is to make B an independent copy of the contents of A that you can modify without affecting A, the easiest way to do that is to use the “copy()” method as I showed in my earlier post on this thread:

B = A.copy()

Other equivalent ways to write that are:

B = np.array(A, copy=True)
B = np.copy(A)

It’s a matter of taste, but the first implementation seems simpler and cleaner.
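If you want to check which of these names actually share an object, here is a small sketch. The array contents are arbitrary, and the names alias and copies are just for illustration. It uses the is operator and np.shares_memory to tell an alias from a real copy.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)   # arbitrary example data; A[1, 3] starts as 7

alias = A                                             # plain assignment: no copy
copies = [A.copy(), np.array(A, copy=True), np.copy(A)]

alias_is_same_object = alias is A
copies_share_memory = [bool(np.shares_memory(A, c)) for c in copies]

alias[1, 3] = 42                  # writes through to A, but not to the copies

print("alias is A:", alias_is_same_object)
print("copies share memory with A:", copies_share_memory)
print("A[1, 3] after writing through alias:", A[1, 3])
print("same element in each copy:", [int(c[1, 3]) for c in copies])
```

The alias reports the same object as A, while each of the three copies has its own memory and keeps the original value.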

Depending on how complex an object A is (e.g. if it also contains elements which are themselves object references), you might even need to do a “deep” copy:

import copy
B = copy.deepcopy(A)

Note that you need to import the copy module first in order to use deepcopy. The deepcopy is a bit more expensive, but is equivalent to “copy()” in the case the object in question is a “simple” object (with elements which don’t contain any object references). An example in which you need the “deep” version of copy is when the object in question is a python dictionary with elements that are objects (e.g. numpy arrays). Just doing “copy()” on the dictionary in that case doesn’t help, because the new dictionary still refers to the same arrays: the deepcopy is required. Or you can do “copy” on each individual element, but that’s more code.
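Here is a minimal sketch of the dictionary case just described. The dictionary contents are invented for illustration. It compares the dictionary's own copy() method (a shallow copy) with copy.deepcopy.

```python
import copy
import numpy as np

# Hypothetical parameters dictionary holding numpy arrays.
parameters = {"W1": np.ones((2, 2)), "b1": np.zeros((2, 1))}

shallow = parameters.copy()        # new dict, but the values are the same array objects
deep = copy.deepcopy(parameters)   # new dict and new arrays

shallow_shares_array = shallow["W1"] is parameters["W1"]
deep_shares_array = deep["W1"] is parameters["W1"]

deep["W1"] -= 0.5       # only affects the deep copy
shallow["W1"] -= 0.25   # also changes parameters["W1"], since it is the same array

print("shallow copy shares W1:", shallow_shares_array)
print("deep copy shares W1:", deep_shares_array)
print("parameters['W1'] after both updates:")
print(parameters["W1"])
```

The in-place update through the shallow copy shows up in the original dictionary, while the update through the deep copy does not.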

For more information on this, please see the numpy documentation (google “numpy copy”).

22 Likes

Thank you for the very detailed follow up explanation! Simply being aware of this will save me from some nasty bugs in the future!

I have played around a bit, and I need to start thinking of these B = A assignments as pointers now. In fact, that statement is almost meaningless, since both names end up pointing at the same thing. I found that B is a fresh new variable if you replace the assignment statement with something like B = A + 0.

Anyway, thanks again for the above and beyond effort in explaining this.

1 Like

Hello! I have the same error, but in the nn_model() function. The parameters do not seem to be learning, since the cost remains the same. I used the `W1 = W1 - … (update formula)` form.

All the above tests are passed.

The learning rate is fixed as indicated in 1.2.

Thanks for the help! I don’t know what’s wrong

2 Likes

I’ve also encountered the same problem here, but I found the bug. In my case I had made a mistake in backward_propagation() while calculating dZ1. Check that portion; you may have made a mistake there.

2 Likes

Thanks @susant. I have checked backward_propagation() but everything seems to be correct. The values I get match the expected values.

In dZ1 I am using np.dot() and then *. It seems correct to me, but it is possible that I am making a mistake that I am not grasping.

The problem I have is that forward_propagation() and backward_propagation() are not executed when nn_model() is executed, and I do not know why.

2 Likes

In that line, have you implemented elementwise multiplication using (*)? And in the same line, have you used the proper derivative of the activation function ‘g’?

1 Like

Yes. I used g'(z) as indicated. There must be some obvious detail that I am missing.

1 Like

Ehh, okay. Check all the functions you’re using once more.

1 Like

Exercise 5 block 2 after propagate
w = np.array([[1.], [2.]])
b = 2.
X = np.array([[1., 2., -1.], [3., 4., -3.2]])
Y = np.array([[1, 0, 1]])
X is (2,3) - 2 rows, 3 columns. 2 samples with each x having three params. So w should also have 3 params, I think. What am I missing here? I am struggling with shapes here.

1 Like

You’re looking at the wrong dimension of X: the size of w needs to agree with the number of rows of X, not columns. Here’s the linear activation:

Z = w^T \cdot X + b

So the number of rows of w needs to be 2 in your example, so that it becomes 1 x 2 after the transpose.
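As a quick check of those shapes, here is a small sketch using the values from the question. It assumes the layout implied above, where each column of X is one sample and each row is one feature.

```python
import numpy as np

w = np.array([[1.], [2.]])                      # shape (2, 1): one weight per row of X
b = 2.
X = np.array([[1., 2., -1.], [3., 4., -3.2]])   # shape (2, 3): 2 features, 3 samples

Z = np.dot(w.T, X) + b                          # (1, 2) dot (2, 3) -> (1, 3)

print(w.shape, X.shape, Z.shape)                # (2, 1) (2, 3) (1, 3)
print(Z)
```

The inner dimensions of w.T and X (both 2) have to agree, and the result has one value of Z per sample.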

The other question is what your post has to do with the topic of this thread? This thread is about the update_parameters function in the Week 3 assignment, but you are asking a question about the Week 2 assignment.

1 Like

I got the error below:

W1 = [[-0.00615249 0.01692059]
[-0.02312532 0.03143428]
[-0.01691884 -0.01755122]
[ 0.00936272 -0.0502551 ]]
b1 = [[-8.98473586e-07]
[ 8.18998422e-06]
[ 6.06522029e-07]
[-2.55052521e-06]]
W2 = [[-0.01043174 -0.04022339 0.01608342 0.04442556]]
b2 = [[9.15934686e-05]]

AssertionError Traceback (most recent call last)
in
7 print("b2 = " + str(parameters["b2"]))
8
----> 9 update_parameters_test(update_parameters)

~/work/release/W3A1/public_tests.py in update_parameters_test(target)
237 assert output["b2"].shape == expected_output["b2"].shape, f"Wrong shape for b2."
238
--> 239 assert np.allclose(output["W1"], expected_output["W1"]), "Wrong values for W1"
240 assert np.allclose(output["b1"], expected_output["b1"]), "Wrong values for b1"
241 assert np.allclose(output["W2"], expected_output["W2"]), "Wrong values for W2"

AssertionError: Wrong values for W1

Expected output

W1 = [[-0.00643025 0.01936718]
[-0.02410458 0.03978052]
[-0.01653973 -0.02096177]
[ 0.01046864 -0.05990141]]
b1 = [[-1.02420756e-06]
[ 1.27373948e-05]
[ 8.32996807e-07]
[-3.20136836e-06]]
W2 = [[-0.01041081 -0.04463285 0.01758031 0.04747113]]
b2 = [[0.00010457]]

I am stuck here, please help!

1 Like

There aren’t that many moving parts in this function. Are you sure you didn’t hard-code the learning rate or something like that? Notice that your values are consistently smaller in absolute value than the expected values.
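To illustrate how that kind of bug behaves, here is a hypothetical sketch. The function names, the hard-coded value 0.5, and the toy arrays are all invented and are not the assignment's code. A function that ignores its learning_rate argument still runs without error; it just takes a different step size than the caller asked for.

```python
import numpy as np

def update_hardcoded(W, dW, learning_rate):
    # Bug: the learning_rate argument is ignored in favor of a literal.
    return W - 0.5 * dW

def update(W, dW, learning_rate):
    # Uses whatever learning rate the caller passes in.
    return W - learning_rate * dW

W = np.array([[0.2, -0.4]])    # toy values chosen for illustration
dW = np.array([[-1.0, 1.0]])

expected = update(W, dW, learning_rate=1.0)
actual = update_hardcoded(W, dW, learning_rate=1.0)

smaller_everywhere = bool(np.all(np.abs(actual) < np.abs(expected)))
print("expected:", expected)
print("actual:  ", actual)
print("actual smaller in absolute value everywhere:", smaller_everywhere)
```

With these particular toy values the hard-coded step is smaller than the requested one, so every entry ends up smaller in absolute value. That pattern depends on the numbers involved, so treat it as a hint rather than a general rule.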

2 Likes

I did hard-code the learning rate, hence the error. Thank you, Paul!

2 Likes