[DLS1] Week 3 - exercise 6: error by 1e-8

Hello everyone,

I’m posting here for the first time, I hope I’m in the right place.
While completing the assignment from week 3 of DLS Course 1, I encountered a problem in exercise 6.
After carefully applying the given formulas, I get a dreadful AssertionError: Wrong values for dW1.

Comparing against the expected output, here’s the difference between what I get and what I should get:

These look like approximation errors to me. I first thought it was due to my using np.mean instead of (1/m) * np.sum, and Z1 ** 2 instead of np.power(Z1, 2), but even after replacing those, thus following the instructions exactly, I can’t get the error to go away.

Of course, this error carries through to exercise 8 and makes it wrong as well.

Does anyone have any ideas, please?


Hey @Goudout,
Welcome to the community. Could you please download your notebook and DM it to me? To send a DM, click on my name and select “Message”.


Okay, I found my problem. I was using the wrong formula for dZ1:
I used np.dot(W2.T, dZ2) * (1 - np.power(Z1, 2)) instead of np.dot(W2.T, dZ2) * (1 - np.power(A1, 2)).
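For anyone who hits the same bug: the hidden activation in this exercise is tanh, so A1 = np.tanh(Z1), and the derivative of tanh is tanh'(z) = 1 - tanh(z)^2, i.e. 1 - A1^2 — which is why the factor must use A1 rather than Z1. A quick finite-difference sketch of that identity:

```python
import numpy as np

z = 0.3                 # an arbitrary test point
a = np.tanh(z)          # the "activation", playing the role of A1
h = 1e-6

# Central-difference estimate of d/dz tanh(z)
numeric = (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)
analytic = 1 - a ** 2   # the derivative expressed in terms of the activation
print(numeric, analytic)  # the two agree to many decimal places
```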

Sorry and thanks for your time!

It’s great to hear that you found the solution. Just for future reference, note that 1e-8 is not a rounding error: it’s a real error. Rounding errors can happen, of course, but they are usually in the range of 1e-16, although in some pathological cases they can accumulate instead of statistically balancing each other out.
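To put a number on “the range of 1e-16”: the machine epsilon of a floating point type is the gap between 1.0 and the next representable value, and numpy exposes it via np.finfo. The classic 0.1 + 0.2 example shows a rounding error of exactly that scale:

```python
import numpy as np

# Machine epsilon: the spacing between 1.0 and the next representable float
print(np.finfo(np.float64).eps)   # 2.220446049250313e-16 (i.e. 2**-52)
print(np.finfo(np.float32).eps)   # 1.1920929e-07

# A classic rounding error at the ~1e-16 scale
print(0.1 + 0.2 == 0.3)           # False
print((0.1 + 0.2) - 0.3)          # 5.551115123125783e-17
```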

We are working in IEEE 754 floating point here. By default, most things end up being 64-bit float values in numpy. That means there are only a finite number of values we can represent, as opposed to the abstract beauty of \mathbb{R}; in particular, we can’t exactly represent the true mathematical value of even an expression as simple as \frac{1}{7}.
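We can inspect exactly how far the stored 64-bit value of 1/7 is from the true rational number, using Python’s standard fractions module (Fraction(x) recovers the exact binary value a float holds):

```python
from fractions import Fraction

x = 1 / 7                 # a 64-bit float
stored = Fraction(x)      # the exact rational value the float actually holds
true = Fraction(1, 7)

print(stored == true)             # False: 1/7 has no finite binary expansion
print(float(stored - true))       # the representation error, on the order of 1e-17
```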

Here’s a chunk of code to do a little experiment with one case in which there are two obvious ways to write an algorithm in python that give different numeric results:

import numpy as np

np.random.seed(42)  # with this seed, you should reproduce the printout below
A = np.random.rand(3,5)
print(f"A.dtype = {A.dtype}")
print(f"A = {A}")
m = 7.
B = 1./m * A
C = A/m
D = (B == C)
print(f"D = {D}")

Here’s what I get when I run that:

A.dtype = float64
A = [[0.37454012 0.95071431 0.73199394 0.59865848 0.15601864]
 [0.15599452 0.05808361 0.86617615 0.60111501 0.70807258]
 [0.02058449 0.96990985 0.83244264 0.21233911 0.18182497]]
D = [[False  True False False  True]
 [ True False False False  True]
 [False  True  True False False]]

So you can see that when we test for exact equality between B and C, quite a few of the elements end up being different. But it turns out that the differences are pretty small. Here’s a chunk of code to explore that:

diff = (B - C)
diffNorm = np.linalg.norm(diff)
print(f"diffNorm = {diffNorm}")
print(f"diff = {diff}")

When I run that, here’s what we see:

diffNorm = 2.9082498092558215e-17
diff = [[-6.93889390e-18  0.00000000e+00 -1.38777878e-17 -1.38777878e-17 ...]
 [ 0.00000000e+00 -1.73472348e-18 -1.38777878e-17 -1.38777878e-17 ...]
 [-4.33680869e-19  0.00000000e+00  0.00000000e+00 -3.46944695e-18 ...]]

So you can see that all the non-zero differences are in the range 10^{-19} to 10^{-17}. Now this is in 64 bit floating point. The resolution of 32 bit floating point is much lower, of course, so the errors would be larger if we ran this same experiment in the 32 bit case.
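As a sketch of that (assuming we simply cast the same matrix to 32 bits and keep the divisor in matching precision), running the identical computation in both precisions shows the gap — the float32 differences typically land around 1e-8 rather than 1e-17:

```python
import numpy as np

np.random.seed(42)
A64 = np.random.rand(3, 5)            # float64
A32 = A64.astype(np.float32)          # the same values, cast to float32

for A in (A64, A32):
    m = A.dtype.type(7)               # keep the divisor in the same precision
    B = (1 / m) * A
    C = A / m
    # The norm of the difference is ~1e-17 for float64, ~1e-8 for float32
    print(A.dtype, np.linalg.norm(B - C))
```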

There is a whole branch of mathematics called Numerical Analysis which studies (among other things) how to manage this type of approximation error. It turns out that some algorithms are what they call “stable”, meaning that the errors caused by finite representations stay relatively small and don’t compound. But other algorithms are “unstable”: if you run them for many iterations, the errors can accumulate instead of balancing out and actually cause problems. In most of the cases we will run into in ML/DL, the underlying packages like TF have been written such that the algorithms are numerically stable, so we don’t have to worry about it.

Given the above example, you may well wonder how the grader can check our answers if such a simple difference in the code can result in different values. It turns out that numpy provides functions for comparison that allow you to specify a “close enough” threshold. Have a look at the documentation for numpy isclose and numpy allclose.
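A quick sketch of those two functions on the B/C example from above (the defaults are rtol=1e-5 and atol=1e-8):

```python
import numpy as np

A = np.random.rand(3, 5)
m = 7.
B = 1. / m * A
C = A / m

# Exact elementwise equality can fail...
print(np.array_equal(B, C))
# ...but a "close enough" comparison succeeds; defaults are rtol=1e-5, atol=1e-8
print(np.allclose(B, C))                       # True
# The tolerance can be tightened; the actual differences here are ~1e-17
print(np.allclose(B, C, rtol=0, atol=1e-15))   # True
```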