Be careful with division in C4W4 compute_layer_style_cost

In exercise 6 I got some errors that I found hard to fix (incredibly large negative values for J_style). The problem was in compute_layer_style_cost. Because of numerical issues, we often got unexpected values (1 / (4 * n_C^2 * n_H^2 * n_W^2) became negative). I suggest avoiding tf.math.divide in this computation. Try this part of the code for yourself.

Instead, use 1 / (4 * n_C^2 * n_H^2 * n_W^2) together with tf.multiply.
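A minimal sketch of that suggestion, assuming the layer shapes printed later in this thread: compute the normalization factor as an ordinary Python float, then hand it to TensorFlow (for example via `tf.multiply(scale, some_tensor)`). Python ints never overflow and `/` is true division in Python 3, so the factor is always a small positive number. The helper name `style_scale` is hypothetical, not part of the assignment.

```python
def style_scale(n_C: int, n_H: int, n_W: int) -> float:
    """Hypothetical helper: 1 / (4 * n_C^2 * (n_H*n_W)^2) as a Python float.

    The product is computed with Python ints (arbitrary precision, no overflow);
    `/` then performs true floating point division.
    """
    return 1 / (4 * n_C**2 * (n_H * n_W)**2)

# Layer shapes (n_C, n_H, n_W) taken from the printed output in this thread
shapes = [(64, 400, 400), (128, 200, 200), (256, 100, 100), (512, 50, 50), (512, 25, 25)]
for n_C, n_H, n_W in shapes:
    print(n_C, n_H, n_W, style_scale(n_C, n_H, n_W))
```

Keeping the scalar out of TensorFlow until the last step means no integer tensor is ever created for the denominator.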

Hope this helps.

One correction: those values (e-10 and e-09) are actually incredibly small negative values, because the exponents are negative.

But still it’s a very strange sort of problem. Thanks for your post.


Yes, this is really interesting! Thanks very much for doing the investigation and pointing this out. I had seen a bunch of students report negative values for the costs on this assignment and wondered how that could happen, given that the costs here are all “sum of squares”.

So at a simplistic level, the message seems to be: don’t use tensors for things that don’t really need to be tensors. But that can’t be the full explanation: after all, everything here is just IEEE 754 floating point computation, right? And that has very strict rules and very predictable accuracy. It seems very strange that a mere rounding error could flip the sign of a quotient of two positive numbers in the first place, and more broadly, I’d expect numpy and TF to round the same way. Let me try some experiments doing the same computations with plain numpy versus TF primitives and see if I can find a better explanation.

All righty then! We have an answer. The problem is not that you used tf.divide. The problem is that you specified the constants 1 and 4 as integers, so both arguments to tf.divide end up having integer types, and that’s what causes things to go sideways. If you write your expression like this, then it all “just works”:

```python
foo4 = tf.cast(tf.math.divide(1., 4. * n_C**2 * (n_H*n_W)**2), dtype = 'float32')
```

This version fails:

```python
foo4 = tf.cast(tf.math.divide(1, 4 * n_C**2 * (n_H*n_W)**2), dtype = 'float32')
```

See the difference? It’s subtle, but 0 is not the same thing in Python as 0. or 0.0.
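To make the difference concrete, here is a small pure-Python check (no TensorFlow needed) using the first layer shape from the output below. With integer literals the denominator is a Python `int`; with a float literal it is a `float`. The exact integer value is far larger than the largest signed 32-bit integer, 2**31 - 1, which matters if it ends up stored in an `int32` tensor. `INT32_MAX` is just a local name chosen for this sketch.

```python
n_C, n_H, n_W = 64, 400, 400  # first layer shape from the printed output

int_denominator = 4 * n_C**2 * (n_H * n_W)**2     # integer literals -> Python int (exact, unbounded)
float_denominator = 4. * n_C**2 * (n_H * n_W)**2  # float literal 4. -> Python float

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit integer can hold

print(type(int_denominator).__name__, int_denominator)      # int 419430400000000
print(type(float_denominator).__name__, float_denominator)  # float 419430400000000.0
print(int_denominator > INT32_MAX)                          # True
```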

Just to document the process one step further, here is my experimental apparatus in the form of a code block added to compute_layer_style_cost:

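```python
# Diagnostic code added inside compute_layer_style_cost
# (assumes numpy as np and tensorflow as tf are already imported, as in the notebook)
print(f"n_C {n_C} n_H {n_H} n_W {n_W}")
# foo1, foo2: float arithmetic in numpy
foo1 = np.square(1./(2.*n_H*n_W*n_C))
print(f"foo1 = {foo1}")
foo2 = 1./np.square(2.*n_H*n_W*n_C)
print(f"foo2 = {foo2}")
# foo3, foo4: integer arguments to tf.math.divide
foo3 = tf.math.divide(1,np.square(2*n_H*n_W*n_C))
print(f"foo3 = {foo3}")
foo4 = tf.cast(tf.math.divide(1, 4 * n_C**2 * (n_H*n_W)**2), dtype = 'float32')
print(f"foo4 = {foo4}")
# foo5: same as foo4, but with the float literals 1. and 4.
foo5 = tf.cast(tf.math.divide(1., 4. * n_C**2 * (n_H*n_W)**2), dtype = 'float32')
print(f"foo5 = {foo5}")
```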

Then I ran the main cell that computes the cost with the real images and here is what that code generates:

```
n_C 64 n_H 400 n_W 400
foo1 = 2.384185791015625e-15
foo2 = 2.384185791015625e-15
foo3 = 9.313225746154785e-10
foo4 = 9.313225746154785e-10
foo5 = 2.384185776525805e-15
n_C 128 n_H 200 n_W 200
foo1 = 9.5367431640625e-15
foo2 = 9.5367431640625e-15
foo3 = 3.725290298461914e-09
foo4 = 3.725290298461914e-09
foo5 = 9.53674310610322e-15
n_C 256 n_H 100 n_W 100
foo1 = 3.814697265625e-14
foo2 = 3.814697265625e-14
foo3 = -4.80682619156376e-10
foo4 = -4.806826048309176e-10
foo5 = 3.814697242441288e-14
n_C 512 n_H 50 n_W 50
foo1 = 1.52587890625e-13
foo2 = 1.52587890625e-13
foo3 = -1.922730476625504e-09
foo4 = -1.9227304193236705e-09
foo5 = 1.5258788969765152e-13
n_C 512 n_H 25 n_W 25
foo1 = 2.44140625e-12
foo2 = 2.44140625e-12
foo3 = 6.336706421303987e-10
foo4 = 6.336706159792982e-10
foo5 = 2.4414062351624244e-12
tf.Tensor(598.8282, shape=(), dtype=float32)
```

You can see that the integer versions (foo3 and foo4) differ from the correct values (foo1, foo2 and foo5) by orders of magnitude in every case, but the incorrect result is not always negative.
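Why the sign varies: the foo3 values above (and foo4, up to the float32 rounding from the final tf.cast) match exactly what you get if the exact denominator is wrapped around to a signed 32-bit integer (reduced modulo 2**32, with results at or above 2**31 becoming negative) before the reciprocal is taken. The sketch below emulates that wraparound in plain Python. `wrap_int32` is a hypothetical helper, not a TensorFlow function; this reproduces the printed numbers rather than tracing TF’s internal code path.

```python
def wrap_int32(value: int) -> int:
    """Hypothetical helper: emulate storing `value` in a signed 32-bit integer (two's complement)."""
    value &= 0xFFFFFFFF                      # keep only the low 32 bits
    return value - 2**32 if value >= 2**31 else value  # reinterpret the top bit as the sign

shapes = [(64, 400, 400), (128, 200, 200), (256, 100, 100), (512, 50, 50), (512, 25, 25)]
for n_C, n_H, n_W in shapes:
    exact = 4 * n_C**2 * (n_H * n_W)**2   # exact Python int, always positive
    wrapped = wrap_int32(exact)           # what survives in 32 bits, may be negative
    print(n_C, n_H, n_W, wrapped, 1 / wrapped)
```

Whether the reciprocal comes out negative depends only on where the wrapped value lands, which is why just the 256×100×100 and 512×50×50 layers went negative here.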

One other note: you can get away with using integer values for 1 and 2 in the Python expressions for foo1 and foo2. That is because one of the major changes between Python 2.x and Python 3.x is that the / operator now always performs true division, so 1/m is floating point “out of the gate” even if the numerator and denominator are both integers. One of the famous landmines you could step on in Python 2.x is demonstrated by the following code:

```python
m = 5
x = 1 / m
y = 1. / m
```

In Python 2.x, x and y do not have the same value, but they do in Python 3.x. In Python 2.x, the type of x is integer, so the value ends up as 0 (that’s 0 as an integer, not 0. as a floating point value). In Python 3.x, they both end up as 0.2 as one would naively expect. :nerd_face:
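For completeness, a quick Python 3 check of that example: `/` is true division even for two ints, and the Python 2 style integer result is still available explicitly through the floor-division operator `//`.

```python
m = 5
x = 1 / m    # true division: 0.2 in Python 3, even though both operands are ints
y = 1. / m   # float numerator: 0.2 in both Python 2 and Python 3
z = 1 // m   # floor division: 0, the integer result Python 2 gave for 1 / m

print(x, y, z)            # 0.2 0.2 0
print(type(x).__name__)   # float
print(type(z).__name__)   # int
```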