Art generation neural style assignment: train_step

Hi,

LabId: dhgeqtxu

In Week 4, Assignment 2, I implemented the compute_layer_style_cost, compute_content_cost and total_cost functions, and they all passed their tests.

When I get to the train_step function, I call the above functions as required by the routines, within the GradientTape, and at the end I get this error:

tf.Tensor(-1595361400.0, shape=(), dtype=float32)

AssertionError Traceback (most recent call last)
in
6 print(J1)
7 assert type(J1) == EagerTensor, f"Wrong type {type(J1)} != {EagerTensor}"
----> 8 assert np.isclose(J1, 25629.055, rtol=0.05), f"Unexpected cost for epoch 0: {J1} != {25629.055}"
9
10 J2 = train_step(generated_image)

AssertionError: Unexpected cost for epoch 0: -1595361408.0 != 25629.055

So I created a cell to run each function on its own. The result is that compute_content_cost gives non-negative values (>=0), while compute_style_cost gives negative values.

I then added print statements to compute_layer_style_cost, and it returns negative values on some iterations of the loop.

I decided to compare the shape of a_S when it is called in the test (while developing) and when it is called by train_step. They turn out to be very different (see “SHAPES OF a_S” below).

a_S is provided by the lab.

I’ll keep trying to understand more what’s going on, but if you have a clue with this information so far, I’d appreciate it.

SHAPES OF a_S:
In the test routine of compute_layer_style_cost, a_S is a single tensor of this shape:
tf.Tensor(
[[[[-3.404881 7.183007 2.534576 ]
[-2.5186315 -3.8986888 -2.9244845 ]
[ 1.3512313 0.18695849 -1.2326248 ]
[-1.8821764 -1.5039697 -1.8601038 ]]

[[-0.39341784 -0.34587932 1.730303 ]
[ 5.434381 6.118635 0.91409665]
[-0.2787553 2.493302 2.0111642 ]
[ 3.5750656 9.585232 -2.3005989 ]]

[[-2.616547 6.5795145 5.8995004 ]
[ 1.2345984 -0.9685255 -2.2799122 ]
[ 0.25895953 -0.5711074 -1.6340902 ]
[-2.933355 2.555351 -3.1489797 ]]

[[-5.2402277 0.36834985 -0.42677724]
[ 0.19823879 7.452428 3.718691 ]
[ 1.3253293 6.3523054 5.739221 ]
[-0.41526246 0.583993 -2.0045857 ]]]], shape=(1, 4, 4, 3), dtype=float32)
J_style_layer = tf.Tensor(14.017805, shape=(), dtype=float32)

However, when train_step is executed, a_S is a list of tensors with these shapes:
[<tf.Tensor: shape=(1, 400, 400, 64), dtype=float32, numpy=
array([[[[0. , 0.19946006, 0.14004472, …, 0.39006442,
0.6287072 , 0.28686455],
[0. , 0.30635077, 0.4513801 , …, 0. ,
1.6358649 , 1.25895 ],
[0. , 0.3085397 , 0.45269358, …, 0. ,
1.6476395 , 1.2675667 ],
…,
[1.4744289 , 0.33219847, 0.3532747 , …, 0.5832814 ,
0.28877902, 0.6445552 ],
[1.4231954 , 0.32877296, 0.3362352 , …, 0.57939744,
0.24375242, 0.589922 ],
[1.8047211 , 0.32466665, 0.4276283 , …, 0.7903815 ,
0.7935723 , 1.1135107 ]]]], dtype=float32)>, <tf.Tensor: shape=(1, 200, 200, 128), dtype=float32, numpy=
array([[[[ 0. , 0. , 0. , …, 7.244313 ,
0. , 12.539404 ],
[ 0. , 0. , 1.5505853 , …, 3.2639866 ,
0. , 0. ],
[ 0. , 0. , 1.1315385 , …, 4.9681845 ,
0. , 1.4705106 ],
…,
…,
[ 0. , 2.1687598 , 0. , …, 5.5728908 ,
2.0135236 , 0.9283667 ],
[ 0. , 2.6301243 , 0.08520482, …, 5.8851094 ,
1.4282581 , 1.3019186 ],
[ 0. , 2.4105735 , 4.4843197 , …, 9.010862 ,
2.8939793 , 0. ]]]], dtype=float32)>, <tf.Tensor: shape=(1, 100, 100, 256), dtype=float32, numpy=
array([[[[0.0000000e+00, 1.5934262e+00, 0.0000000e+00, …,
0.0000000e+00, 3.4129314e+00, 1.2514573e+01],
[0.0000000e+00, 1.6323670e+00, 0.0000000e+00, …,
0.0000000e+00, 0.0000000e+00, 9.9119473e+00],
[0.0000000e+00, 2.3139105e+00, 0.0000000e+00, …,
0.0000000e+00, 5.5055523e-01, 9.5699739e+00],
…,
[0.1449331 , 0. , 0. , …, 0. ,
0.9925568 , 0. ],
[0.24609339, 0. , 0. , …, 0. ,
0.9211575 , 0. ],
[0.30705586, 0. , 0. , …, 0. ,
0.95021063, 0. ]]]], dtype=float32)>]

I was able to solve this by changing some operations in compute_layer_style_cost, namely:

instead of using tf.square(x), I am now using x**2.

This now produces the expected results.

The question remains: why is this happening? Maybe someday this will be clear to me :)

For now, CASE CLOSED.

Thanks!

Juan

This is most likely caused by type coercion when integer and floating-point values are mixed. TensorFlow's rules differ from NumPy's. Here’s a thread which demonstrates some of these issues.
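
One way a type issue like this can turn a cost negative is integer overflow. This is only a sketch of a plausible mechanism, not a diagnosis of the actual notebook: it assumes tf.square was applied to the integer dimensions in the normalization constant, e.g. tf.square(n_H * n_W). TensorFlow converts a Python int to an int32 tensor by default, so the square is computed in int32, whereas Python's ** on plain ints has arbitrary precision. The dimensions are taken from the first style tensor in the post; a NumPy int32 array stands in for the int32 tensor so the snippet runs without TensorFlow.

```python
import numpy as np

# Dimensions from the first style activation in the post: (1, 400, 400, 64).
n_H, n_W, n_C = 400, 400, 64

# Python ints have arbitrary precision, so ** cannot overflow.
exact = (n_H * n_W) ** 2

# Assumption: tf.square(n_H * n_W) would first turn the Python int into an
# int32 tensor. A NumPy int32 array emulates that 32-bit arithmetic here.
as_int32 = np.array([n_H * n_W], dtype=np.int32)
wrapped = int(np.square(as_int32)[0])

print(exact)    # 25600000000
print(wrapped)  # -169803776 (wrapped past the int32 maximum of 2147483647)
```

With the small (1, 4, 4, 3) test tensor the same product is only 256, which fits easily in int32, so that would be consistent with the unit test passing while train_step fails. If tf.square is preferred, casting the dimensions to a float first (for example with tf.cast(..., tf.float32)) should avoid the wraparound.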