C4 wk4 assn2 ex6 - train_step second step

Hi

I’ve read through the previously noted issues and believe I’ve corrected them, so that my style cost code now has:

{mentor edit: code removed}

However, when running the Ex 6 train_step test, it produces the right answer for the first step but not the second (see images). Any help is greatly appreciated!

-Neal

{mentor edit: code removed}

{mentor edit: code removed}

I’m in exactly the same situation. I’ve combed over each previous function and triple-checked that the transposes were in the right place and the math was right. I’m sure it’s something simple, but I’m having a heck of a time finding it.

Yes, mainly the fixes are around the floating-point issue with the 1 / (4 * n_C^2 * (n_H * n_W)^2) normalization factor, and around transpose vs. reshape in the style layer and content cost functions.

It seems the style layer cost is more often the issue. I think with reshape, if we leave n_C as the final dimension, then it will not garble the image when reshaping to (n_H*n_W, n_C); then if we transpose we get (n_C, n_H*n_W). Alternatively, I tried transpose with perm (0, 3, 1, 2) to move n_C to the front, then reshape(n_C, -1), since that didn’t seem to be garbling either. But still no luck.
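For what it’s worth, here is a quick sketch with made-up shapes (not the assignment code) confirming that the two unrolling orders just described produce the same matrix:

```python
import tensorflow as tf

# Tiny stand-in activation with made-up shapes: (m, n_H, n_W, n_C)
a = tf.random.normal((1, 4, 4, 3))
n_C = a.shape[-1]

# Order 1: reshape keeping n_C as the final dimension, then transpose
u1 = tf.transpose(tf.reshape(a, (-1, n_C)))                     # (3, 16)

# Order 2: transpose with perm (0, 3, 1, 2) first, then flatten
u2 = tf.reshape(tf.transpose(a, perm=(0, 3, 1, 2)), (n_C, -1))  # (3, 16)

print(tf.reduce_all(tf.equal(u1, u2)).numpy())  # True
```

Both orders put element a[0, h, w, c] at position [c, h*n_W + w], which is why they agree.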

Have you seen any other typical issues?

The usual suspects (kernel restart, missing transpose, math checks) I’ve tried multiple times.

For the float issue, using **2 rather than tf.square seems to take care of it, making the result a float32:

{mentor edit: code removed}
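Here’s a neutral illustration of the dtype behavior I mean, with made-up dimension values rather than the actual assignment code:

```python
import tensorflow as tf

n_C, n_H, n_W = 3, 4, 4   # made-up dimensions, not the assignment's

# Pure Python arithmetic: true division gives a plain Python float,
# which TensorFlow will happily combine with a float32 tensor.
factor = 1 / (4 * n_C**2 * (n_H * n_W)**2)
print(type(factor))                # <class 'float'>

# tf.square on a Python int returns an int32 tensor instead, and mixing
# an int32 tensor with float32 tensors is what triggers the dtype error.
print(tf.square(n_H * n_W).dtype)  # <dtype: 'int32'>
```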

The error is consistent across the various tweaks I’ve made to UNQ_C1 and UNQ_C3 with different reformattings/function swaps. The model does train; of course, the resulting image looks more like an impressionist sneeze than the Louvre.

  1. Please do not post your code on the forum, unless a Mentor asks to see it. Posting your code is not allowed by the course Honor Code.

  2. Your shape arguments for “a_S = …” and “a_G = …” are incorrect.

Hi

Apologies for posting code; I didn’t know that was disallowed, and thanks very much for removing it. I think the intermediate calculations are not solutions, so I will post my intermediate calcs and maybe they can shed some light on this. If not, I apologize again and would ask you to delete them as well.

I’ve gone through all the shape parameters in both the content and style layer cost functions and it doesn’t seem to correct anything. I’ve also done the style layer cost by transposing then reshaping, and by reshaping then transposing, and found I get the same answer. I’m sorry, but I don’t understand what is wrong with the shape parameters. When I run the tests for compute_content_cost and take a_C and perform the transpose with the ordering vector as noted, I get:

tf.Tensor(
[[[[-3.404881 -2.5186315 1.3512313 -1.8821764 ]
[-0.39341784 5.434381 -0.2787553 3.5750656 ]
[-2.616547 1.2345984 0.25895953 -2.933355 ]
[-5.2402277 0.19823879 1.3253293 -0.41526246]]

[[ 7.183007 -3.8986888 0.18695849 -1.5039697 ]
[-0.34587932 6.118635 2.493302 9.585232 ]
[ 6.5795145 -0.9685255 -0.5711074 2.555351 ]
[ 0.36834985 7.452428 6.3523054 0.583993 ]]

[[ 2.534576 -2.9244845 -1.2326248 -1.8601038 ]
[ 1.730303 0.91409665 2.0111642 -2.3005989 ]
[ 5.8995004 -2.2799122 -1.6340902 -3.1489797 ]
[-0.42677724 3.718691 5.739221 -2.0045857 ]]]], shape=(1, 3, 4, 4), dtype=float32)

and if I next reshape, I get a_C_unrolled:

tf.Tensor(
[[[-3.404881 -2.5186315 1.3512313 -1.8821764 -0.39341784
5.434381 -0.2787553 3.5750656 -2.616547 1.2345984
0.25895953 -2.933355 -5.2402277 0.19823879 1.3253293
-0.41526246]
[ 7.183007 -3.8986888 0.18695849 -1.5039697 -0.34587932
6.118635 2.493302 9.585232 6.5795145 -0.9685255
-0.5711074 2.555351 0.36834985 7.452428 6.3523054
0.583993 ]
[ 2.534576 -2.9244845 -1.2326248 -1.8601038 1.730303
0.91409665 2.0111642 -2.3005989 5.8995004 -2.2799122
-1.6340902 -3.1489797 -0.42677724 3.718691 5.739221
-2.0045857 ]]], shape=(1, 3, 16), dtype=float32)

Next, if I go into get_style_layer_cost, a_S starts off like this:

tf.Tensor(
[[[[ 2.6123514 -3.3520832 0.74761856]
[ 6.3462267 3.8470404 -0.9571458 ]
[-2.0568852 -3.1489944 -4.0077353 ]
[ 1.0848972 -1.2055032 -5.972679 ]]

[[-0.34144378 -3.17067 5.036553 ]
[ 5.9450154 -1.7347562 3.6944358 ]
[-0.68249106 -3.1652112 -1.7189786 ]
[ 6.638078 -0.90944517 9.18924 ]]

[[ 4.405425 0.31713337 2.566379 ]
[ 3.2136106 0.23800504 1.4399388 ]
[-0.88850987 -0.10706711 -1.7099016 ]
[ 8.216282 0.6901974 3.6196625 ]]

[[ 1.1940846 1.7071393 -0.9568796 ]
[-3.8442307 6.2297974 2.8206615 ]
[ 4.486284 -2.2124012 -4.4811783 ]
[-3.2315984 -5.5964684 3.4338741 ]]]], shape=(1, 4, 4, 3), dtype=float32)

To see this as an image, remember that the columns are channels, the rows here correspond to n_W, the blocks correspond to n_H, and there is an extra enclosing bracket for the batch dimension. After reshaping a_S, I get:

tf.Tensor(
[[ 2.6123514 -3.3520832 0.74761856]
[ 6.3462267 3.8470404 -0.9571458 ]
[-2.0568852 -3.1489944 -4.0077353 ]
[ 1.0848972 -1.2055032 -5.972679 ]
[-0.34144378 -3.17067 5.036553 ]
[ 5.9450154 -1.7347562 3.6944358 ]
[-0.68249106 -3.1652112 -1.7189786 ]
[ 6.638078 -0.90944517 9.18924 ]
[ 4.405425 0.31713337 2.566379 ]
[ 3.2136106 0.23800504 1.4399388 ]
[-0.88850987 -0.10706711 -1.7099016 ]
[ 8.216282 0.6901974 3.6196625 ]
[ 1.1940846 1.7071393 -0.9568796 ]
[-3.8442307 6.2297974 2.8206615 ]
[ 4.486284 -2.2124012 -4.4811783 ]
[-3.2315984 -5.5964684 3.4338741 ]], shape=(16, 3), dtype=float32)

So we see the columns are still n_C, and nothing is garbled across the image. Next I transpose to get:

tf.Tensor(
[[ 2.6123514 6.3462267 -2.0568852 1.0848972 -0.34144378 5.9450154
-0.68249106 6.638078 4.405425 3.2136106 -0.88850987 8.216282
1.1940846 -3.8442307 4.486284 -3.2315984 ]
[-3.3520832 3.8470404 -3.1489944 -1.2055032 -3.17067 -1.7347562
-3.1652112 -0.90944517 0.31713337 0.23800504 -0.10706711 0.6901974
1.7071393 6.2297974 -2.2124012 -5.5964684 ]
[ 0.74761856 -0.9571458 -4.0077353 -5.972679 5.036553 3.6944358
-1.7189786 9.18924 2.566379 1.4399388 -1.7099016 3.6196625
-0.9568796 2.8206615 -4.4811783 3.4338741 ]], shape=(3, 16), dtype=float32)

I get this same tensor by reversing the operations (transpose followed by reshape with adjusted parameters). Could you give some more insight to help me understand what is wrong with the shape parameters?

Thanks for your help

TMosh, I appreciate the help. In digging into the a_S and a_G parameters, I’m not sure where the error lies. Instrumenting with a few print statements through the final pass shows that a_S and a_G are in sync, the shapes are changing through the CNN, and it gets the correct cost for the first pass, only failing on the second. If the shapes were wrong, wouldn’t it also fail on the first pass through?

VGG a_G
[<tf.Tensor 'functional_1/block1_conv1/Relu:0' shape=(1, 400, 400, 64) dtype=float32>, <tf.Tensor 'functional_1/block2_conv1/Relu:0' shape=(1, 200, 200, 128) dtype=float32>, <tf.Tensor 'functional_1/block3_conv1/Relu:0' shape=(1, 100, 100, 256) dtype=float32>, <tf.Tensor 'functional_1/block4_conv1/Relu:0' shape=(1, 50, 50, 512) dtype=float32>, <tf.Tensor 'functional_1/block5_conv1/Relu:0' shape=(1, 25, 25, 512) dtype=float32>, <tf.Tensor 'functional_1/block5_conv4/Relu:0' shape=(1, 25, 25, 512) dtype=float32>]
a_G Original
Tensor("functional_1/block1_conv1/Relu:0", shape=(1, 400, 400, 64), dtype=float32)
a_S Style
(64, 160000)
a_G Style
(64, 160000)
Tensor("transpose_1:0", shape=(64, 160000), dtype=float32)
a_G Original
Tensor("functional_1/block2_conv1/Relu:0", shape=(1, 200, 200, 128), dtype=float32)
a_S Style
(128, 40000)
a_G Style
(128, 40000)
Tensor("transpose_5:0", shape=(128, 40000), dtype=float32)
a_G Original
Tensor("functional_1/block3_conv1/Relu:0", shape=(1, 100, 100, 256), dtype=float32)
a_S Style
(256, 10000)
a_G Style
(256, 10000)
Tensor("transpose_9:0", shape=(256, 10000), dtype=float32)
a_G Original
Tensor("functional_1/block4_conv1/Relu:0", shape=(1, 50, 50, 512), dtype=float32)
a_S Style
(512, 2500)
a_G Style
(512, 2500)
Tensor("transpose_13:0", shape=(512, 2500), dtype=float32)
a_G Original
Tensor("functional_1/block5_conv1/Relu:0", shape=(1, 25, 25, 512), dtype=float32)
a_S Style
(512, 625)
a_G Style
(512, 625)
Tensor("transpose_17:0", shape=(512, 625), dtype=float32)
J_style
Tensor("add_4:0", shape=(), dtype=float32)
a_C
(1, 625, 512)
a_G
(1, 625, 512)
J_content
Tensor("mul_10:0", shape=(), dtype=float32)

[the same shape trace prints again, identically, for the second step]

tf.Tensor(25629.055, shape=(), dtype=float32)
tf.Tensor(13932.012, shape=(), dtype=float32)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-27-c4cbeb243f35> in <module>
     10 J2 = train_step(generated_image)
     11 print(J2)
---> 12 assert np.isclose(J2, 17812.627, rtol=0.05), f"Unexpected cost for epoch 1: {J2} != {17735.512}"
     13 
     14 print("\033[92mAll tests passed")

AssertionError: Unexpected cost for epoch 1: 13932.01171875 != 17735.512

Any hints on what the shape mismatch is?

Thanks!

That 13932.012 value at epoch 1 is what you get if you missed the fact that they changed the learning rate on the optimizer call from 0.03 to 0.01. This happened in the most recent update to the notebook. You must have the update, since it includes different cost values in the assertions, but when you copied over your solutions you must have copied too much. You should only copy the code you added in the “START HERE/END HERE” blocks. This also answers your question about the first pass: the cost that train_step returns is computed before the optimizer update is applied, so the first step’s cost does not depend on the learning rate, while every subsequent step does.

Note that there is a bug in the instructions in that they still tell you to use 0.03. A bug will be filed about that in 3 … 2 … 1 …, but who knows when the fix will be forthcoming.
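If it helps build intuition, here is a toy sketch (not the notebook code; it just assumes an Adam optimizer like the one this notebook uses) of why the learning rate cannot affect the first reported cost but does affect the second:

```python
import tensorflow as tf

def run_two_steps(lr):
    """Two steps of a toy problem; returns the cost reported at each step."""
    x = tf.Variable(5.0)
    opt = tf.keras.optimizers.Adam(learning_rate=lr)
    costs = []
    for _ in range(2):
        with tf.GradientTape() as tape:
            J = x ** 2                    # stand-in cost
        grad = tape.gradient(J, x)
        opt.apply_gradients([(grad, x)])  # the update lands after J is computed
        costs.append(float(J))
    return costs

# The step-1 cost is identical for both learning rates; only step 2
# sees a variable the optimizer has already moved.
print(run_two_steps(0.01))
print(run_two_steps(0.03))
```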

Paul, you’re a rockstar! That was it. I had followed the directions and updated the learning rate at the beginning of the UNQ_C5 function; reverting it to 0.01 made it work like a charm.

Thanks, again!