Neural style transfer programming exercise, compute_style_cost()

I am confused about the for-loop in compute_style_cost(). There are three things that I would like to have clarified:

  • style_image_output and generated_image_output are the outputs of our model, i.e. the hidden layer activations for all layers.

  • a_S and a_G are the hidden layer activations for all the layers up to the last, so they are 5-dimensional. The descriptions above them are unclear, but this must be the case because compute_layer_style_cost() takes two 4D tensors (the S and G activations for one hidden layer) as input.

  • When we loop through the layers using zip(), the iterator stops when the shortest input iterable is exhausted. Here, that’s STYLE_LAYERS, so i will have values 0, 1, 2, 3, 4. But if that same i is then used to get the activations a_S[i] and a_G[i], doesn’t that mean that simply the first five layers from a_S and a_G are taken – and not the layers as specified in STYLE_LAYERS?
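Here is a tiny standalone sketch (toy data, not the notebook's tensors) of the zip() behavior I mean:

```python
# Toy stand-ins: STYLE_LAYERS mimics the notebook's 5 (layer_name, weight)
# pairs, while a_S pretends to be a list of 6 layer activations.
STYLE_LAYERS = [('block1_conv1', 0.2), ('block2_conv1', 0.2),
                ('block3_conv1', 0.2), ('block4_conv1', 0.2),
                ('block5_conv1', 0.2)]
a_S = ['act0', 'act1', 'act2', 'act3', 'act4', 'act5']  # 6 entries

# zip() stops when the shortest input (STYLE_LAYERS, 5 items) is exhausted,
# so i only ever takes the values 0 through 4:
indices = [i for i, (name, weight) in zip(range(len(a_S)), STYLE_LAYERS)]
print(indices)  # [0, 1, 2, 3, 4]
```

So a_S[i] with those i values would simply pick the first five entries of a_S, which is exactly what I'm asking about.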

You are obviously putting a lot of time and energy into this assignment. But to be fair, the code is not so clear and has lots of “layers” to it (pun intended).

Here’s the text of section 5.5.2:

5.5.2 - Compute the Style image Encoding (a_S)

The code below sets a_S to be the tensor giving the hidden layer activation for STYLE_LAYERS using our style image.

So this statement from your post is not correct:

a_S and a_G are the hidden layer activations for all the layers up to the last, so they are 5-dimensional.

No, they are not. They are subsets determined by STYLE_LAYERS, so they each will have 6 layers. The reason it is 6 instead of 5, is that the vgg_model_outputs function has been defined to add one extra “content layer”:

content_layer = [('block5_conv4', 1)]

vgg_model_outputs = get_layer_outputs(vgg, STYLE_LAYERS + content_layer)

If you look carefully at the code in compute_style_cost, you’ll see it discards the last layer:

    # Set a_S to be the hidden layer activation from the layer we have selected.
    # The last element of the array contains the content layer image, which must not be used.
    a_S = style_image_output[:-1]

So you end up with 5 and everything matches.
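As a toy standalone check (plain Python lists standing in for the tensors, not the notebook's code):

```python
# Toy stand-in: vgg_model_outputs returns 5 style-layer activations
# plus the extra content layer appended at the end.
style_image_output = ['s1', 's2', 's3', 's4', 's5', 'content']

a_S = style_image_output[:-1]  # slice off the content layer
print(len(a_S))                # 5, matching the 5 entries of STYLE_LAYERS
```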

To convince yourself that this is how it works, you can add some instrumentation to the cell that creates the global a_S value that is referenced in train_step:

# Assign the input of the model to be the "style" image 
preprocessed_style =  tf.Variable(tf.image.convert_image_dtype(style_image, tf.float32))
a_S = vgg_model_outputs(preprocessed_style)

# Paul addition:
print(f"len a_S {len(a_S)}")
for ii in range(len(a_S)):
    print(f"shape a_S[{ii}] = {a_S[ii].get_shape()}")

Try running that cell with the added print code and watch what happens. Maybe it’s worth harping on the “meta” point here: if you are not sure what’s happening in the code, it’s time to start adding some instrumentation to see what is going on. You don’t have to wonder: you can actually add code to see it.


No, they are not. They are subsets determined by STYLE_LAYERS, so they each will have 6 layers. The reason it is 6 instead of 5, is that the vgg_model_outputs function has been defined to add one extra “content layer”:

Aha – now it all makes much more sense! I see that the issue is that this is explained long after the code is introduced in compute_style_cost(). I hadn’t gone beyond this function yet; I wanted to first understand what’s going on here, expecting that I had all the information I needed. That information in section 5.4 really should have gone into the docstring of compute_style_cost(), which is now pretty useless: “our tensorflow model”. Maybe this is something that you can suggest as a change?

And a detail:

the vgg_model_outputs function has been defined to add one extra “content layer”

But vgg_model_outputs is not a function, right? Did you mean the get_layer_outputs() function?

Maybe it’s worth harping on the “meta” point here: if you are not sure what’s happening in the code, it’s time to start adding some instrumentation to see what is going on. You don’t have to wonder: you can actually add code to see it.

Yes, this is what I usually do. I did something similar inside compute_style_cost(), but then noticed that it is actually never called in the notebook (why not?). So that’s when I gave up and wrote my message.

You are obviously putting a lot of time and energy into this assignment.

It seems so… But I’m here to learn, and want to understand. I’m sorry that you’re the one who gets all my questions. I really appreciate the effort in your replies!

But here is the cell that is just below the cell that creates vgg_model_outputs:

content_target = vgg_model_outputs(content_image)  # Content encoder
style_targets = vgg_model_outputs(style_image)     # Style enconder

So clearly it is a function. That means that get_layer_outputs is a function that returns a function as its return value. We’ve seen plenty of examples of that in Course 4. E.g. the Keras Sequential function is a function that returns a function. As are all the Keras Layer instances.
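To make the pattern concrete, here is a minimal sketch (plain Python, not the notebook's code) of a function that returns a function; get_layer_outputs plays the role of the factory, and vgg_model_outputs is its callable return value:

```python
def make_scaler(factor):
    """Build and return a new function that multiplies its input by factor."""
    def scale(x):
        return x * factor
    return scale

double = make_scaler(2)   # call the factory once...
print(double(21))         # ...then call its return value: prints 42
```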

It is called later, but by code you haven’t written yet. Please see the train_step function later in the notebook: that needs to call compute_content_cost and compute_style_cost.

So clearly it is a function. That means that get_layer_outputs is a function that returns a function as its return value. We’ve seen plenty of examples of that in Course 4. E.g. the Keras Sequential function is a function that returns a function. As are all the Keras Layer instances.

It is. No idea why I didn’t see that when I wrote my question, sorry.

I managed to get through the whole notebook now, but have one last question: what is the function of content_target and style_targets, as defined in the cell below the one in which vgg_model_outputs is defined?

content_target = vgg_model_outputs(content_image)  # Content encoder
style_targets = vgg_model_outputs(style_image)     # Style encoder 

They aren’t used anywhere – is this cell just an example to show how to use vgg_model_outputs (to which actually the preprocessed images should be given, as done when creating a_C and a_S)?

It’s great to hear that you got all the way through the assignment successfully. The waters are pretty deep and it’s the last assignment in the course, so congratulations are in order!

Yes, I also noticed that those variables are never referenced and wondered whether it was simply a mistake. I guess your theory that they were just showing an example of how to use vgg_model_outputs is probably right, but a comment to that effect would seem appropriate. I’ll file a request about that and a few other typos I found in the notebook. I notice you fixed one of them on that line: “enconder” instead of “encoder”. Also, why is one target singular and the other plural? But now we’re down to serious nitpicking!

Onward! :nerd_face:

OK, great – final mystery solved! :wink:

Also why is one target singular and the other plural? But now we’re down to serious nitpicking!

I noticed the singular-plural difference here too, and I think this is not nitpicking! Variable names should be clear and unambiguous, so in this case they should be either both singular or both plural. I would say plural is preferred, as the function that returns them is also called vgg_model_outputs(), and because there actually are multiple (six) outputs.

Another inconsistency that I found confusing has to do with a_S and a_G, which represent one layer’s activation in compute_layer_style_cost(), but later, in section 5.5, they (and also a_C) represent six layers’ activations.
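To spell out that inconsistency with toy NumPy shapes (illustrative only; the exact channel counts here are my assumption, not from the notebook):

```python
import numpy as np

# Meaning 1: in compute_layer_style_cost, a_S is a single 4-D tensor
# for one hidden layer, shape (1, n_H, n_W, n_C).
a_S_one_layer = np.zeros((1, 4, 4, 64))

# Meaning 2: in section 5.5, a_S is a list of such tensors, one per
# layer returned by vgg_model_outputs (six in total).
a_S_all_layers = [np.zeros((1, 4, 4, c)) for c in (64, 128, 256, 512, 512, 512)]

print(a_S_one_layer.ndim)    # 4
print(len(a_S_all_layers))   # 6
```

The same name meaning two different things in two places is what tripped me up.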

so congratulations are in order!

Thank you! Your help was very valuable in getting there :slight_smile: