Questions about Art_Generation_with_Neural_Style_Transfer

Hi friend and mentor,

I already finished the coding homework, but to be honest, finishing the code doesn’t mean I truly understood everything. I have some questions below; please answer yes or no first. Thank you for your time.

From this post, I copied the picture below.

This output can also be viewed on my local machine, as below:

Q1. In the section Exercise 1 - compute_content_cost, we have `[-1]` in a_C = content_output[-1] and a_G = generated_output[-1] because we need the last layer, which is the content layer (layer 5 in this case). Yes or no?

Q2. In the section Exercise 4 - compute_style_cost, we have

    # Set a_S to be the hidden layer activation from the layer we have selected.
    # The last element of the array contains the content layer image, which must not be used.
    a_S = style_image_output[:-1]

    # Set a_G to be the output of the chosen hidden layers.
    # The last element of the list contains the content layer image which must not be used.
    a_G = generated_image_output[:-1]

This `[:-1]` takes all the style layers (layers 0 to 4 in this case) but NOT the last layer (layer 5 here). Yes or no?
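Just to double-check my reading of the indexing, here is a tiny sketch with a plain Python list standing in for the six layer outputs (the layer names are only my guess at the ones the notebook uses, for illustration):

```python
# Toy list standing in for the 6 layer outputs in this assignment:
# 5 style layers followed by 1 content layer.
outputs = ["block1_conv1", "block2_conv1", "block3_conv1",
           "block4_conv1", "block5_conv1", "block5_conv4"]

a_C = outputs[-1]    # the last element: the content layer
a_S = outputs[:-1]   # everything except the last: the 5 style layers

print(a_C)   # block5_conv4
print(a_S)   # ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1', 'block5_conv1']
```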

Q3. The true NN that is doing this neural style transfer only has 6 layers in this case, like the black picture from my local run above. Yes or no? In other words, we loaded the trained VGG (22 layers), but only 6 of those layers are picked and used in this case.

Q4. During the training, we are NOT updating the parameters in the NN like before. The NN parameters never change anymore; instead, we update the pixels directly. Yes or no?

Q5. This is not a yes-or-no question. As in the picture above, and the same thing was mentioned in the class: the professor said we pick roughly one middle layer for content, right? But this 'block5_conv4' is at the very end of the NN, right? I thought "middle" meant somewhere around block3_conv4.

Thanks for your time again!

  1. Yes.
  2. Yes.
  3. Cannot answer. What do you mean by “the true NN”?
  4. Cannot answer. I don’t know which NN you are referring to.
  5. Please give a reference to the lecture and time mark.

I am talking about the code here, less about the video.

For Q3, the “true NN” means the NN that created the style-transferred picture. I was asking which NN is the one used to generate the final style-transferred picture. In the code, we first load the trained VGG network (22 layers), and then we make another new NN by picking 6 layers from VGG. My understanding is that this 6-layer NN is the one doing the style transfer job, not the original VGG. Yes or no?

For Q4, my understanding is that none of the NN parameters are updated. During the training, the pixels are updated directly. Maybe I am wrong.

For Q5, still about the code: the code picked almost the final layer, but both the video and this homework mention that we pick a middle layer for the content cost.

thank you!

For Q3, what we are doing is not really a “neural network” here. It’s a pretty interesting and creative way to apply the technology. We use the pretrained VGG network, which is trained as an image classifier. What we are doing is not constructing a different neural network: we just extract the activations produced by VGG at some internal layers and use them to evaluate how our synthetic image is doing. The cost functions we use to express our goals for the image are based on the outputs of the VGG internal layers, but we do not retrain VGG. We aren’t really training a network; we are applying the gradients of the cost function directly to the generated image that we are creating. So it’s what you say later in your post for Q4: we are applying the gradients to the pixels directly. Notice that we don’t call `fit()` anywhere here. We manually use a gradient tape and apply the gradients to the image.
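To make that concrete, here is a rough sketch of the idea (not the notebook’s exact code; the cost function here is just a stand-in quadratic so the sketch runs on its own, whereas the real one is the weighted content + style cost built from the VGG activations):

```python
import tensorflow as tf

# Stand-in cost: in the real notebook this would be the weighted
# content + style cost computed from the VGG layer activations.
def total_cost(image):
    return tf.reduce_sum(tf.square(image - 0.5))

# The generated image itself is the trainable variable -- not any
# network weights.
generated_image = tf.Variable(tf.random.uniform((1, 64, 64, 3)))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

initial = total_cost(generated_image).numpy()
for _ in range(5):
    with tf.GradientTape() as tape:
        J = total_cost(generated_image)
    grad = tape.gradient(J, generated_image)              # d(cost)/d(pixels)
    optimizer.apply_gradients([(grad, generated_image)])  # update the pixels
final = total_cost(generated_image).numpy()
```

The key point is that `generated_image` is the only `tf.Variable` being optimized, which is exactly why there is no `fit()` call anywhere.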

For Q5, I think you’re right that when they say “middle layer”, it’s maybe a bit misleading. It’s “middle” in the sense that it’s not literally the output layer of VGG, but it’s fairly close to the end of VGG for the “content” case. I assume that they arrived at this choice by doing the type of experimentation that they refer to in the text of the notebook. At several points they suggest that you can try different choices for both the style layers and the content layer and see how that changes the results.


Paul, thank you!

Could you please also share your thoughts about my Q1, Q2 ?

For Q3, I think my understanding is correct. We don’t truly train an NN at all; we just pick some layers from a trained NN and then update the picture directly. That’s why there was no fit function. However, I think we do construct an NN, because we have the code below. It just picks some layers and makes a new model, and then this new NN does the style transfer work.

def get_layer_outputs(vgg, layer_names):
    """ Creates a vgg model that returns a list of intermediate output values."""
    outputs = [vgg.get_layer(layer[0]).output for layer in layer_names]

    model = tf.keras.Model([vgg.input], outputs)
    return model
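For example, I checked it like this on my side (using weights=None so the sketch runs without downloading the pretrained weights; the notebook of course loads the real ImageNet ones, and the layer list mirrors the notebook’s (name, weight) tuples, which is why `layer[0]` appears in the function):

```python
import tensorflow as tf

def get_layer_outputs(vgg, layer_names):
    """Creates a vgg model that returns a list of intermediate output values."""
    outputs = [vgg.get_layer(layer[0]).output for layer in layer_names]
    model = tf.keras.Model([vgg.input], outputs)
    return model

# weights=None: build the architecture only, no pretrained weights needed.
vgg = tf.keras.applications.VGG19(include_top=False, weights=None,
                                  input_shape=(64, 64, 3))

# (name, weight) tuples: 5 style layers plus the content layer.
layers = [("block1_conv1", 0.2), ("block2_conv1", 0.2),
          ("block3_conv1", 0.2), ("block4_conv1", 0.2),
          ("block5_conv1", 0.2), ("block5_conv4", 1.0)]

model = get_layer_outputs(vgg, layers)
acts = model(tf.random.uniform((1, 64, 64, 3)))
print(len(acts))  # 6 -- one activation tensor per selected layer
```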

For Q5. got it. thank you!

Tom already answered your Q1 and Q2 and I thought we were only allowed to say “Yes” or “No” on those, right? :joy:

For Q3, yes, we create an object that is a Keras Model instance, but that just means that it’s a function that takes an input and produces an output. What it actually does internally looks nothing like a Neural Network: it takes the VGG output and returns an array of extracted layer activations. It does no actual processing on any of the inputs, right? There is no “feed forward” happening there.

Update: Maybe I was too hasty there: the model we construct in that function does actually evaluate VGG internally, so that is a NN happening there. But the output of it is the various selected layer activations and not really the classification.


@TMosh @paulinpaloalto you guys are awesome!

Wow, of course you can share more. I put "yes or no" there just because I didn’t want to bother the mentors too much, haha. I wanted to make it easy.

Sure, I was joking. But Q1 and Q2 are pretty clear and you had the right interpretation, so I think Tom covered them and there’s not really a lot more to say.


Actually, after thinking a bit more, maybe I was wrong about what happens in the model created by the get_layer_outputs function. It is a lot of action expressed in not very many lines of code, but what it does internally is actually to invoke VGG with the image as input. So there is a neural network happening there. It just doesn’t look the way we normally expect, the classification output is not our goal there, and (as mentioned earlier) we aren’t really training VGG.


If you say you only want a Yes/No answer, that’s all you’re going to get. It limits the amount of information the mentors might offer.


thanks again for your help.