Generated Image Explanation

For the generated image in NST, do we initialize it randomly, or do we apply a noisy filter to the content image and use that as the generated image?

No, sorry, it’s not really either of those, although maybe you could consider it a variation on the “noisy filter” idea. It’s actually quite a bit more complicated and interesting than that. It uses some of the internal layers of a pretrained image recognition network (VGG in this case) to extract aspects of the “style” image and then merge or apply those to the content image to create the “generated” image. If you missed all this, it would probably be a good idea to rewatch the lectures and then do the Neural Style Transfer assignment to see in detail how it all works.
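For reference, the “style” aspect extracted from those VGG layers is captured via Gram matrices of the layer activations: the style cost compares these matrices between the style image and the generated image. A minimal NumPy sketch (the array shape and function name here are illustrative, not the course’s exact API):

```python
import numpy as np

def gram_matrix(activations):
    """Correlations between the feature channels of one conv layer.

    activations: array of shape (H, W, C).
    Returns a (C, C) Gram matrix; the style cost compares these
    matrices between the style image and the generated image.
    """
    h, w, c = activations.shape
    flat = activations.reshape(h * w, c)  # one row per spatial position
    return flat.T @ flat                  # (C, C), symmetric

# Toy example with random "activations" from a hypothetical layer
a = np.random.default_rng(0).normal(size=(4, 4, 3))
G = gram_matrix(a)
print(G.shape)  # (3, 3)
```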

I think the OP is closer on this one. At least the older version of the notebook I have access to contains this passage:

Now, we initialize the “generated” image as a noisy image created from the content_image. By initializing the pixels of the generated image to be mostly noise but still slightly correlated with the content image, this will help the content of the “generated” image more rapidly match the content of the “content” image.

Unless the updated notebook changed approach?
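Concretely, the initialization that passage describes looks something like the following sketch (the `noise_ratio` value and function name are made up for illustration; the notebook’s actual constant may differ):

```python
import numpy as np

def init_generated(content_image, noise_ratio=0.6, seed=0):
    """Start from noise that is still correlated with the content image.

    content_image: float array with values in [0, 1].
    noise_ratio: fraction of the result that is noise (0.6 is
                 illustrative; the notebook's constant may differ).
    """
    rng = np.random.default_rng(seed)
    noise = rng.uniform(0.0, 1.0, size=content_image.shape)
    # Mostly noise, but slightly correlated with the content image
    return noise_ratio * noise + (1 - noise_ratio) * content_image

content = np.full((2, 2, 3), 0.5)   # toy "content image"
generated = init_generated(content)
print(generated.shape)              # same shape as the content image
```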

As you correctly suspect, they have not fundamentally changed the approach. I should go back and understand this in more detail, but my point was that the noisy filter is just the starting point: that’s how they initialize the generated image, but it’s not the end of the story. They then run a form of “training” to refine the style transfer. At least that was my (clearly incomplete) understanding. Perhaps that just amounts to tweaking the “noise” until you get an output that looks pleasing (obviously a subjective assessment), in which case it would still be just a “noisy filter” applied to the content image.

Yes, it is initialized as the content image plus noise (not just random noise); I thought that was the OP’s question. After that, there is nothing random about it: the loss is iteratively minimized to move the image towards the style, as you described.
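To make that concrete: after the noisy initialization, the generated image itself is the “trainable” variable, and gradient descent updates its pixels to minimize a weighted total cost J = alpha * J_content + beta * J_style. A toy sketch of that loop, with stand-in quadratic losses in place of the real VGG-based content and style costs (all constants here are illustrative):

```python
import numpy as np

# Stand-in targets: in the real assignment these are VGG activations
# and Gram matrices, not raw pixel values.
content_target = np.full((2, 2, 3), 0.5)
style_target = np.full((2, 2, 3), 0.8)
alpha, beta, lr = 10.0, 40.0, 0.01   # illustrative weights and step size

rng = np.random.default_rng(1)
generated = rng.uniform(size=(2, 2, 3))  # noisy starting point

for step in range(200):
    # Gradient of J = alpha*||g - c||^2 + beta*||g - s||^2 w.r.t. pixels
    grad = alpha * 2 * (generated - content_target) \
         + beta * 2 * (generated - style_target)
    generated -= lr * grad  # update the image, not any network weights

# The pixels settle between the two targets, weighted by alpha and beta
print(generated.mean())
```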

Ah, sorry, I just misinterpreted the question. I thought the assertion was that the noisy filter was all that was being done to create the generated image.

A bit ambiguous and I probably jumped to the wrong conclusion. In any case, I think we’ve covered it either way, so it’s all good. Thanks!
