Week 4: Controllable Generation Assignment

We are initializing the feature here.
1, 2: But how does our model come to know that this feature is BigLips, or that one is blond hair, etc.?
And in which layers does all of this training happen?
3: Do we extract a particular feature from a particular image, or can we train multiple features from the same image?

I know these are basic questions, but I got a little confused.


Hey @starboy, when I was taking the course I had a similar query, and I found a credible explanation that satisfied me. But now that we have this platform and a wonderful community to help resolve our doubts, I believe you and I can get an even better answer. So don’t stop at my explanation; keep looking.

My understanding is something like this:

Controllable Generation is where we tweak the noise vector in order to get outputs with different features, right? This is accomplished using basic stochastic gradient ascent to find a noise vector that produces more of a certain characteristic, such as a noise vector that modifies only hair colour or hairstyle. You can see this directly after the code snippet you supplied in your query (see screenshot below).
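To make the gradient-ascent idea concrete, here is a minimal toy sketch (not the assignment code). In the assignment the "feature score" comes from backpropagating through the frozen generator and a pretrained classifier; here I stand in for all of that with a simple differentiable score `score(z) = w . z`, whose gradient with respect to z is just `w`, so the ascent step is easy to see:

```python
import numpy as np

# Toy illustration of tweaking the noise vector z by gradient ascent.
# Assumption: score(z) = w . z plays the role of "how strongly the
# classifier detects the target feature in generator(z)".

rng = np.random.default_rng(0)
z = rng.normal(size=8)        # the noise vector we will tweak
w = rng.normal(size=8)        # direction in z-space that increases the feature

def score(z):
    return float(w @ z)       # stand-in for classifier(generator(z))

lr = 0.1
before = score(z)
for _ in range(20):
    z = z + lr * w            # gradient *ascent*: d(w.z)/dz = w
after = score(z)

assert after > before         # the feature score increased
```

With the real models, `w` would be replaced by the gradient of the classifier output with respect to z, recomputed at every step, but the update rule is the same.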

Now, where does it happen? Where does this actually occur in the training timeline? According to Sharon (the instructor) in Course 1, Week 4, 4th video, and I quote:

Conditional generation allows you to specify which class you want, a very different type of thing. For this, you need to have a labeled data set, and it is typically implemented during training. You probably don’t want to label every hair length value, so controllable generation will do that for you; it’s more about finding directions of the features you want. That can happen after training. Of course, with controllable generation you will sometimes also see it happening during training as well, to help nudge the model in a direction where it’s easier to control. Finally, as you just learned, controllable generation works by tweaking that input noise vector Z that’s fed into the generator, while with conditional generation you have to pass additional information representing the class that you want, appended to that noise vector.

So, what does this mean? It means that this tweaking of the noise vector can be done both after and during training.

My understanding ends here; I hope someone can explain it more thoroughly than I can, since I, too, am curious about what happens under the hood.

Hope it helps :grinning_face_with_smiling_eyes:



You know, I don’t know if this is a bit off topic, but you can try for yourself to see what your input noise dimensions mean. How, you might ask? Well, just train a GAN. After training, generate a new datapoint (image). Now fix the noise vector from which this datapoint was generated. For your understanding, try changing just one dimension of that noise vector. Suppose you had a 64-dimensional noise vector: index into the first dimension and sweep it over a range of values (keeping all the other dimensions constant), generating a picture each time. You will see something change a bit; it might be that the lips get bigger or the hair colour changes. With such an experiment you can work out manually which dimension maps to which attribute of the image.
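The sweep described above can be sketched like this. Note that `generate` here is a hypothetical stand-in for a trained generator (which would map a 64-dim noise vector to an image); only the mechanics of freezing all dimensions but one and sweeping it are the point:

```python
import numpy as np

# Latent-dimension traversal: fix z, vary one coordinate, generate each time.
# Assumption: `generate` is a placeholder for generator(z) -> image.

rng = np.random.default_rng(42)
z = rng.normal(size=64)                  # the fixed 64-dim noise vector

def generate(z):
    # Placeholder: returns a fake 2x2 "image" derived from z.
    return np.outer(z[:2], z[:2])

images = []
for value in np.linspace(-3.0, 3.0, 7): # sweep dimension 0 over a range
    z_tweaked = z.copy()
    z_tweaked[0] = value                 # change only the first dimension
    images.append(generate(z_tweaked))   # collect outputs to compare
```

With a real generator you would plot the seven images side by side and watch a single attribute (hair colour, lip size, ...) vary across the sweep.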

Hope this gives you a different angle!
