[C2W2_Assignment] Confused about the optimizer for the parameter update

In the Feature Correlation section of the C2W2_Assignment notebook, I can’t figure out how the optimizer can be used to update the noise vector. Specifically, to add more of the “Male” feature, the optimizer is defined as:

opt = torch.optim.Adam(classifier.parameters(), lr=0.01)

Defined this way, it should compute the gradient of the objective function (which, as I understand it, is classifications[:, target_indices].mean()) with respect to the parameters returned by classifier.parameters().

But instead, the code uses the gradient based on new_noise:

new_noise.data += new_noise.grad / grad_steps

Even though new_noise is set to require gradients in the code, I still can’t grasp how the optimizer can compute a gradient for new_noise when it is defined as shown above. Could someone help me move out of the local minimum of misunderstanding?

Many thanks!

Hi TRAN_KHANH1,
If I understand your question correctly, I think you are mixing up the separate operations that are happening in the code.

The optimizer “opt” is built over the classifier’s parameters (opt = torch.optim.Adam(classifier.parameters(), lr=0.01)), not over the generator’s input noise vector “new_noise”. Here it is only used to zero out (reset to zero) the gradients w.r.t. the classifier’s parameters before every iteration of the “for” loop.

In general we would have to update the parameters of both the discriminator and the generator. But the opt used here performs no parameter update at all, since the classifier is pre-trained; “opt” serves only the purpose stated in the paragraph above.
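
As a quick sanity check, here is a tiny self-contained sketch (the classifier, shapes, and variable names below are invented purely for illustration) showing that an Adam optimizer built over classifier.parameters() never changes the classifier when opt.step() is never called, and only clears the classifier’s gradients, while the noise tensor still receives a gradient from backward():

import torch
from torch import nn

# Toy stand-ins; sizes and names are made up just for this illustration.
classifier = nn.Linear(10, 3)
noise = torch.randn(4, 10, requires_grad=True)
opt = torch.optim.Adam(classifier.parameters(), lr=0.01)

weights_before = classifier.weight.clone()

opt.zero_grad()                          # the only thing opt is used for
score = classifier(noise)[:, 0].mean()   # pretend column 0 is the target feature
score.backward()                         # fills noise.grad (and the classifier's grads)

print(torch.equal(weights_before, classifier.weight))  # True, opt.step() was never called
print(noise.grad.shape)                                 # torch.Size([4, 10])

For reference, the loop body from the notebook is: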

fake = gen(new_noise)
fake_image_history += [fake]
classifications = classifier(fake)
classification_history += [classifications.cpu().detach()]
fake_classes = classifications[:, target_indices].mean()
fake_classes.backward()
new_noise.data += new_noise.grad / grad_steps

Let me explain, step by step, what is happening in this code.

  • First, we feed our noise vector to the generator to get a generated (fake) image.
  • We store it in a list so we can track the history of the fake images the generator produces on every iteration.
  • The generated image is then passed to the discriminator/classifier, and we get back its classifications with respect to all features. We store these in a list as well.
  • Now, classifications is a matrix of shape (batch_size, num_of_features) -- each column holds the classifications for the corresponding feature. From it, we take only our target feature’s column and average it over all the generated images in the batch.
  • fake_classes.backward() is called to compute the gradient of fake_classes with respect to every tensor in the computation graph that requires gradients (it does not update any parameters by itself).
    Now here comes the interesting part:
    —> “new_noise” is the input to the generator, which produces the fake image.
    —> “fake” then becomes the input to the classifier.
    Hence, “new_noise” is also a node of the computation graph ==> fake_classes.backward() will therefore also compute the gradient of fake_classes with respect to new_noise.
  • Now new_noise.grad holds the gradient of fake_classes with respect to new_noise computed during backward(), and we update new_noise with it according to our requirements -- here, a gradient-ascent step on the target feature’s score (see the sketch after this list).
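
To tie these steps together, here is a minimal self-contained sketch of the same loop built with toy modules. Only the loop structure mirrors the notebook: the architectures, sizes, and the value of target_indices are invented for illustration, and the final line that clears new_noise.grad is an addition of this sketch (it is not in the notebook snippet above).

import torch
from torch import nn

z_dim, n_features, grad_steps = 16, 5, 10
gen = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, 64))  # toy "generator"
classifier = nn.Sequential(nn.Linear(64, n_features))                    # toy "classifier"
target_indices = 2                                                       # pretend this is the "Male" feature

opt = torch.optim.Adam(classifier.parameters(), lr=0.01)
new_noise = torch.randn(8, z_dim).requires_grad_()

fake_image_history, classification_history = [], []
for _ in range(grad_steps):
    opt.zero_grad()                                # clear the classifier's gradients
    fake = gen(new_noise)                          # noise -> fake "image"
    fake_image_history += [fake]
    classifications = classifier(fake)             # shape (batch_size, n_features)
    classification_history += [classifications.cpu().detach()]
    fake_classes = classifications[:, target_indices].mean()
    fake_classes.backward()                        # gradients flow back through fake to new_noise
    new_noise.data += new_noise.grad / grad_steps  # gradient *ascent* step on the target score
    new_noise.grad = None                          # sketch-only: keep each step's gradient separate

Each pass nudges new_noise in the direction that increases the classifier’s score for the target feature, which is why the generated images drift toward showing more of that feature across the iterations.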

Hope this helps you to move out of the local minima :grin: . If not, we are always here to help.
And we are extremely sorry for the late response; your post might have been missed in the lot.
Happy learning!
Regards,
Nithin
