Conceptual Question on Gradient Ascent

I was following along with how to build a controllable GAN, and I was confused about why we use gradient ascent rather than descent to update the noise. How does gradient ascent, rather than descent, strengthen a feature?

Here is my understanding so far: we train a classifier with 40 output nodes for different features, so that when it sees a new celebrity image it gives high values to the nodes corresponding to the features on that celebrity's face. Then we run our noise vector through our generator, which has already been trained to generate faces. If, say, you want the image to have sunglasses, you run the noise vector through the generator and measure how strongly the classifier classified the image as having glasses. After that, something happens that I don't quite understand: how is the loss calculated, and why is it ascended to make the glasses more pronounced in the generated image? In other words, how do you go from the classifier's output to gradients with respect to the noise (don't you need a loss to compute gradients for the noise, and if so, what is the loss?), and why gradient ascent instead of descent? I know this process works; I just don't understand why.

I would appreciate it if you could correct my understanding.

Sarthak Jain

Hello Sarthak,
Please don’t get confused by the terms ascent and descent. In simple terms, if we’re trying to maximize an objective function, we perform gradient ascent on it; if we’re trying to minimize it, as in vanilla neural network training, we perform gradient descent.
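To make that concrete, here is a minimal sketch (not from the course, just an illustration): maximizing f(x) = -(x - 3)² by gradient ascent. Ascent steps *along* the gradient; descent on -f(x) would take exactly the same steps, which is why the two are really the same idea with opposite signs.

```python
def grad_f(x):
    # Derivative of f(x) = -(x - 3)^2, which has its maximum at x = 3
    return -2.0 * (x - 3.0)

x = 0.0
lr = 0.1
for _ in range(100):
    x = x + lr * grad_f(x)  # '+' = ascent: move in the direction of the gradient

print(x)  # converges toward the maximizer x = 3
```

Flip the `+` to a `-` and you get gradient descent, which would push x *away* from the maximum instead.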

This article might help clear things up. With this basic idea in mind, revisiting the lecture should make the rest fall into place.

@Sarthak_Jain1 Hi Sarthak,

When we want to add a feature to an image, we try to maximize the classifier's score for that feature with respect to the noise vector, so we perform gradient ascent. When we want to remove a feature, we try to minimize the score, so we perform gradient descent. The "loss" here is simply the classifier's score for the target feature; since the generator and classifier are both differentiable, you can backpropagate that score all the way to the noise vector.
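A rough PyTorch sketch of one update step (the names `gen`, `classifier`, and `feature_idx` are stand-ins for whatever the assignment defines, not the actual course code):

```python
import torch

def update_noise(gen, classifier, noise, feature_idx, lr=0.01):
    # Make the noise a leaf tensor that tracks gradients
    noise = noise.detach().clone().requires_grad_(True)
    image = gen(noise)                                  # generate an image from noise
    score = classifier(image)[:, feature_idx].mean()    # e.g. the "sunglasses" score
    score.backward()                                    # compute d(score)/d(noise)
    with torch.no_grad():
        # '+' = gradient ascent (strengthen the feature);
        # '-' would be descent (weaken/remove it)
        noise = noise + lr * noise.grad
    return noise.detach()
```

Note that only the noise is updated; the generator and classifier weights stay frozen. The score itself plays the role of the loss, so there is no separate loss function to define.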

Why maximize when adding a feature? That is the convention the instructors chose for this assignment. Note also that the gradient penalty is added with a ‘-’ (minus) sign to the modified objective function, where it serves as a regularization term.

Hope it helps!! :slight_smile: