Hi,
I was able to pass this week's assignment without a deep understanding of what goes on behind the scenes (probably because of my lack of knowledge of PyTorch and how tensors work).
So, two questions about the assignment; I hope the answers will help me:
- The forward() method is not called directly. When is it called, then?
- What are the strings attached between the loss tensors, the optimizer and the gen/disc parameters?
Thanks!
Yes, there is a lot to learn if you are new to PyTorch. They did provide an optional PyTorch tutorial earlier in Week 1, but I don't remember if it covers the issues you raise.
For the question about the forward() method, you should read the PyTorch documentation for the class nn.Module, of which all our networks are subclasses. It specifically says that you should invoke the actual model instance as a function instead of calling the forward() method directly. Have a look at what it says under the forward() method in that document.
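For instance, here's a minimal sketch of that convention (SimpleGenerator and its dimensions are made up for illustration, not taken from the assignment):

```python
import torch
import torch.nn as nn

class SimpleGenerator(nn.Module):
    """A made-up minimal generator, just to illustrate the calling convention."""
    def __init__(self, z_dim=10, im_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128),
            nn.ReLU(),
            nn.Linear(128, im_dim),
            nn.Sigmoid(),
        )

    def forward(self, noise):
        return self.net(noise)

gen = SimpleGenerator()
noise = torch.randn(5, 10)
fake = gen(noise)  # correct: goes through nn.Module.__call__, which runs hooks and then forward()
# gen.forward(noise) would "work", but skips the hook machinery, which is why the docs advise against it
```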
For the second question, I'm not sure what you mean by "strings attached", but note that we have to write the loss functions, and they are defined in terms of the output of the generator and discriminator and the criterion functions. So the compute graphs are defined by that. PyTorch (like TF and other frameworks) computes gradients for you automatically when you run in "training" mode, but there is some additional logic you need to add to actually apply the gradients appropriately. Have a look at the training code: they give you some worked examples, and then you have to copy those for the parts you need to write. Look at the zero_grad(), step() and backward() methods. That is where the action of applying (or not) the automatically computed gradients happens.
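As a rough sketch of that cycle (the tiny model and batch here are made up, not the assignment's):

```python
import torch
import torch.nn as nn

# A toy model and batch, just to show the zero_grad / backward / step cycle.
model = nn.Linear(4, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, 4)
labels = torch.ones(8, 1)

opt.zero_grad()                      # clear gradients left over from the previous step
loss = criterion(model(x), labels)   # building the loss also builds the compute graph
loss.backward()                      # autograd fills .grad for every parameter in that graph
opt.step()                           # the optimizer applies those .grad values to the parameters
```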
There are also some subtleties and complexities specific to GANs: we train the generator and the discriminator separately and alternately, but training the generator involves the gradients of the discriminator by definition. We need to be careful not to actually apply those gradients when we are training the generator. Here's a thread which discusses that in more detail.
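Here's a sketch of the usual pattern (the networks, sizes, and variable names are placeholders, not necessarily the assignment's exact code):

```python
import torch
import torch.nn as nn

# Toy stand-in networks, just to make the example self-contained.
z_dim = 8
gen = nn.Sequential(nn.Linear(z_dim, 16), nn.ReLU(), nn.Linear(16, 4))
disc = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
gen_opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(32, 4)
noise = torch.randn(32, z_dim)

# --- Discriminator step: detach the fakes so no gradients flow back into gen ---
disc_opt.zero_grad()
fake = gen(noise)
disc_fake_pred = disc(fake.detach())   # detach() cuts the compute graph at the generator
disc_real_pred = disc(real)
disc_loss = (criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred)) +
             criterion(disc_real_pred, torch.ones_like(disc_real_pred))) / 2
disc_loss.backward()
disc_opt.step()                        # only disc's parameters are updated

# --- Generator step: gradients flow *through* disc, but only gen_opt.step() is called ---
gen_opt.zero_grad()
fake = gen(noise)
disc_pred = disc(fake)                 # no detach here: gen needs gradients through disc
gen_loss = criterion(disc_pred, torch.ones_like(disc_pred))
gen_loss.backward()                    # this also populates disc's .grad ...
gen_opt.step()                         # ... but only gen's parameters get applied;
                                       # disc_opt.zero_grad() clears the stale disc grads next time
```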
The PyTorch documentation is good and there is an active online support community. If you'd like a more thorough introduction to backpropagation in PyTorch, you can search for general explanations of how gradients work in PyTorch.
When you call gen(noise) and disc(images), forward(...) is called under the hood.
I think it uses the Python "callable" feature, i.e. nn.Module implements __call__ as something like:

```python
def __call__(self, *args, **kwargs):
    # simplified: the real implementation also runs any registered hooks
    return self.forward(*args, **kwargs)
```

Keras has a similar pattern.
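To see that Python mechanism in isolation (nothing PyTorch-specific, just a toy class I made up):

```python
class Greeter:
    def __call__(self, name):
        return f"Hello, {name}!"

g = Greeter()
print(g("world"))  # "Hello, world!" -- the instance itself is callable, like gen(noise)
```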