I’m working on the portion of the first assignment of the course in which the get_disc_loss() function is being defined. I’m confused by it. Before this portion of the assignment, the variable gen has been created, which is an instantiation of the Generator object that was created earlier. The method .forward() in Generator contains the docstring
Function for completing a forward pass of the generator: Given a noise tensor,
returns generated images.
And again in the get_disc_loss() function docstring contains
…gen: the generator model, which returns an image given z-dimensional noise…
However, the first test case doesn’t pass in the gen object, it passes in functions for both gen and criterion. So I can’t call the .forward() method of the Generator object to produce the images, nor can I call .detach() as the instructions say I must.
I thought I understood what I was supposed to be doing, but I can’t pass the test cases and reading them has made me confused about what is actually expected here.
Update: I was able to pass the tests and move on. I have some suggestions though.
The docstring in the get_disc_loss() function erroneously says “Don’t forget to detach the generator”, but it is the generated images returned by the generator that need to be detached. Trying to detach the generator itself will cause an error to be thrown.
The instruction to “Calculate the discriminator’s loss by averaging the real and fake loss” is ambiguous for someone new to PyTorch. How the average is calculated has to be done in a very specific way. How to do this was not immediately obvious, and my solution seems very unsatisfying. I think there is surely a best practice for doing this that I’m not aware of that I wish the lesson would have included.
The Generator class clearly defines a .forward() method and the docstring says that it is the .forward() method that is called to generate new images; however, the way in which the test cases are written for this exercise make it impossible to actually use that .forward() method in get_disc_loss() and still pass the test cases. We instead are forced to treat the gen argument as a callable object, but it is not mentioned anywhere in the lesson that PyTorch has implemented this ability for its layers. I thought that Layers-as-callable-objects was a poorly received design choice unique to Tensorflow.
The test cases make multiple assertions, but there are no assertion error strings provided. As a result, we are left to read the test cases and try to figure out what went wrong.
This first assignment took me longer than any assignment in the entire Deep Learning specialization due to the confusion created by the above.
Good suggestions, @Alexander_Valarus! I’ll pass along to the staff developers.
I esp. like the suggestion to provide specific mention that forward() is the default method that gets executed when the generator instance is called. This is touched on and demonstrated in the Intro to PyTorch lab, and also demonstrated in this lab’s test_generator() code, but it’s an important point that could seem mysterious and confusing to students who are new to Python or PyTorch, so it’d be helpful to clearly explain it in this first lab.
Just a heads-up that the phrase, “Detach the generator”, is fairly commonly used, even though technically not exactly accurate, so you may also run into it outside of this course. I was glad to see that the instructions for get_disc_loss at least says, “you will need to call .detach() on the generator result …” (Still a good suggestion to change the comment in the code)
I need a little more clarification on one of your points in order to pass it along. Can you explain a little more about what you mean by, “How the average is calculated has to be done in a very specific way.” I can’t think of anything particularly special you should need to do here. For example, it should work to just add the two numbers together and divide by 2 in order to get the average.
I can’t think of anything particularly special you should need to do here. For example, it should work to just add the two numbers together and divide by 2 in order to get the average.
That does work, but it feels unofficial somehow.
Adding the two vectors of losses then dividing by two is adding the element-wise pairs of each loss vector, then dividing each element in the resulting vector by two. The result is still a vector, not a scalar loss. The test cases take the mean of that vector and checks whether it is the correct scalar loss. It seems weird that we would be expected to return a vector of losses so that the grader can finish up and take the average of it.
The most satisfying solution I could find is torch.stack((fake_loss, real_loss)).mean(), which takes the mean over all training examples in both vectors and returns the scalar average loss.
The comments for get_disc_loss() do say it should return a tensor scalar, and just adding the two losses together and dividing by two only returns a tensor scalar if those two losses are tensor scalars. They will be tensor scalars in the “real” case where we have criterion=BCEWithLogitsLoss(), but they won’t be for some of the criterion functions used in the test cases.
So, it seems the thing we should change in the assignment is to either change the instructions to say the function should return a tensor with the same shape as the return value from criterion, OR we should change the test cases to only use criterion that return tensor scalars and update the comment for criterion to say it will return a tensor scalar.
Thanks for the good insights, @Alexander_Valarus. I now get it well enough to explain the issue.
BTW, as far as the test case taking the mean of the result, I wouldn’t worry about that - it’s just specific to the test - a way to easily verify the expected value.