Week 1 Assignment: RuntimeError

Hi,
Please help, I am stuck and not able to find the issue here.

    noise_vector = get_noise(num_images, z_dim, device)
    fake = gen(noise_vector).detach()
    predict_fake = disc(fake)
    loss_fake = criterion(predict_fake, torch.zeros(num_images, z_dim))
    predict_real = disc(real)
    loss_real = criterion(predict_real, real)
    disc_loss = (loss_fake + loss_real)/2.0

Here is the error. I am not familiar with PyTorch.

RuntimeError                              Traceback (most recent call last)
<ipython-input-45-97fd1ea584a3> in <module>
     71             break
     72 
---> 73 test_disc_reasonable()
     74 test_disc_loss()
     75 print("Success!")

<ipython-input-45-97fd1ea584a3> in test_disc_reasonable(num_images)
     23     criterion = torch.mul # Multiply
     24     real = torch.zeros(num_images, 10)
---> 25     assert torch.all(torch.abs(get_disc_loss(gen, disc, criterion, real, num_images, z_dim, 'cpu').mean() - 5) < 1e-5)
     26 
     27     gen = torch.ones_like

<ipython-input-44-1c34c7deaebb> in get_disc_loss(gen, disc, criterion, real, num_images, z_dim, device)
     37     predict_real = disc(real)
     38     loss_real = criterion(predict_real, real)
---> 39     disc_loss = (loss_fake + loss_real)/2.0
     40     #### END CODE HERE ####
     41     return disc_loss

RuntimeError: The size of tensor a (64) must match the size of tensor b (10) at non-singleton dimension 1

Hey @GuyMerf,
In my opinion, the error is in the inputs to the criterion function, which is why the outputs have mismatched dimensions as well. The criterion essentially computes the loss from the predicted labels and the ground-truth labels.

Here, the predicted labels are predict_fake and predict_real. Now, if we think about the ground-truth labels, we want all the predictions corresponding to fake images to be 0s, so the true label for fake images can be torch.zeros_like(predict_fake). The best part is that you don’t have to worry about the dimensions at all. Similarly, for the real images, the true label can be torch.ones_like(predict_real).

In summary, try out this code:

    predict_fake = disc(fake)
    loss_fake = criterion(predict_fake, torch.zeros_like(predict_fake))
    predict_real = disc(real)
    loss_real = criterion(predict_real, torch.ones_like(predict_real))
    disc_loss = (loss_fake + loss_real) / 2.0
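To see why the `_like` constructors avoid the RuntimeError, here is a minimal, self-contained sketch (the toy shapes and the BCEWithLogitsLoss criterion are my own assumptions, not from the assignment):

```python
import torch

# A toy discriminator output: a batch of 4 predictions, 1 logit each.
predict_fake = torch.randn(4, 1)

# torch.zeros_like builds a target with the same shape, dtype, and device
# as the predictions, so the criterion never sees a shape mismatch.
target_fake = torch.zeros_like(predict_fake)
assert target_fake.shape == predict_fake.shape

criterion = torch.nn.BCEWithLogitsLoss()
loss_fake = criterion(predict_fake, target_fake)  # scalar loss, no RuntimeError
```

In contrast, hard-coding a shape like `torch.zeros(num_images, z_dim)` only works if it happens to match the discriminator's output shape, which is exactly what broke above.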

Hi @Elemento,
Thanks, it works! I was not getting to the root of that part, and you have explained it well.
Thank you once again!


Happy to help @GuyMerf :innocent:


Please @Elemento, why do we have to detach the generator's output to calculate the discriminator loss, but not for the generator's loss? I am not sure I understand that part.


The reason here is very simple. First, allow me to state a very trivial fact: we compute the generator loss to update the weights of the generator, and the discriminator loss to update the weights of the discriminator.

Once we have established this fact, the question answers itself. When we calculate the discriminator loss, we want to update only the discriminator, and hence we use a tensor that is not attached to the computation graph (which is exactly the job of the detach method). But when we calculate the generator's loss, we want to update the generator, and if you use the detach method at that point, the generator's weights won't update and the generator won't train.

In summary, we use the detach method when we want to make sure that the generator's weights are not updated.

Regards,
Elemento
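The effect of detach can be verified directly. Here is a minimal sketch with hypothetical one-layer "networks" (the Linear layers and shapes are stand-ins, not the assignment's architecture):

```python
import torch
from torch import nn

# Hypothetical one-layer generator and discriminator, just for illustration.
gen = nn.Linear(2, 3)
disc = nn.Linear(3, 1)

noise = torch.randn(5, 2)
fake = gen(noise).detach()      # cut the graph at the generator's output
disc_loss = disc(fake).mean()
disc_loss.backward()

# Only the discriminator received gradients; the generator got none.
assert all(p.grad is not None for p in disc.parameters())
assert all(p.grad is None for p in gen.parameters())
```

So stepping a discriminator optimizer after this backward pass can never touch the generator's weights.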


I think it’s worth going into a little more detail there. Here’s another thread that discusses this issue.

Note that the situation is fundamentally asymmetric:

When we train the generator, we need the gradients for the discriminator, since the loss is defined by the output of the discriminator, right? So by the Chain Rule, the generator gradients contain the discriminator gradients as factors. But then we are careful not to apply those gradients to the discriminator: we only apply the gradients for the generator in that case. Then we always discard any previous gradients at the beginning of any training cycle.

In the case of training the discriminator, the gradients do not include the generator gradients, so we literally don’t need them. We could compute them and they would just be thrown away, so it’s not a correctness issue. It’s a performance issue: computing gradients is expensive (backpropagating through the whole graph), so why do it in a case where you know you don’t need them? Why waste the CPU and memory when you’re just going to throw the gradients away?
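The asymmetry described above can be sketched concretely. In this hypothetical example (toy Linear layers, SGD, all my own assumptions), the generator loss is computed without detach, so backward() fills gradients for both networks via the chain rule, but only the generator's optimizer is stepped:

```python
import torch
from torch import nn

gen = nn.Linear(2, 3)
disc = nn.Linear(3, 1)
opt_gen = torch.optim.SGD(gen.parameters(), lr=0.1)

noise = torch.randn(5, 2)
# No detach here: the generator loss is defined by the discriminator's
# output, so by the chain rule backward() computes gradients for BOTH.
gen_loss = disc(gen(noise)).mean()
gen_loss.backward()
assert all(p.grad is not None for p in gen.parameters())
assert all(p.grad is not None for p in disc.parameters())

# ...but we are careful to step only the generator's optimizer, so the
# discriminator's weights stay untouched, and we discard stale gradients
# at the start of the next cycle.
opt_gen.step()
opt_gen.zero_grad()
```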


Thanks a lot, @paulinpaloalto. I also learned new things from your answer :innocent: