Questions regarding C1W1 assignment

Hello,

Thank you for this great course. I learnt a lot during this first course. I have some questions regarding the C1W1 assignment:

1. Why is the final layer of the discriminator made of only a Linear layer, with no sigmoid on top?

2. Why does the generator have a ReLU layer instead of the LeakyReLU you used in the discriminator? Isn't it prone to the dying ReLU problem?

3. What motivates the use of BCEWithLogitsLoss() instead of BCELoss()?

4. And finally, a question that bothers me a lot: how do we evaluate this model? Is there any evaluation metric?

Regarding questions 1 and 3, here is the doc for BCEWithLogitsLoss from the PyTorch docs:

This loss (BCEWithLogitsLoss) combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.

Thus, the reason the final layer of the discriminator is made of only a Linear layer is that the sigmoid is "merged" into the loss to provide this numerical optimization.
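To make that concrete, here is a minimal sketch (illustrative layer sizes, not the assignment's exact architecture) of a discriminator whose final block is a bare nn.Linear, with BCEWithLogitsLoss supplying the sigmoid:

```python
import torch
from torch import nn

# Minimal sketch, NOT the assignment's exact layer sizes: the final block
# is a bare nn.Linear with no nn.Sigmoid, because BCEWithLogitsLoss
# applies the sigmoid internally.
disc = nn.Sequential(
    nn.Linear(784, 128),
    nn.LeakyReLU(0.2),
    nn.Linear(128, 1),   # raw logits, no sigmoid here
)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(16, 784)            # stand-in for a batch of real images
logits = disc(real)                    # shape (16, 1), unbounded logits
loss = criterion(logits, torch.ones_like(logits))  # label real images as 1
loss.backward()
```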

Next time, you can turn to the PyTorch documentation first to investigate the problem yourself; that is exactly the purpose of providing detailed docs.

Regarding question 2, there is a tradeoff between ReLU and LeakyReLU. Of course, LeakyReLU avoids the dying ReLU problem, but ReLU computes faster and may have other advantages, such as the sparsity of its activations, which can act as a mild form of regularization. Based on what I know, ReLU and LeakyReLU are both widely used across fields.

And finally, for question 4, there are definitely metrics to evaluate GANs. Fréchet Inception Distance (FID) and Inception Score (IS) are both famous options at hand. You can google them for more information. By the way, they will be taught in the following courses.
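If you want to try one before the later courses cover it, torchmetrics provides an FID implementation. A minimal sketch, assuming torchmetrics (with its image extras, i.e. torch-fidelity) is installed, and using random tensors purely as placeholders for real and generated batches:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Hedged sketch: random uint8 tensors stand in for batches of real and
# generated images. FID expects (N, 3, H, W) uint8 images by default.
fid = FrechetInceptionDistance(feature=64)  # small feature layer for speed

real_imgs = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)
fake_imgs = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)

fid.update(real_imgs, real=True)
fid.update(fake_imgs, real=False)
print(fid.compute())  # lower FID = generated images closer to real ones
```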


Answering question 2.

Dying ReLU is caused by:

  1. A high learning rate
  2. A large negative bias

If we use ReLU, we can avoid these problems by:

  1. Using a smaller learning rate
  2. Modifying the initialization procedure: some initialization schemes (He initialization, for example) can cause dying-gradient problems by landing in bad local minima; in such cases, changing the weight-initialization procedure can help. Randomized Asymmetric Initialization has been proposed to solve this.

Or you can simply use LeakyReLU. As long as you avoid the dying ReLU problem, you can use either ReLU or LeakyReLU.
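As a quick toy illustration (not assignment code) of why LeakyReLU side-steps the problem: for a negative pre-activation, ReLU's gradient is exactly zero, while LeakyReLU still passes a small gradient, so the unit can recover:

```python
import torch
from torch import nn

# Toy sketch: gradient through ReLU vs. LeakyReLU for a negative input.
x = torch.tensor(-2.0, requires_grad=True)
nn.ReLU()(x).backward()
print(x.grad)   # tensor(0.)     -> no gradient: the unit stays "dead" here

y = torch.tensor(-2.0, requires_grad=True)
nn.LeakyReLU(0.2)(y).backward()
print(y.grad)   # tensor(0.2000) -> a small gradient still flows
```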

Hope this answers your question.


Thank you for your answers.

Hello Mansour,
The answer to question 3 is that BCEWithLogitsLoss() applies the activation function internally, i.e. BCE loss + sigmoid, whereas if we use BCELoss() the user has to add the activation (sigmoid) manually.
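A small sketch of that difference (toy tensors, not assignment code), showing the two formulations produce the same loss value:

```python
import torch
from torch import nn

logits = torch.randn(8, 1)   # raw discriminator outputs (logits)
targets = torch.ones(8, 1)   # e.g. labels for "real" images

# Option A: BCEWithLogitsLoss applies the sigmoid internally.
loss_a = nn.BCEWithLogitsLoss()(logits, targets)

# Option B: BCELoss requires the sigmoid to be applied manually.
loss_b = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_a, loss_b))  # True, up to floating-point error
```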


Thank you very much, dear @dheerajreddy3108, very clear now.

Yes, dear @chinmay-d, it does. Thank you very much.


The code crashes at the show_tensor_images line.
If I comment it out, it does not crash, but provides the wrong numbers for the losses.

Any idea how to debug this?

Thank you for your assistance and for this fantastic course!
```python
### Visualization code ###
if cur_step % display_step == 0 and cur_step > 0:
    print(f"Epoch {epoch}, step {cur_step}: Generator loss: {mean_generator_loss}, discriminator loss: {mean_discriminator_loss}")
    fake_noise = get_noise(cur_batch_size, z_dim, device='cuda')
    fake = gen(fake_noise)
    show_tensor_images(fake)  # <- the line that crashes
    show_tensor_images(real)
    mean_generator_loss = 0
    mean_discriminator_loss = 0
cur_step += 1
```

```
Epoch 1, step 500: Generator loss: 1.3957562952041633, discriminator loss: 0.41824104803800605

in show_tensor_images(image_tensor, num_images, size)
     10
     11 def show_tensor_images(image_tensor, num_images=25, size=(1, 28, 28)):
---> 12     image_unflat = image_tensor.detach().gpu().view(-1, *size)
     13     image_grid = make_grid(image_unflat[:num_images], nrow=5)
     14     plt.imshow(image_grid.permute(1, 2, 0).squeeze())

AttributeError: 'Tensor' object has no attribute 'gpu'
```
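The traceback already points at the bug: a PyTorch Tensor has no .gpu() method. Plotting needs the data back on the CPU, so line 12 of show_tensor_images should presumably call .cpu() instead:

```python
# Hypothesized fix for line 12 of show_tensor_images: move the tensor to
# the CPU (Tensor has a .cpu() method, but no .gpu() method).
image_unflat = image_tensor.detach().cpu().view(-1, *size)
```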

https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
