Questions regarding C1W1 assignment

Hello,

Thank you for this great course. I learnt a lot during this first course. I have some questions regarding the C1W1 assignment:

1. Why is the final layer of the discriminator made of only a Linear layer, with no sigmoid on top?

2. Why does the generator have a ReLU layer instead of the LeakyReLU you used in the discriminator? Isn't it prone to the dying ReLU problem?

3. What motivates the use of BCEWithLogitsLoss() instead of BCELoss()?

4. And finally, a question that bothers me a lot: how do we evaluate this model? Is there any evaluation metric?

Regarding questions 1 and 3, here is the doc for BCEWithLogitsLoss from the PyTorch docs:

This loss (BCEWithLogitsLoss) combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.

Thus, the reason the final layer of the discriminator is made of only a Linear layer is that the sigmoid is "merged" into the loss to provide this numerical optimization.
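To make that concrete, here is a minimal sketch (illustrative layer sizes, not the assignment's exact architecture) of a discriminator whose final block is a bare nn.Linear, with BCEWithLogitsLoss supplying the sigmoid:

```python
import torch
from torch import nn

# Minimal sketch, NOT the assignment's exact layer sizes: the final block
# is a bare nn.Linear with no nn.Sigmoid, because BCEWithLogitsLoss
# applies the sigmoid internally.
disc = nn.Sequential(
    nn.Linear(784, 128),
    nn.LeakyReLU(0.2),
    nn.Linear(128, 1),   # raw logits, no sigmoid here
)
criterion = nn.BCEWithLogitsLoss()

real = torch.randn(16, 784)            # stand-in for a batch of real images
logits = disc(real)                    # shape (16, 1), unbounded logits
loss = criterion(logits, torch.ones_like(logits))  # label real images as 1
loss.backward()
```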

Next time, you can turn to the PyTorch documentation first to investigate the problem yourself; that is exactly the purpose of providing detailed docs.

Regarding question 2, there is a tradeoff between ReLU and LeakyReLU. Of course, LeakyReLU avoids the dying ReLU problem, but ReLU computes faster and may have other advantages, such as the sparsity of its activations, which can act as a mild form of regularization. Based on what I know, ReLU and LeakyReLU are both widely used across fields.

And finally, for question 4, there are definitely metrics to evaluate GANs. Fréchet Inception Distance (FID) and Inception Score (IS) are both famous options at hand. You can google them for more information. By the way, they will be taught in the following courses.
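If you want to try one before the later courses cover it, torchmetrics provides an FID implementation. A minimal sketch, assuming torchmetrics (with its image extras, i.e. torch-fidelity) is installed, and using random tensors purely as placeholders for real and generated batches:

```python
import torch
from torchmetrics.image.fid import FrechetInceptionDistance

# Hedged sketch: random uint8 tensors stand in for batches of real and
# generated images. FID expects (N, 3, H, W) uint8 images by default.
fid = FrechetInceptionDistance(feature=64)  # small feature layer for speed

real_imgs = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)
fake_imgs = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)

fid.update(real_imgs, real=True)
fid.update(fake_imgs, real=False)
print(fid.compute())  # lower FID = generated images closer to real ones
```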


Answering question 2.

Dying ReLU is caused by:

  1. A high learning rate
  2. A large negative bias

If we use ReLU, we can avoid these problems by:

  1. Using a smaller learning rate
  2. Modifying the initialization procedure: some initialization schemes (He initialization, for example) can cause dying-gradient problems by landing in bad local minima; in such cases, changing the weight-initialization procedure can help. Randomized Asymmetric Initialization has been proposed to solve this.

Or you can simply use LeakyReLU. As long as you avoid the dying ReLU problem, you can use either ReLU or LeakyReLU.
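As a quick toy illustration (not assignment code) of why LeakyReLU side-steps the problem: for a negative pre-activation, ReLU's gradient is exactly zero, while LeakyReLU still passes a small gradient, so the unit can recover:

```python
import torch
from torch import nn

# Toy sketch: gradient through ReLU vs. LeakyReLU for a negative input.
x = torch.tensor(-2.0, requires_grad=True)
nn.ReLU()(x).backward()
print(x.grad)   # tensor(0.)     -> no gradient: the unit stays "dead" here

y = torch.tensor(-2.0, requires_grad=True)
nn.LeakyReLU(0.2)(y).backward()
print(y.grad)   # tensor(0.2000) -> a small gradient still flows
```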

Hope this answers your question.


Thank you for your answers.

Hello Mansour,
The answer to question 3 is that BCEWithLogitsLoss() applies the activation function internally, i.e. BCE loss + sigmoid, whereas if we use BCELoss() the user has to add the activation (sigmoid) manually.
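A small sketch of that difference (toy tensors, not assignment code), showing the two formulations produce the same loss value:

```python
import torch
from torch import nn

logits = torch.randn(8, 1)   # raw discriminator outputs (logits)
targets = torch.ones(8, 1)   # e.g. labels for "real" images

# Option A: BCEWithLogitsLoss applies the sigmoid internally.
loss_a = nn.BCEWithLogitsLoss()(logits, targets)

# Option B: BCELoss requires the sigmoid to be applied manually.
loss_b = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_a, loss_b))  # True, up to floating-point error
```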


Thank you very much, dear @dheerajreddy3108, very clear now.

Yes, dear @chinmay-d, it does. Thank you very much.


The code crashes at the show_tensor_images line.
If I comment it out, it does not crash, but provides the wrong numbers for the losses.

Any idea how to debug this?

Thank you for your assistance and for this fantastic course!
```python
### Visualization code ###
if cur_step % display_step == 0 and cur_step > 0:
    print(f"Epoch {epoch}, step {cur_step}: Generator loss: {mean_generator_loss}, discriminator loss: {mean_discriminator_loss}")
    fake_noise = get_noise(cur_batch_size, z_dim, device='cuda')
    fake = gen(fake_noise)
    show_tensor_images(fake)  # <- the line that crashes
    show_tensor_images(real)
    mean_generator_loss = 0
    mean_discriminator_loss = 0
cur_step += 1
```

```
Epoch 1, step 500: Generator loss: 1.3957562952041633, discriminator loss: 0.41824104803800605

in show_tensor_images(image_tensor, num_images, size)
     10
     11 def show_tensor_images(image_tensor, num_images=25, size=(1, 28, 28)):
---> 12     image_unflat = image_tensor.detach().gpu().view(-1, *size)
     13     image_grid = make_grid(image_unflat[:num_images], nrow=5)
     14     plt.imshow(image_grid.permute(1, 2, 0).squeeze())

AttributeError: 'Tensor' object has no attribute 'gpu'
```
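The traceback already points at the bug: a PyTorch Tensor has no .gpu() method. Plotting needs the data back on the CPU, so line 12 of show_tensor_images should presumably call .cpu() instead:

```python
# Hypothesized fix for line 12 of show_tensor_images: move the tensor to
# the CPU (Tensor has a .cpu() method, but no .gpu() method).
image_unflat = image_tensor.detach().cpu().view(-1, *size)
```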

https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html
