Choice of activation function (Week 1)

In the first week's assignment, "Your First GAN", the generator block uses a ReLU activation, while the discriminator block uses Leaky ReLU. It makes sense to use Leaky ReLU in the discriminator to avoid regions of zero slope, which would slow down optimization.
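For reference, the two block types being discussed look roughly like this. This is a minimal PyTorch sketch, not copied from the notebook: the function names, the Linear + BatchNorm structure, and the 0.2 negative slope are my assumptions about what the assignment does.

```python
import torch.nn as nn

def get_generator_block(input_dim, output_dim):
    # Generator block (assumed): Linear -> BatchNorm -> ReLU
    # ReLU has exactly zero gradient for negative pre-activations.
    return nn.Sequential(
        nn.Linear(input_dim, output_dim),
        nn.BatchNorm1d(output_dim),
        nn.ReLU(inplace=True),
    )

def get_discriminator_block(input_dim, output_dim):
    # Discriminator block (assumed): Linear -> LeakyReLU
    # The small negative slope (0.2 here, my choice) keeps gradients non-zero
    # even for negative pre-activations.
    return nn.Sequential(
        nn.Linear(input_dim, output_dim),
        nn.LeakyReLU(0.2),
    )
```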

Is it safe to say that the choice of ReLU activation for the generator is correct because it produces outputs that are less likely to map to zero, so there is no need for a Leaky ReLU activation?

Thanks in advance,
Sailesh

Just on general principles, you'd think 0 gradients are always a bad thing. But it turns out that there are lots of situations in which ReLU works fine as an activation function. It's the cheapest, so you try that first. If it fails, you try Leaky ReLU: almost as cheap, but it avoids 0 gradients. If even that fails, only then do you graduate to more expensive activations like tanh, sigmoid, swish, and so forth. A small sketch of how cheap that swap is follows below.
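To make the "try the cheapest first" idea concrete, here is a small hypothetical helper (the name `make_block`, the layer shapes, and the 0.2 slope are my own illustrative choices, not from the course code) showing that trying a different activation is a one-argument change:

```python
import torch.nn as nn

def make_block(in_dim, out_dim, activation="relu"):
    # Hypothetical helper: swap activations without touching the rest of the block.
    activations = {
        "relu": nn.ReLU(inplace=True),    # cheapest; zero gradient for x < 0
        "leaky_relu": nn.LeakyReLU(0.2),  # almost as cheap; small non-zero slope for x < 0
        "tanh": nn.Tanh(),                # saturating, more expensive
        "sigmoid": nn.Sigmoid(),          # saturating, more expensive
        "silu": nn.SiLU(),                # a.k.a. swish
    }
    return nn.Sequential(nn.Linear(in_dim, out_dim), activations[activation])

# e.g. make_block(64, 128, "leaky_relu") gives a discriminator-style block.
```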

So I think the correct lesson to draw here is that they tried the experiment and found that ReLU works for the generator but not for the discriminator. Maybe there is some more detailed intuition we could come up with for why that is, if we thought hard enough about it. :nerd_face:

That could very well be from trial and error, but I'm not sure whether we are missing some alternative reasoning.