Is ReLU not destructive in backpropagation?

I am training a CNN on x-ray images for disease detection. I'm using the EfficientNet-B4 architecture with some custom layers on top. These are the final 3 layers in the network:

tf.keras.layers.Dense(896, activation='relu')
tf.keras.layers.BatchNormalization()
tf.keras.layers.Dense(1, activation='sigmoid')

I wonder whether the relu activation in the Dense layer (before the BatchNormalization) is of any use, since it would straight away zero out all the negative incoming activations, so during backprop all the neurons in the preceding layer that produced a negative activation would be rendered useless. Can anyone throw some light here?


Apparently yes, the gradient for values smaller than 0 is going to be 0, so it will hinder learning from negative weights, I guess!
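A quick standalone check with tf.GradientTape (a minimal sketch; the toy values are arbitrary and not from the model above) shows both effects, i.e. negative inputs are clamped to zero and get zero gradient:

import tensorflow as tf

# Toy pre-activations: two negative, two positive.
x = tf.constant([-2.0, -0.5, 0.5, 2.0])

with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.nn.relu(x)

grad = tape.gradient(y, x)
print(y.numpy())     # [0.  0.  0.5 2. ]  -> negative inputs are clamped to zero
print(grad.numpy())  # [0. 0. 1. 1.]      -> zero gradient wherever the input was negative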


I haven’t invested much time in learning about CNNs yet, although I just enrolled in the DL course on CNNs, so I apologize if this question obviously lacks knowledge. But may I ask whether you have tried a leaky ReLU or even an ELU? Those might help with your negative values, although the computational cost rises.
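For example, swapping the activation in your head could look roughly like this (only a sketch; the 896 units come from your snippet and the alpha value is purely illustrative):

tf.keras.layers.Dense(896)
tf.keras.layers.LeakyReLU(alpha=0.1)  # keeps a small slope for negative values instead of zeroing them
# or: tf.keras.layers.Dense(896, activation='elu')
tf.keras.layers.BatchNormalization()
tf.keras.layers.Dense(1, activation='sigmoid')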


If a ReLU unit has a negative weight or bias, it can still give useful outputs to feed to the next layer.


I haven’t tried leaky ReLU or ELU yet. In the context of this question, they would definitely be better than ReLU.


May I ask you to explain how the outputs may still be useful? If I understand it correctly, once a ReLU's input reaches <= 0 it might lead to dying nodes, if the bias can't compensate for it. It's not an immediate death of the ReLU unit, but those dead nodes may accumulate over time/images, depending on certain image properties. Or is that incorrect?
Would be awesome to learn from you, since I'm still fairly new to DL.

The ReLU activation is applied after the (w*x + b) calculation.

So for example, if w is negative and b is positive, then for negative values of x, the output is still positive (not clamped to zero).
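A quick numeric check of that case (the numbers are arbitrary):

w, b = -0.5, 1.0    # negative weight, positive bias
x = -2.0            # negative input
z = w * x + b       # (-0.5) * (-2.0) + 1.0 = 2.0
out = max(0.0, z)   # relu(2.0) = 2.0 -> not clamped to zero
print(out)          # 2.0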