I am training a CNN on X-ray images for disease detection. I'm using the EfficientNet-B4 architecture with some custom layers on top. The final three layers of the network look roughly like this (the exact widths and output activation aren't important for the question):
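```python
import tensorflow as tf
from tensorflow.keras import layers

# Illustrative head only -- the width (512), the output activation (sigmoid)
# and weights=None are placeholders; what matters is the
# Dense(relu) -> BatchNormalization -> output ordering.
base = tf.keras.applications.EfficientNetB4(
    include_top=False, weights=None, input_shape=(380, 380, 3), pooling="avg"
)
x = layers.Dense(512, activation="relu")(base.output)  # ReLU before BatchNorm
x = layers.BatchNormalization()(x)
outputs = layers.Dense(1, activation="sigmoid")(x)      # binary disease detection assumed
model = tf.keras.Model(base.input, outputs)
```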
I wonder whether the ReLU activation in the Dense layer (placed before BatchNormalization) is of any use, since it immediately zeroes out all negative incoming activations; during backprop, the neurons in the preceding layer that produced those negative outputs would then receive no gradient and effectively be rendered useless. Can anyone shed some light on this?
I haven't invested much time in learning about CNNs yet, though I just enrolled in a DL course on CNNs, so I apologize if this question shows an obvious lack of knowledge. But may I ask whether you have tried a leaky ReLU or even an ELU? Those might help with your negative values, although the computational cost rises slightly.
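In Keras I believe the swap would look something like this (just a sketch; the widths are placeholders, and 1792 is only my guess at the pooled EfficientNet-B4 feature size):

```python
from tensorflow.keras import layers, Input

inputs = Input(shape=(1792,))         # assumed size of the pooled backbone features
x = layers.Dense(512)(inputs)         # no built-in activation here
x = layers.LeakyReLU()(x)             # keeps a small negative slope instead of a hard zero
x = layers.BatchNormalization()(x)

# Or an ELU as a one-line alternative, which saturates smoothly for negative inputs:
# x = layers.Dense(512, activation="elu")(inputs)
```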
May I ask you to explain how those outputs can still be useful? If I understand it correctly, once a ReLU's pre-activation stays <= 0 it can lead to dying nodes, if the bias can't compensate. It's not an immediate death of the ReLU unit, but such dead nodes may accumulate over time/images, depending on certain image properties. Or is that incorrect?
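For what it's worth, here is the tiny TensorFlow check I had in mind for the zero-gradient part (not from your model, just a toy example):

```python
import tensorflow as tf

# The gradient of ReLU w.r.t. a negative pre-activation is exactly 0,
# so a unit that only ever sees negative inputs gets no updates through it.
x = tf.Variable([-2.0, -0.5, 0.5, 2.0])
with tf.GradientTape() as tape:
    y = tf.nn.relu(x)
print(tape.gradient(y, x).numpy())   # -> [0. 0. 1. 1.]
```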
It would be awesome to learn from you, since I'm still fairly new to DL.