About Dropout in CNN

There are a few Dropout layers applied in U-Net, and I'm still confused about how dropout works in a CNN. In an FC layer it randomly mutes some nodes' outputs, but how does that concept apply in a CNN after a Conv2D layer?

Dropout works the same way after a Conv layer: it randomly zeros some of the activation outputs to weaken the connections. It's just that the input and output are 3D tensors instead of vectors, but the effect is the same. You can think of each position in the conv output as a "neuron", in just the same way as with a Fully Connected layer.
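Here's a minimal sketch of what that looks like in practice (using tf.keras; the tensor shape and dropout rate below are just placeholders for illustration):

```python
import tensorflow as tf

# A toy batch of conv activations: (batch, height, width, channels).
# All values are > 0 so the zeroed positions are easy to spot.
x = tf.random.uniform((1, 4, 4, 3)) + 1.0

dropout = tf.keras.layers.Dropout(rate=0.5)

# training=True forces the mask to be applied; at inference time Dropout is a no-op.
y = dropout(x, training=True)

# Roughly half of the individual activations are zeroed and the rest are scaled by
# 1/(1 - rate), exactly as with a Dense layer's output -- only the shape differs.
print(int(tf.math.count_nonzero(y)), "non-zero values out of", int(tf.size(y)))
```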

Adding to @paulinpaloalto's clear answer, I'd like to share how I apply dropout in a CNN.

A typical ‘module’ of a CNN that I use has the following components:

  1. The Convolutional layer itself (like a Conv2D layer)
  2. Followed by an activation (usually ReLU)
  3. Followed by a Pooling (usually MaxPooling).

This ‘module’ structure is repeated 1 or more times before getting to the head of the model.

Then the ‘head’ of the model usually has a Flatten layer, followed by one or more Dense layers, with the final layer chosen according to the task of the model.

Given the above general structure, I would insert Dropout layers in zero or more of the following:

  1. In between ‘modules’. For example: ‘module_1’ - Dropout - ‘module_2’ - Dropout…
    I would experiment carefully before adding a Dropout right before the Flatten layer.

  2. Between the Dense layers.
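For concreteness, here is a sketch of that structure in Keras (the input shape, filter counts, dropout rates, and the 10-class output are placeholder choices, not recommendations):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),

    # 'module_1': Conv -> ReLU -> MaxPooling
    layers.Conv2D(32, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),            # Dropout between modules

    # 'module_2'
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),

    # head
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),             # Dropout between Dense layers
    layers.Dense(10, activation="softmax"),
])

model.summary()
```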

There’s no fixed formula for when, where, or how to add Dropout. It depends very much on your task, so experimenting with several combinations is an essential part of your implementation.

I hope this adds a bit more light to your question :slight_smile:

Juan

Thanks, Juan, for all the additional great explanations! The only thing I would add is that, looking at the various models Prof Ng shows us here in Course 4, it’s common to see several conv layers back to back before a pooling layer. In other words, there is more than one way to reduce the height and width dimensions: you can do it with a conv layer using “valid” instead of “same” padding, or with a stride greater than 1. So it is not necessary to have a pooling layer after each conv layer. All these network architecture questions are decisions that need to be made by the designer in each case. If you are lucky, you can find an existing model that worked well on a problem at least somewhat similar to the new problem you are trying to address. Start with that architecture and make further adjustments to tailor it to your problem.
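As a quick illustration of that point (the input shape and filter counts below are placeholders), here are three ways the spatial dimensions can shrink, only the first of which is a pooling layer:

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.zeros((1, 64, 64, 3))

print(layers.MaxPooling2D(pool_size=2)(x).shape)                # (1, 32, 32, 3)  pooling
print(layers.Conv2D(8, 3, padding="valid")(x).shape)            # (1, 62, 62, 8)  "valid" padding
print(layers.Conv2D(8, 3, strides=2, padding="same")(x).shape)  # (1, 32, 32, 8)  stride > 1
```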

Thanks @paulinpaloalto ! It is totally as you say! I didn’t mean to suggest a single structure of CNN modules, but now that I re-read it, it does look like that, so my apologies :slight_smile:

For some reason, every time I create a CNN I build it out of these ‘modules’, but as you explain, that is a choice I made. Definitely, the structure of the CNN can be anything the designer needs/decides.

Thank you for clarifying this!

Juan

Thanks for the clarification. If I understand it correctly, for each input value (an activation of the previous layer) there is a chance it gets set to 0, and it stays 0 in the conv computation for all of the next layer’s filters/channels (rather than dropout being applied to the input separately for each channel’s filter). Let me know if I’ve made a mistake.

Yes, the dropout is applied to the output of a given layer. Of course in any neural network (fully connected, convnet, …), the output of one layer is the input to the next layer.
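A small sketch of that point (the shapes and dropout rate are placeholders): the mask is applied once to the layer’s output, and every filter of the next Conv2D reads that same masked tensor:

```python
import tensorflow as tf
from tensorflow.keras import layers

a = tf.random.uniform((1, 8, 8, 16)) + 1.0          # activations of the previous layer
a_dropped = layers.Dropout(0.5)(a, training=True)   # positions zeroed once, before the next layer

next_conv = layers.Conv2D(32, 3, padding="same")    # all 32 filters convolve the same masked input
z = next_conv(a_dropped)
print(a_dropped.shape, "->", z.shape)
```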
