Understanding Dropout

Please give us a reference to where that quote comes from. If it is one of Prof Ng’s lectures, please give the name of the lecture and the time offset. If it’s from the web, please give us a link.

The point is that this is an effect of how dropout works: because different random neurons are “zapped” on each iteration, it changes how training proceeds and weakens the learned connections between particular outputs of the previous layer and the neurons in the given layer. The suggestion is that those weaker connections show up as smaller weight values.
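
To make the “zapping” concrete, here is a minimal numpy sketch of inverted dropout applied to one layer’s activations. The variable names (`A_prev`, `keep_prob`, `D`) are just illustrative, not the exact notebook variables:

```python
import numpy as np

np.random.seed(1)

keep_prob = 0.8                      # probability of keeping any given neuron
A_prev = np.random.randn(4, 5)       # activations from the previous layer (illustrative shape)

# Build a fresh random mask on every iteration: each neuron is kept with probability keep_prob
D = (np.random.rand(*A_prev.shape) < keep_prob).astype(float)

# Zero out ("zap") the dropped neurons, then scale up by 1/keep_prob
# so the expected value of the activations is unchanged (the "inverted" part)
A_dropped = (A_prev * D) / keep_prob

print(D)          # mask of 0s and 1s -- redrawn on every training iteration
print(A_dropped)  # zapped and rescaled activations
```

Because the mask `D` is redrawn on every iteration, no single connection from the previous layer can be relied on all the time, which is the mechanism that pushes the individual weights toward smaller values.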

Well, what is the squared norm of the weights? It is the sum of the squares of all the elements of W at a given layer of the network. So if the weight values are smaller (see previous paragraph), the squared norm will be smaller as well, right? If you square a smaller number, the result is smaller. We’ve been through this business of considering the meaning of the squared norms before, right? Remember this thread about how inverted dropout works? That was from almost exactly two years ago. A proverbial “Blast from the Past!” :nerd_face:
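
And just to show that arithmetic, here is a short sketch of the squared (Frobenius) norm of one layer’s weight matrix; the names and the 0.5 scaling factor are purely for illustration:

```python
import numpy as np

np.random.seed(1)

W = np.random.randn(3, 4)            # weight matrix for one layer (illustrative shape)

# Squared Frobenius norm: the sum of the squares of all the elements of W
sq_norm = np.sum(np.square(W))

# If training with dropout nudges all the weights to be smaller,
# the squared norm shrinks even faster, since each element is squared
W_smaller = 0.5 * W
sq_norm_smaller = np.sum(np.square(W_smaller))

print(sq_norm)            # some positive number
print(sq_norm_smaller)    # exactly 0.25 * sq_norm in this scaled example
```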