Hi. In the Regularization programming assignment, why did we implement L2 regularization in the cost computation and backward prop, while for Dropout we implemented it in forward and backward prop?
Since regularization techniques are meant to prevent the algorithm from overfitting, shouldn’t we implement regularization in forward prop, the cost computation, backward prop and the parameter update code? These are all the key steps for the NN to ‘learn’ the training set.
L2 regularization and dropout work in different ways. L2 just adds an extra set of terms to the cost, so it does not affect what happens during forward propagation through the hidden layers. The extra terms only come in after the output layer, when the cost is computed. But they are part of the cost J, and all the gradients are partial derivatives of J w.r.t. the various parameters at each layer, right? So the L2 terms do affect what happens in back propagation. Otherwise, what would be the point, right? If it doesn’t change the parameters that we get, why do it? The parameter update code itself does not need to change, because it just applies whatever gradients back prop produces, and those already include the L2 terms.
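To make that concrete, here is a minimal NumPy sketch of the two places where L2 shows up. It assumes a cross-entropy cost and the usual penalty of lambda / (2m) times the sum of squared weights. The function names `l2_cost` and `l2_grad_W` and the argument `lambd` are made up for illustration and are not the assignment's actual code.

```python
import numpy as np

def l2_cost(AL, Y, weights, lambd):
    """Cross-entropy cost plus an L2 penalty on all weight matrices.

    AL: output-layer activations, shape (1, m); Y: labels, shape (1, m).
    weights: list of weight matrices W1, W2, ... (hypothetical layout).
    """
    m = Y.shape[1]
    cross_entropy = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    # The only change to the cost: an extra term that depends on the weights.
    l2_penalty = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy + l2_penalty

def l2_grad_W(dZ, A_prev, W, lambd):
    """Gradient of the regularized cost w.r.t. one layer's W."""
    m = A_prev.shape[1]
    # Usual back prop term, plus the derivative of the penalty: (lambd / m) * W.
    return dZ @ A_prev.T / m + (lambd / m) * W
```

Nothing in forward prop changes here: `AL` is computed exactly as it would be without regularization, and only the cost and the `dW` gradients pick up an extra term.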
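Dropout, on the other hand, directly affects what happens in forward prop at the hidden layers, because it randomly zeroes out some of the activations there. Then it also (of course) affects back prop, since the gradients have to flow through the same masked activations.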
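Here is a similar sketch for dropout at a single hidden layer. It assumes the "inverted dropout" variant, where the kept activations are scaled by 1 / keep_prob. The names `dropout_forward` and `dropout_backward` are hypothetical and are not taken from the assignment.

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    """Apply inverted dropout to a hidden layer's activations A."""
    # Boolean mask: each unit is kept with probability keep_prob.
    D = rng.random(A.shape) < keep_prob
    # Zero the dropped units and rescale the kept ones so the expected
    # value of the activations stays the same.
    A_dropped = A * D / keep_prob
    return A_dropped, D  # D is cached for back prop

def dropout_backward(dA, D, keep_prob):
    """Apply the same mask and scaling to the gradient w.r.t. A."""
    return dA * D / keep_prob

# Usage: the mask created in forward prop is reused in back prop.
rng = np.random.default_rng(0)
A1 = np.ones((4, 5))
A1_dropped, D1 = dropout_forward(A1, keep_prob=0.8, rng=rng)
dA1 = dropout_backward(np.ones_like(A1), D1, keep_prob=0.8)
```

The cost formula is left untouched in this sketch. The regularizing effect comes entirely from the masked activations in forward prop and the matching masked gradients in back prop.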