I am confused with the explanation given for plotting cost function which shows monotonically decrease of cost.

In week 2 (course 2 of specialization), Prof. says

- In the video “Why Regularization Reduces Overfitting?” at 6:35, that if we plot the cost function without the L2 term, we may not see the plot decrease monotonically.

- In the video “Understanding Dropout” at 6:30, to turn off dropout (i.e. set keep_prob = 1) and check that the plot decreases monotonically.

Why, in the second video, does he say to switch off dropout and check whether the cost plot decreases monotonically?

Dropout and the L2 norm are both kinds of regularization. So if we deactivate regularization (i.e. do not add the L2 norm, or set keep_prob = 1), we are left with the original cost function. With that, as mentioned in the first video, we shouldn’t see a monotonic decrease of the cost. But in the second video he says we should see a monotonic decrease.

Is this a mistake in the video explanation? Can someone explain the difference?

L2 and dropout are both forms of regularization, but they have different effects on the training process. L2 regularization penalizes large weights in the model, while dropout prevents co-adaptation of neurons by randomly disabling some of the neurons during training.
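To make the distinction concrete, here is a minimal NumPy sketch (my own illustration, not code from the course; `l2_penalized_cost` and `inverted_dropout` are hypothetical names) of what each technique changes:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalized_cost(cross_entropy, weights, lambd, m):
    # L2 regularization: add (lambda / 2m) * sum of squared weights to the cost.
    l2 = (lambd / (2 * m)) * sum(np.sum(W ** 2) for W in weights)
    return cross_entropy + l2

def inverted_dropout(a, keep_prob):
    # Dropout: zero each activation with probability (1 - keep_prob),
    # scaling survivors by 1 / keep_prob so the expected value is unchanged.
    mask = (rng.random(a.shape) < keep_prob).astype(a.dtype)
    return a * mask / keep_prob
```

Note that with keep_prob = 1 the mask is all ones and the layer passes activations through unchanged; that is exactly the “switch dropout off” setting from the video.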

Can you please elaborate with respect to my question? I am specifically asking about the case where regularisation is not applied.

What I don’t understand is: when regularisation is not applied, why does the second video say there will be a monotonic decrease?

In the video `Understanding Dropout`, it is explained that dropout can make the cost-function plot unstable, because disabling some neurons forces the remaining neurons to compensate and learn the roles of the disabled ones. Therefore, it’s recommended to first check that the cost function behaves reasonably without dropout, and then apply dropout to improve the model’s generalization performance.
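As a sketch of that sanity check (my own toy example, assuming plain full-batch gradient descent on logistic regression; none of this comes from the course code): with dropout off everywhere and a small learning rate, the cost curve should come out monotone.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 2 features, 50 examples, labels a deterministic function of X.
X = rng.normal(size=(2, 50))
Y = (X[0] + X[1] > 0).astype(float).reshape(1, -1)

w, b = np.zeros((1, 2)), 0.0
costs = []
for _ in range(200):                       # keep_prob = 1: no dropout anywhere
    A = sigmoid(w @ X + b)                 # forward pass
    costs.append(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
    dZ = A - Y                             # backward pass
    w -= 0.1 * (dZ @ X.T) / X.shape[1]
    b -= 0.1 * np.mean(dZ)
# Every entry of costs is <= the one before it: monotone without dropout.
```

Once a curve like this looks sane, re-enabling dropout will add noise to it, because each iteration is effectively evaluating a different sub-network.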

Hope this helps!

You must just be misinterpreting what he’s saying as a more general statement than he’s actually intending. As a general matter, there is never any guarantee that gradient descent will produce monotonically decreasing costs. The cost surfaces here are in very high dimensions and are incredibly complex. Gradient Descent can do a “Wile E. Coyote” at any point and just step off a cliff. Here’s a paper about visualizing cost surfaces.
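A tiny illustration of that point (my own example, not from the paper): even on the simplest convex function, gradient descent produces a monotone cost only when the step size is small enough.

```python
def grad_descent(x0, lr, steps):
    # Plain gradient descent on f(x) = x**2, whose gradient is 2x.
    x, costs = x0, []
    for _ in range(steps):
        costs.append(x ** 2)
        x -= lr * 2 * x
    return costs

safe = grad_descent(1.0, lr=0.1, steps=10)    # x shrinks by a factor 0.8 per step
unsafe = grad_descent(1.0, lr=1.1, steps=10)  # x flips sign and grows by 1.2x per step
```

With lr = 0.1 every cost is lower than the last; with lr = 1.1 each step overshoots the minimum and the cost grows — a one-dimensional version of stepping off a cliff.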
