I am confused with the explanation given for plotting cost function which shows monotonically decrease of cost.

In week 2 (course 2 of specialization), Prof. says

- In the video “Why Regularization Reduces Overfitting?” at 6:35, that if we plot the cost function without the L2 term, we may not see the plot decrease monotonically.

- In the video “Understanding Dropout” at 6:30, to turn off dropout (i.e. set keep_prob = 1) and check that the plot decreases monotonically.

Why, in the second video, does he say to switch off dropout and check whether the cost plot decreases monotonically?

Dropout and the L2 norm are both kinds of regularization. So if we deactivate regularization (i.e. do not add the L2 norm, or set keep_prob = 1), we are left with the original cost function. With that, as mentioned in the first video, we shouldn’t see a monotonic decrease of the cost. But in the second video he says we should see a monotonic decrease.

Is this a mistake in the video explanation? Can someone explain the difference?

L2 and dropout are both forms of regularization, but they have different effects on the training process. L2 regularization penalizes large weights in the model, while dropout prevents co-adaptation of neurons by randomly disabling some of the neurons during training.
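To make the distinction concrete, here is a minimal NumPy sketch (my own illustration, not code from the course; `l2_penalized_cost` and `inverted_dropout` are hypothetical names) of what each technique changes:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_penalized_cost(cross_entropy, weights, lambd, m):
    # L2 regularization: add (lambda / 2m) * sum of squared weights to the cost.
    l2 = (lambd / (2 * m)) * sum(np.sum(W ** 2) for W in weights)
    return cross_entropy + l2

def inverted_dropout(a, keep_prob):
    # Dropout: zero each activation with probability (1 - keep_prob),
    # scaling survivors by 1 / keep_prob so the expected value is unchanged.
    mask = (rng.random(a.shape) < keep_prob).astype(a.dtype)
    return a * mask / keep_prob
```

Note that with keep_prob = 1 the mask is all ones and the layer passes activations through unchanged; that is exactly the “switch dropout off” setting from the video.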

Can you please elaborate with respect to my question? I am specifically asking about the case where regularisation is not applied.

What I don’t understand is: when regularisation is not applied, why does the second video say there will be a monotonic decrease?

In the video `Understanding Dropout`, it is explained that dropout can make the cost-function plot unstable, because disabling some neurons forces the remaining neurons to compensate and learn the roles of the disabled ones. Therefore, it’s recommended to first check that the cost function behaves reasonably without dropout, and then apply dropout to improve the model’s generalization performance.
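As a sketch of that sanity check (my own toy example, assuming plain full-batch gradient descent on logistic regression; none of this comes from the course code): with dropout off everywhere and a small learning rate, the cost curve should come out monotone.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 2 features, 50 examples, labels a deterministic function of X.
X = rng.normal(size=(2, 50))
Y = (X[0] + X[1] > 0).astype(float).reshape(1, -1)

w, b = np.zeros((1, 2)), 0.0
costs = []
for _ in range(200):                       # keep_prob = 1: no dropout anywhere
    A = sigmoid(w @ X + b)                 # forward pass
    costs.append(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))
    dZ = A - Y                             # backward pass
    w -= 0.1 * (dZ @ X.T) / X.shape[1]
    b -= 0.1 * np.mean(dZ)
# Every entry of costs is <= the one before it: monotone without dropout.
```

Once a curve like this looks sane, re-enabling dropout will add noise to it, because each iteration is effectively evaluating a different sub-network.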

Hope this helps!

You must just be misinterpreting what he’s saying as a more general statement than he’s actually intending. As a general matter, there is never any guarantee that gradient descent will produce monotonically decreasing costs. The cost surfaces here are in very high dimensions and are incredibly complex. Gradient Descent can do a “Wile E. Coyote” at any point and just step off a cliff. Here’s a paper about visualizing cost surfaces.
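A tiny illustration of that point (my own example, not from the paper): even on the simplest convex function, gradient descent produces a monotone cost only when the step size is small enough.

```python
def grad_descent(x0, lr, steps):
    # Plain gradient descent on f(x) = x**2, whose gradient is 2x.
    x, costs = x0, []
    for _ in range(steps):
        costs.append(x ** 2)
        x -= lr * 2 * x
    return costs

safe = grad_descent(1.0, lr=0.1, steps=10)    # x shrinks by a factor 0.8 per step
unsafe = grad_descent(1.0, lr=1.1, steps=10)  # x flips sign and grows by 1.2x per step
```

With lr = 0.1 every cost is lower than the last; with lr = 1.1 each step overshoots the minimum and the cost grows — a one-dimensional version of stepping off a cliff.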
