Hi everyone, I’ve recently started learning about ML, since we have an ML class at college (I’m studying computer science).
I think I understand the GD algorithm and the concept of the cost function (MSE) pretty well, but I’ve come across two different definitions of each equation, which has me somewhat confused about what that means.
So there’s this one here, which was the first one I came across and understand.
As the second diagram has no reference to the terms used in the formula, and I cannot find it in the lecture video, I assume this is what it means:
E represents error, so here the formula calculates the mean error value, where n is the number of examples, y_i is the true label, and (mx_i + c) is the prediction, playing the same role as h_\theta(x_i). With this formula, the bias term is expressed as c, whilst in the first formula the bias term could be the first element of the parameter vector \theta if used. The footnote is telling us this term (mx_i + c) is \hat{y}_i. Squaring the resulting term takes care of any negative values.
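To make this concrete, and assuming I’m reading the two diagrams correctly (the first in the h_\theta form with a \frac{1}{2n} factor, the second in slope-intercept form; your slides may use slightly different symbols), the two formulas would be:

J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x_i) - y_i \right)^2

E = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - (mx_i + c) \right)^2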
So basically, the two formulae are doing the same thing: calculating the cost, i.e., the error.
If you go back to the lecture video on the cost function formula, the professor did mention that, and he said he would explain it later on. If you think of the cost as an indication of how well the model is finding values for the parameters W and b (weights and bias), then it is the downward trend that matters more than the pure value itself.
Adding to @Kic’s clear answer: using 1/n or 1/2n is equally valid, and it is a choice you can make. This term basically scales the cost to the number of samples.
The important thing is to be consistent across the entire model: whichever of 1/n or 1/2n you pick, stick with it throughout.
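To see what that means in practice, here is a minimal sketch in Python (the toy data and the compute_cost / half names are my own, not from the course):

```python
import numpy as np

def compute_cost(w, b, x, y, half=False):
    """Squared-error cost for a 1-D linear model y_hat = w*x + b.

    half=False gives the 1/n version, half=True the 1/(2n) version.
    """
    n = len(x)
    errors = (w * x + b) - y
    scale = 2 * n if half else n
    return np.sum(errors ** 2) / scale

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

# The two versions differ only by a constant factor of 2,
# so the same (w, b) minimizes both.
print(compute_cost(1.0, 0.0, x, y, half=False))  # ~4.667
print(compute_cost(1.0, 0.0, x, y, half=True))   # ~2.333, exactly half
print(compute_cost(2.0, 0.0, x, y, half=False))  # 0.0 at the true minimum
print(compute_cost(2.0, 0.0, x, y, half=True))   # 0.0 as well
```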
As Kic and Juan have explained, which constant factor you choose doesn’t really matter in terms of the final solution you get: if you minimize E, you have also minimized \frac{1}{2}E, and the other way around. The one other thing worth mentioning is why a lot of people prefer to add the factor of \frac{1}{2} here: the very next step is to take the derivative of E in order to compute the gradients for back propagation. Notice that the error terms are squared there, so taking the derivative will give you a factor of 2, right? If we have:
f(x) = x^2
Then
f'(x) = 2x
So the \frac{1}{2} will cancel that factor of 2 and just make the formulas for the gradients a bit simpler and cleaner. As mentioned above, it gives the same final answer either way, so why not optimize for the simpler gradient formulas? Those are what we actually use when writing the code.
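Spelling that out (using the w, b notation from earlier, and assuming the \frac{1}{2n} version of the cost), the chain rule gives:

\frac{\partial}{\partial w} \frac{1}{2n} \sum_{i=1}^{n} (w x_i + b - y_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (w x_i + b - y_i) x_i

\frac{\partial}{\partial b} \frac{1}{2n} \sum_{i=1}^{n} (w x_i + b - y_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (w x_i + b - y_i)

The 2 from the square cancels the \frac{1}{2}, leaving a clean \frac{1}{n} in front of each gradient.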