Optional Lab: Gradient Descent

Hello, I just finished week 1 of Supervised ML: Regression and Classification. I have two questions: one about the lab and one about gradient descent.

One about the lab:
If you take a look at the image right below, this is how they computed the cost function.
In the image below this text, in the top right, it looks like the calculation is different between the two of them.

Why, you might ask? Because of the order of operations (PEMDAS), I thought you had to subtract y[i] from x[i] first and then multiply that difference by the function f(w,b). But what the lab does is use the function to compute f(x[i]) and then subtract y[i] from it. I could be wrong in my thinking, but I am curious what other people have to say about why I am wrong, or whether I am right.

Question about gradient descent:
Is gradient descent only used for regression algorithms like linear regression instead of classification algorithms?


f_{w,b}(x^{(i)}) doesn’t mean that you have to multiply f_{w,b} by (x^{(i)}). It means the value of \hat{y} at the given values of w, b, and x. Note that \hat{y}^{(i)} and f_{w,b}(x^{(i)}) are the same thing, so the term can be rewritten as \hat{y}^{(i)} - y^{(i)}.
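A minimal NumPy sketch of this, with made-up toy values, may help. The key point is that f_{w,b}(x^{(i)}) is evaluated first (function application, not multiplication), and only then is y^{(i)} subtracted:

```python
import numpy as np

def f_wb(x, w, b):
    """Linear model prediction: y_hat = w * x + b."""
    return w * x + b

# toy data, chosen so that w=2, b=1 fits exactly (illustration only)
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])
w, b = 2.0, 1.0

# f_wb(x) is computed first, THEN y is subtracted:
errors = f_wb(x, w, b) - y           # y_hat^{(i)} - y^{(i)} for each example
cost = np.sum(errors ** 2) / (2 * len(x))
print(cost)  # 0.0, since this w and b fit the toy data exactly
```

Nowhere do we compute (x - y) and multiply it by anything; the subtraction always happens between the prediction and the true label.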

Regarding gradient descent: yes, we use it for regression as well as classification problems.



Hi @Nathan_Angell ,

If you look carefully at the formula for the linear regression model, you can see that the model is built on a linear transformation that outputs a value, \hat{y}^{(i)}, for a single example. So when calculating the cost J with respect to w and b, it is the transformed value \hat{y}^{(i)} that is used for each example, not x[i]. Bear in mind that the loss, which is the difference between \hat{y}^{(i)} and y^{(i)}, tells us how far apart the predicted value \hat{y} is from the true value.

Gradient descent is an optimizer for finding the values of the parameters w and b of a function where the cost is at its minimum. So if a problem fits that bill, there is no reason why gradient descent should be restricted to regression algorithms.
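To make that concrete, here is a minimal sketch of the gradient descent loop for the linear regression case (the toy data and hyperparameters are made up for illustration). A classification model would swap in a different prediction function and cost, but the update loop itself stays the same:

```python
import numpy as np

def gradient_descent(x, y, w, b, alpha, iters):
    """Minimize the squared-error cost J(w, b) by repeated parameter updates."""
    m = len(x)
    for _ in range(iters):
        err = (w * x + b) - y           # y_hat - y for each example
        dj_dw = np.dot(err, x) / m      # partial derivative of J wrt w
        dj_db = np.sum(err) / m         # partial derivative of J wrt b
        w -= alpha * dj_dw              # step downhill in w
        b -= alpha * dj_db              # step downhill in b
    return w, b

# toy data generated from w=2, b=1
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
w, b = gradient_descent(x, y, w=0.0, b=0.0, alpha=0.1, iters=2000)
print(w, b)  # converges close to w=2.0, b=1.0
```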

Thank you for that information!

Thank you so much, that totally makes sense!