How are Cost Function graphs plotted before my gradient descent algorithm is implemented?

It’s confusing to me that when I run gradient descent here, the code takes some time to execute and then shows the final results. What’s happening is that there is a cost function for the linear regression model, and when I evaluate it at some initial values of w and b, I get a single point somewhere on the graph. Since there could be unlimited values of w and b, I use gradient descent to update w and b so that I move toward the minimum. The gradient descent update is computed multiple times, let’s assume 10 times, before reaching the minimum.
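For concreteness, here is a minimal, self-contained sketch of that loop for a one-feature linear model. The function names, learning rate, and toy data are my own illustration, not the lab’s code:

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Mean squared error cost J(w,b) for a one-feature linear model."""
    m = x.shape[0]
    err = (w * x + b) - y
    return np.sum(err ** 2) / (2 * m)

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Repeatedly update w and b in the downhill direction,
    recording J(w,b) after each update."""
    m = x.shape[0]
    cost_history = []
    for _ in range(num_iters):
        err = (w * x + b) - y
        dj_dw = np.dot(err, x) / m   # partial derivative of J wrt w
        dj_db = np.sum(err) / m      # partial derivative of J wrt b
        w -= alpha * dj_dw
        b -= alpha * dj_db
        cost_history.append(compute_cost(x, y, w, b))
    return w, b, cost_history

# made-up toy data, roughly y = 3x + 2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

w_final, b_final, cost_history = gradient_descent(x, y, w=0.0, b=0.0,
                                                  alpha=0.05, num_iters=10)
print(w_final, b_final, cost_history[-1])  # parameters and cost after the 10th update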

When the 10 iterations are completed, I get the final values of w and b that I want, but I also get a graph of J(w,b) with respect to w and b. Taking w as an example, the graph I get is shown below.

[image: graph of J(w,b) vs w]

The confusion I have is that when I run this in the lab, I get everything described above, but I also get a graph of J(w,b) vs w. This is the graph that I get:

Here you can see that I’ve passed the function an argument, hist. This variable holds the 10 cost values, stored in hist["Cost"], that were recorded while running the gradient descent algorithm.

With these 10 cost values I would expect a graph that ends at the last value of J(w,b), i.e. 2087.34, since that’s all I’ve got. But instead I get a graph that starts somewhere near 60,000, goes down to somewhere below 2087.34, and then goes back up to somewhere near 60,000.

If I haven’t provided the values of J(w,b) to the function, how were they computed and plotted on the graph?

This is important for me to understand, because it might mean there is some very efficient code running in the back end of this function that computes the values of J(w,b) faster than the code written in this lab. If that’s the case, then I should learn it; maybe it will come in handy.

Or there is something that I’m missing. What I’ve understood is that we do not want to compute all values of J(w,b), because there could be unlimited values of w and b. To be efficient, we run gradient descent to get the slope and watch whether it approaches 0. That means we never obtain the complete graph of J(w,b) by running gradient descent; this is how I think the formula gets us near the minimum, and we do not have the actual complete graph of J(w,b) before running gradient descent. Instead, all we have are the cost values from the first iteration to the last iteration of the entire gradient descent run.

But if that’s not how it works then there is something wrong in the way that I see this entire algorithm running.
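For what it’s worth, that understanding can be checked by plotting only the recorded costs against the iteration number: that curve only ever goes down and ends at the final cost, with no U-shape. A minimal sketch, assuming hist is a dictionary whose "Cost" entry is a list of the recorded costs (the intermediate numbers below are made up; only the first and last roughly match the values mentioned above):

```python
import matplotlib.pyplot as plt

# hist["Cost"] holds only the costs recorded along the descent path,
# e.g. the 10 values saved while gradient descent ran.
# These numbers are illustrative, not taken from the lab.
hist = {"Cost": [60000.0, 31000.0, 16500.0, 9200.0, 5600.0,
                 3900.0, 3000.0, 2500.0, 2200.0, 2087.34]}

plt.plot(range(1, len(hist["Cost"]) + 1), hist["Cost"], marker="o")
plt.xlabel("iteration")
plt.ylabel("J(w,b)")
plt.title("Cost recorded during gradient descent (only ever decreases)")
plt.show()
```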

Which part of the assignment are you running? Please give the title of the notebook.

Gradient descent only finds the minimum cost. That’s also where the cost history plot (cost vs. the number of iterations) comes from.

For the plot vs. the w[0] value, what’s undoubtedly happening is that there’s some additional code in the notebook that is computing the cost over a range of w values, just for the purpose of creating that plot.

I’ll point out where this happens once I learn from you which notebook you’re using.

Yes, that’s what I thought too, but I couldn’t find that extra code. I’m still looking; if you can help, that’ll be great.

It’s C1_W2_Lab03_Feature_Scaling_and_Learning_Rate_Soln

Course 1, Week 2, Lab 3

That explains why I couldn’t find it. You’ve posted your message in Week 3.

Oh sorry, that’s my bad…

I have just changed it back to week 2.

@Ammar_Jawed,

In the notebook, you can click “File” > “Open”, and then look for the script file that defines plot_cost_i_w.

Raymond

The magic happens in the first four lines of plot_cost_i_w(). The cost is computed over an array of weight values.
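In other words, the helper re-evaluates the same cost function over a whole array of w values (with the other parameter held fixed) purely to draw the curve, which is why the plot extends far beyond the 10 costs recorded during gradient descent. Here is a minimal sketch of that idea, not the actual notebook code; the data, the fixed b, and the range of w values are made up:

```python
import numpy as np
import matplotlib.pyplot as plt

def compute_cost(x, y, w, b):
    """Same mean squared error cost that gradient descent minimizes."""
    m = x.shape[0]
    return np.sum(((w * x + b) - y) ** 2) / (2 * m)

# made-up toy data and a fixed b, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])
b_fixed = 2.0

# evaluate the cost over a whole range of w values, just for the plot
w_range = np.linspace(-10, 16, 50)
costs = [compute_cost(x, y, w, b_fixed) for w in w_range]

plt.plot(w_range, costs)   # the familiar U-shaped curve of J vs w
plt.xlabel("w")
plt.ylabel("J(w,b)")
plt.show()
```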

Thank you for the help. It makes sense now…