How are Cost Function graphs plotted before my gradient descent algorithm is implemented?

It’s confusing to me that when I run gradient descent here, the code takes some time to execute and then shows the final results. What’s happening is that there is a cost function for the linear regression model, and when I evaluate it at some initial values of w and b, I get a single point somewhere on the graph. Since there could be unlimited values of w and b, I use gradient descent to update w and b so that I move toward the minimum. The gradient descent update is computed multiple times, let’s assume 10 times, before reaching the minimum.
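For concreteness, here is a minimal, self-contained sketch of that loop for a one-feature linear model. The function names, learning rate, and toy data are my own illustration, not the lab’s code:

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Mean squared error cost J(w,b) for a one-feature linear model."""
    m = x.shape[0]
    err = (w * x + b) - y
    return np.sum(err ** 2) / (2 * m)

def gradient_descent(x, y, w, b, alpha, num_iters):
    """Repeatedly update w and b in the downhill direction,
    recording J(w,b) after each update."""
    m = x.shape[0]
    cost_history = []
    for _ in range(num_iters):
        err = (w * x + b) - y
        dj_dw = np.dot(err, x) / m   # partial derivative of J wrt w
        dj_db = np.sum(err) / m      # partial derivative of J wrt b
        w -= alpha * dj_dw
        b -= alpha * dj_db
        cost_history.append(compute_cost(x, y, w, b))
    return w, b, cost_history

# made-up toy data, roughly y = 3x + 2
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])

w_final, b_final, cost_history = gradient_descent(x, y, w=0.0, b=0.0,
                                                  alpha=0.05, num_iters=10)
print(w_final, b_final, cost_history[-1])  # parameters and cost after the 10th update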

When the 10 iterations are completed, I get the final values of w and b that I want, but I also get a graph of J(w,b) with respect to w and b. Taking w as an example, the graph I get is shown below.

[image: graph of J(w,b) vs w]

The confusion I have is that when I run this in the lab, I get everything described above, but I also get a graph of J(w,b) vs w. This is the graph that I get:

Here you can see that I’ve passed the function an argument, hist. This variable holds the 10 cost values, stored in hist["Cost"], that were recorded while running the gradient descent algorithm.

With these 10 cost values I would expect a graph that ends at the last value of J(w,b), i.e. 2087.34, since that’s all I’ve got. But instead I get a graph that starts somewhere near 60,000, goes down to somewhere below 2087.34, and then goes back up to somewhere near 60,000.

If I haven’t provided the values of J(w,b) to the function, how were they computed and plotted on the graph?

This is important for me to understand, because it might mean there is some very efficient code running in the back end of this function that computes the values of J(w,b) faster than the code written in this lab. If that’s the case, then I should learn it; maybe it will come in handy.

Or there is something that I’m missing. What I’ve understood is that we do not want to compute all values of J(w,b), because there could be unlimited values of w and b. To be efficient, we run gradient descent to get the slope and watch whether it approaches 0. That means we never obtain the complete graph of J(w,b) by running gradient descent; this is how I think the formula gets us near the minimum, and we do not have the actual complete graph of J(w,b) before running gradient descent. Instead, all we have are the cost values from the first iteration to the last iteration of the entire gradient descent run.

But if that’s not how it works then there is something wrong in the way that I see this entire algorithm running.
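For what it’s worth, that understanding can be checked by plotting only the recorded costs against the iteration number: that curve only ever goes down and ends at the final cost, with no U-shape. A minimal sketch, assuming hist is a dictionary whose "Cost" entry is a list of the recorded costs (the intermediate numbers below are made up; only the first and last roughly match the values mentioned above):

```python
import matplotlib.pyplot as plt

# hist["Cost"] holds only the costs recorded along the descent path,
# e.g. the 10 values saved while gradient descent ran.
# These numbers are illustrative, not taken from the lab.
hist = {"Cost": [60000.0, 31000.0, 16500.0, 9200.0, 5600.0,
                 3900.0, 3000.0, 2500.0, 2200.0, 2087.34]}

plt.plot(range(1, len(hist["Cost"]) + 1), hist["Cost"], marker="o")
plt.xlabel("iteration")
plt.ylabel("J(w,b)")
plt.title("Cost recorded during gradient descent (only ever decreases)")
plt.show()
```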

Which part of the assignment are you running? Please give the title of the notebook.

Gradient descent only finds the minimum cost. That’s also where the cost history plot (cost vs. the number of iterations) comes from.

For the plot vs. the w[0] value, what’s undoubtedly happening is that there’s some additional code in the notebook that is computing the cost over a range of w values, just for the purpose of creating that plot.

I’ll point out where this happens once I learn from you which notebook you’re using.

Yes, that’s what I thought too, but I couldn’t find that extra code. I’m still looking; if you can help, that’ll be great.

It’s C1_W2_Lab03_Feature_Scaling_and_Learning_Rate_Soln

Course 1, Week 2, Lab 3

That explains why I couldn’t find it. You’ve posted your message in Week 3.

Oh sorry, that’s my bad…

I have just changed it back to week 2.

@Ammar_Jawed,

In the notebook, you can click “File” > “Open”, and then look for the script file that defines plot_cost_i_w.

Raymond

The magic happens in the first four lines of plot_cost_i_w(). The cost is computed over an array of weight values.
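In other words, the helper re-evaluates the same cost function over a whole array of w values (with the other parameter held fixed) purely to draw the curve, which is why the plot extends far beyond the 10 costs recorded during gradient descent. Here is a minimal sketch of that idea, not the actual notebook code; the data, the fixed b, and the range of w values are made up:

```python
import numpy as np
import matplotlib.pyplot as plt

def compute_cost(x, y, w, b):
    """Same mean squared error cost that gradient descent minimizes."""
    m = x.shape[0]
    return np.sum(((w * x + b) - y) ** 2) / (2 * m)

# made-up toy data and a fixed b, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.1])
b_fixed = 2.0

# evaluate the cost over a whole range of w values, just for the plot
w_range = np.linspace(-10, 16, 50)
costs = [compute_cost(x, y, w, b_fixed) for w in w_range]

plt.plot(w_range, costs)   # the familiar U-shaped curve of J vs w
plt.xlabel("w")
plt.ylabel("J(w,b)")
plt.show()
```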

Thank you for the help. It makes sense now…