Hi there. While doing this lab, I set about validating, in Excel, the cost for the function f_w_b(x) = w*x^2 + b at w = 1 and b = 0.049. The cost I calculated (0.4522) is quite far from what I’m seeing in this code from the lab (0.208962 at iteration 9000):
X = X.reshape(-1, 1)  # X should be a 2-D matrix
model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha = 1e-5)
The fourth column is the prediction error: the third column minus the second column. The fifth column is the squared error, the square of the fourth column. Thanks for taking a look!
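For concreteness, here is a minimal Python sketch of the same computation my sheet does. The dataset (x = 0..19, y = 1 + x^2) and the 1/(2m) squared-error cost are my reading of the lab; under those assumptions it reproduces my 0.4522:

```python
import numpy as np

# Assumed lab dataset: 20 points with y = 1 + x^2 (adjust if yours differs)
x = np.arange(0, 20, 1)
y = 1 + x**2

# Parameters I am checking, taken from the lab's printout
w, b = 1.0, 0.049

f = w * x**2 + b                      # prediction (3rd column of my sheet)
err = f - y                           # prediction error (4th column)
cost = np.sum(err**2) / (2 * len(x))  # squared-error cost with the 1/(2m) factor
print(f"cost at w={w}, b={b}: {cost:.4f}")  # 0.4522 under these assumptions
```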
The result of the gradient descent with 10,000 iterations is b = 0.049. I agree that b = 0.49 results in a much lower cost, but I’m just trying to validate the results for those statements (i.e., that the cost at b = 0.049 is less than the cost at iteration 9000, 0.208962). Since my cost is higher, I’m assuming that either something is wrong with the code or something is wrong with my method (probably the latter, but I want to know why it is not correct).
Are you trying to fit a straight line to a parabola (f = x^2 + 1)?
This would be a little simpler to verify if you just list the two columns: the ‘x’ value and the corresponding ‘y’ value. The table you gave doesn’t define ‘y’.
If you have ‘x’, then you already know ‘x^2’. You’re then just doing a linear regression on a linear function (of x^2 against y), and the results are b = 1 and w = 1.
This is because I think you defined ‘y = x^2 + 1’ (see the quick check below).
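You can verify that claim with an exact least-squares fit on the engineered feature. A quick sketch; the dataset is my assumption, and np.polyfit stands in for the lab’s gradient descent:

```python
import numpy as np

x = np.arange(0, 20, 1)   # assumed dataset
y = x**2 + 1

# Fit y = w * (x^2) + b by least squares; polyfit returns [w, b] for degree 1
w, b = np.polyfit(x**2, y, deg=1)
print(w, b)               # expect w = 1.0 and b = 1.0 (up to floating point)
```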
I think your learning rate is way too small to get a good solution using gradient descent.
It would be useful to plot the cost history during gradient descent; this will let you see whether the cost has stabilized at a minimum. A sketch follows.
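If run_gradient_descent_feng doesn’t hand back a cost history, a simplified loop like this one records and plots it (a sketch under my assumptions about the lab’s cost and update rule, not the lab’s actual code):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 20, 1)        # assumed dataset
X = (x**2).astype(float)       # engineered feature, as in the lab
y = x**2 + 1.0
m = len(y)

w, b, alpha = 0.0, 0.0, 1e-5
cost_history = []

for _ in range(10000):
    err = w * X + b - y                        # prediction error
    cost_history.append(np.sum(err**2) / (2 * m))
    w -= alpha * np.sum(err * X) / m           # batch gradient-descent step
    b -= alpha * np.sum(err) / m

plt.plot(cost_history)
plt.xlabel("iteration")
plt.ylabel("cost")
plt.show()                     # a flat tail means the cost has stabilized
```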
I’m sorry for the confusion. y is the actual value (the second column).
I’m not actually doing any regression or gradient descent in the Excel sheet, just checking the results at a particular point in the lab: the b = 0.049 that comes out after 10,000 iterations of gradient descent fitting w*x^2 + b to x^2 + 1. So I’m just computing the cost of f = x^2 + 0.049 as a fit for y = x^2 + 1.
Yes, I get the same, but the code outputs a different, smaller cost at iteration 9000 (0.208962, as mentioned in my OP). At iteration 10,000, which yields b = 0.049, I believe the cost should be even lower than 0.208962, not higher (my 0.4522). That is my issue here.
Please try these three steps (a sketch of the first two follows this list):
1. Print model_w, model_w[0], and model_b. You will see something different.
2. Re-compute the cost in the Jupyter notebook using the model_w and model_b that were returned.
3. Update your Excel sheet with those latest w and b values, and compare its result with step 2.
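A minimal sketch of steps 1 and 2, to run in the lab notebook after the gradient-descent cell (it relies on the notebook’s X, y, model_w, and model_b; compute_cost below is my own helper with the course’s 1/(2m) convention, not necessarily the lab’s function):

```python
import numpy as np

def compute_cost(X, y, w, b):
    # squared-error cost with the 1/(2m) factor (my helper, not the lab's)
    m = X.shape[0]
    err = X @ w + b - y
    return np.sum(err**2) / (2 * m)

# Step 1: look at the returned parameters at full precision
print(repr(model_w), repr(model_w[0]), repr(model_b))

# Step 2: re-compute the cost with the unrounded parameters
print(compute_cost(X, y, model_w, model_b))
```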
Note also that although it prints the cost value at the 9000th iteration, it does not print the w and b at the 9000th iteration. It only prints the w and b at the last iteration (the 9999th, since the 10,000 iterations count from 0). Therefore it is wrong to compare the cost at the 9000th iteration with the w and b from the 9999th. If you want to examine how the printing works, go to lab_utils_multi.py; you might also modify the print to include the w and b at the 9000th iteration, as in the sketch below.
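In case it helps, the pattern is roughly like this simplified stand-in (the real implementation is in lab_utils_multi.py and differs in its details); note how the intermediate print can be extended to show w and b as well:

```python
import numpy as np

def run_gd_sketch(X, y, alpha=1e-5, num_iters=10000):
    # Simplified stand-in for the lab's runner, not its actual code.
    m = len(y)
    w, b = 0.0, 0.0
    for i in range(num_iters):                 # i runs 0 ... num_iters-1 (e.g. 9999)
        err = w * X + b - y
        w -= alpha * np.sum(err * X) / m       # batch gradient-descent update
        b -= alpha * np.sum(err) / m
        if i % 1000 == 0:
            cost = np.sum((w * X + b - y)**2) / (2 * m)
            # the original only prints the cost; adding w and b here shows
            # the parameters at the 9000th iteration too
            print(f"Iteration {i:5d}: cost {cost:.6f}, w {w:.6f}, b {b:.6f}")
    return w, b                                # only the final parameters come back

# usage: w, b = run_gd_sketch((np.arange(20)**2).astype(float), np.arange(20)**2 + 1.0)
```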
Thank you, Raymond. I haven’t had a chance to look at it yet, but regarding your comment at the end that it is wrong to compare the cost at the 9000th iteration with the w and b from the 9999th: shouldn’t the cost be monotonically decreasing (given a suitable learning rate), and therefore shouldn’t I expect cost_9999 < cost_9000?
That’s a reasonable argument, but we need to do everything correctly, including all three of the steps I suggested; otherwise we will be discussing on the basis of wrong observations. Let’s put the discussion on hold, make sure everything is right, and then discuss.
As a matter of fact, thinking about it the other way around, your reasonable argument suggested that something was wrong in the observations, didn’t it? Great argument!
TMosh - my apologies, I just noticed that I wrote “0.49” instead of “0.049” in my initial post. I have since corrected it, though I am still perplexed about the discrepancy between the lab and my result.
Although these parameters are very close to what I had in my Excel sheet, I updated them, and lo and behold, the resulting cost was lower than cost_9000, consistent with convergence! It’s a shame that the code prints figures so rounded that they produce significant cost differences, but I suppose this will ultimately make me a better Python coder.
In fact, I was also confused at first, so I had to do some investigating too. I only realized the rounding issue after printing out w * x^2 and finding that the result wasn’t consistent with w = 1.
I really do think that your earlier argument was great. We need leads to investigate; mine was merely that the Excel result should be the same as the Python result, whereas yours showed a deeper understanding.
With such good sense, I am sure that, as you said, you will become a better coder, and being a good coder is also essential for doing data science on computers.
Cheers,
Raymond
PS: In case you are interested, with my lead, what I intended to do was repeat your Excel sheet’s steps one by one in Python; that’s why I computed w * x^2. I wanted to find the step at which the two results started to differ, roughly as in the sketch below.
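Concretely, something like this (the dataset is my assumption; model_w and model_b are the values returned in the notebook):

```python
import numpy as np

x = np.arange(0, 20, 1)            # assumed dataset
y = 1 + x**2

def excel_columns(w, b):
    # reproduce the Excel sheet column by column
    f = w * x**2 + b               # prediction
    err = f - y                    # prediction error
    cost = np.sum(err**2) / (2 * len(x))
    return f, err, cost

f_rounded, err_rounded, cost_rounded = excel_columns(1.0, 0.049)  # printed values
print(cost_rounded)                # 0.4522 under these assumptions

# in the notebook, compare against the unrounded parameters:
# f_full, err_full, cost_full = excel_columns(model_w[0], model_b)
# the first column whose values differ is where the rounding shows up
```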