If the goal of gradient descent is to find the global minimum of the cost, why is it that we take the value from the end of the iterations? What if one of the earlier iterations had a lower cost?

Below are the last values of b and w that I found:

alpha = 0.01 b=100.011567727362 w=199.99285075131766

cost 6.745014662580395e-06

Whereas in one of the earlier iterations, I found the values below:

alpha = 0.01 b=100.08531003189917 w=199.947275500705

cost 0.00036684872861835616

I know the b and w values are almost the same, but theoretically they are not exactly at the minimum. What am I missing here?
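To see what happens, here is a minimal sketch that tracks the best (lowest-cost) parameters seen across all iterations and compares them with the final ones. The data, learning rate, and iteration count are my own assumptions, chosen so the fit converges to roughly b = 100, w = 200 as in the printout above:

```python
import numpy as np

# Hypothetical data constructed so the true fit is y = 200*x + 100
# (an assumption for illustration; not the course's actual dataset).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 200.0 * x + 100.0

def cost(w, b):
    # Mean squared error, halved (the usual convention)
    return np.mean((w * x + b - y) ** 2) / 2

w, b = 0.0, 0.0
alpha = 0.01
best = (float("inf"), w, b)  # lowest cost seen so far, with its parameters

for i in range(20000):
    err = w * x + b - y
    w -= alpha * np.mean(err * x)  # gradient step for w
    b -= alpha * np.mean(err)      # gradient step for b
    c = cost(w, b)
    if c < best[0]:
        best = (c, w, b)

# With a small enough alpha, the cost decreases at every step, so the
# minimum-cost iterate coincides with (or is negligibly close to) the last one.
print("final cost:", cost(w, b))
print("best cost: ", best[0])
```

The point of the sketch: as long as alpha is small enough, gradient descent decreases the cost monotonically, so the last iteration is the best one and there is nothing earlier worth keeping. If an earlier iteration ever had a lower cost than a later one, that would be a sign the learning rate is too large (the steps are overshooting), not a reason to cherry-pick iterations.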