C1 W1 Optional Lab: Gradient Descent

In the gradient_descent function, we print the history of the descent. However, I am struggling to see why the numbers printed are correct.

In any given iteration, dj_dw and dj_db are computed with the values of w, b from before the update, while J_history is computed after w, b are updated. Yet they are printed together.

I do not understand why the gradient before taking a step should be associated with the cost after taking a step, especially considering p_history is also using w, b values after taking a step. Did I misunderstand the code, or is something else going on?

They don’t have to be directly associated. The purpose of the printout is just to show that the cost is decreasing as the weights change.

Typically, we’d only print the cost history and not the weights, because with more than a handful of weights the display would be number soup.
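To make the ordering concrete, here is a minimal sketch of such a loop. It is not the lab's actual code: the function bodies, the toy data, and the grad_history list are assumptions added for illustration. It records the gradient evaluated at the pre-step w, b and the cost evaluated at the post-step w, b, then prints only the cost.

```python
def compute_cost(x, y, w, b):
    # Squared-error cost for a one-feature linear model f(x) = w*x + b.
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)

def compute_gradient(x, y, w, b):
    # Partial derivatives of the cost with respect to w and b.
    m = len(x)
    dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
    return dj_dw, dj_db

def gradient_descent(x, y, w, b, alpha, num_iters):
    J_history = []     # cost AFTER each step
    grad_history = []  # gradient BEFORE each step (hypothetical extra list)
    for _ in range(num_iters):
        dj_dw, dj_db = compute_gradient(x, y, w, b)  # uses pre-step w, b
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        J_history.append(compute_cost(x, y, w, b))   # uses post-step w, b
        grad_history.append((dj_dw, dj_db))
    return w, b, J_history, grad_history

# Toy data, assumed for illustration only.
x_train = [1.0, 2.0]
y_train = [300.0, 500.0]

w_final, b_final, J_history, grad_history = gradient_descent(
    x_train, y_train, w=0.0, b=0.0, alpha=0.01, num_iters=1000
)

# Print only the cost every 100 iterations.
for i in range(0, 1000, 100):
    print(f"iteration {i:4d}: cost {J_history[i]:.4e}")
```

In this sketch, entry i of grad_history is the gradient that produced the step, and entry i of J_history is the cost that step led to. They describe two different points, but printing them on one line still shows the trend: the cost keeps dropping as the parameters move.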

