Hello, @qihua-william,
That is because the w0 reported for an iteration is not the weight before that iteration's update; it is the weight after the update. We need to verify them this way:
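Here is a minimal sketch of that check. It does not use the course's run_gradient_descent(); `toy_gradient_descent`, the history keys, and the data are hypothetical stand-ins I made up, and they only assume that each iteration logs the gradient computed at the before-update weight together with the after-update weight.

```python
# Hypothetical stand-in for the lab's gradient descent; names and data are illustrative only.

def gradient(x, y, w, b):
    """Gradient of the cost J = (1/2m) * sum((w*x + b - y)^2) for one feature."""
    m = len(x)
    dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
    return dj_dw, dj_db

def toy_gradient_descent(x, y, w, b, alpha, iterations):
    history = []
    for i in range(iterations):
        dj_dw, dj_db = gradient(x, y, w, b)  # gradient at the BEFORE-update weights
        w = w - alpha * dj_dw                # update
        b = b - alpha * dj_db
        # Assumption: the log stores the AFTER-update w next to the gradient above.
        history.append({"iter": i, "w": w, "b": b, "dj_dw": dj_dw, "dj_db": dj_db})
    return w, b, history

def check_updates(history, w_init, alpha):
    """Right check: logged w == previous logged w - alpha * logged gradient."""
    prev_w = w_init
    results = []
    for row in history:
        expected = prev_w - alpha * row["dj_dw"]
        results.append(abs(expected - row["w"]) < 1e-12)
        prev_w = row["w"]
    return results

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
alpha = 0.1
w_init, b_init = 0.0, 0.0
_, _, hist = toy_gradient_descent(x, y, w_init, b_init, alpha, 3)

print(check_updates(hist, w_init, alpha))  # every entry is True

# Wrong check: treating the logged w as the before-update weight of the same row.
wrong_next_w = hist[0]["w"] - alpha * hist[0]["dj_dw"]
print(wrong_next_w, hist[1]["w"])  # these two numbers do not match
```

So the w0 printed on a row should be compared against the w0 of the previous row (or the initial value for the first row), not used as the starting point for that same row's gradient.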
Cheers,
Raymond
PS: You can confirm this by inspecting the source code of run_gradient_descent(), which is available on Coursera. Let us know if you need the steps to open the source code.