Hi @rmwkwok, I think that is just a hypothesis; I haven't verified it, and it may not be correct. Increasing the learning rate may cause overshooting so that gradient descent never converges, but from what I can see in the formula, a larger learning rate makes w decrease or increase more at each step. So what I am trying to understand is why the extra term added to the cost function helps keep w small. There may be a mathematical explanation, and I would very much appreciate it if you could help me understand it in any way.
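One way to see it, as a minimal sketch that assumes the usual L2-regularized linear regression cost J(w, b) = (1/2m) Σ (f(x) − y)² + (λ/2m) Σ w_j²: the extra term adds (λ/m)·w to the gradient, so the update w − α(∂J/∂w + (λ/m)w) can be rewritten as w·(1 − αλ/m) − α·∂J/∂w. Every step first shrinks w by a constant factor, independent of the data term, which is why the penalty keeps w small rather than just changing the step size like the learning rate does. The function name `gradient_descent`, the toy data, and the values of `alpha`, `lambda_` and `iters` below are hypothetical, chosen only for illustration.

```python
import numpy as np

def gradient_descent(X, y, alpha, lambda_, iters):
    """Batch gradient descent for linear regression with an (assumed)
    L2 penalty (lambda_ / (2m)) * sum(w**2) added to the squared-error cost."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iters):
        err = X @ w + b - y        # prediction errors, shape (m,)
        dj_dw = X.T @ err / m      # gradient of the unregularized cost w.r.t. w
        dj_db = err.mean()         # gradient w.r.t. b
        # Regularized update: w - alpha * (dj_dw + lambda_ / m * w),
        # rewritten as a shrink of w by (1 - alpha*lambda_/m) followed by
        # the ordinary gradient step ("weight decay").
        w = w * (1 - alpha * lambda_ / m) - alpha * dj_dw
        b = b - alpha * dj_db      # b is conventionally not regularized
    return w, b

# Hypothetical toy data: 20 examples, 5 features, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_w = np.array([3.0, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.1, size=20)

w_plain, _ = gradient_descent(X, y, alpha=0.1, lambda_=0.0, iters=2000)
w_reg, _ = gradient_descent(X, y, alpha=0.1, lambda_=10.0, iters=2000)
print("||w|| without regularization:", np.linalg.norm(w_plain))
print("||w|| with lambda = 10:     ", np.linalg.norm(w_reg))
```

With the same learning rate in both runs, the regularized run should end with a smaller ‖w‖. Changing α alone only rescales each step, while λ changes where gradient descent settles.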