Can anyone explain to me why dW has small values and db has large values?

Please give us a bit more context for your question here. Which lecture are you asking about, and what is the time offset into that lecture?

One thing to note is that we apply Gradient Descent individually to each dW and db value: there is no case in which the gradients of those two different quantities interact. If you are looking at the shapes of the ellipses in the graphs Prof Ng shows, note that the axes of the ellipse are different elements of W or b. Also note that these graphs are very unrealistic since they are in 3 dimensions: the actual solution spaces we are dealing with here typically have hundreds or thousands of dimensions.
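To make the "no interaction" point concrete, here is a minimal sketch of a plain gradient descent step. The function name `gradient_descent_step`, the learning rate, and all the numbers are made up for illustration; they are not taken from the lecture or the course assignments.

```python
import numpy as np

def gradient_descent_step(W, b, dW, db, learning_rate=0.1):
    """One plain gradient descent update (hypothetical helper).

    W is updated using only dW, and b using only db:
    neither update reads the other gradient.
    """
    W_new = W - learning_rate * dW
    b_new = b - learning_rate * db
    return W_new, b_new

# Illustrative values only: a small dW and a large db.
W = np.array([[1.0, 2.0]])
b = np.array([[0.5]])
dW = np.array([[0.01, 0.02]])
db = np.array([[2.0]])

W_new, b_new = gradient_descent_step(W, b, dW, db)
print("W moved by:", W_new - W)
print("b moved by:", b_new - b)
```

With the same learning rate, the large db moves b much further per step than the small dW moves W, which is all that "db is big, dW is small" means at the level of the update rule.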

Sorry, it’s been several years since I watched those lectures. I will need to watch them again in order to contribute to the discussion. I see from the diagram that this case is different than the ones I remember where the axes were different elements of w.

The rest of my day today is pretty busy, so it will likely be more than 12 hours before I can get to this. In the meantime, you might profit from just watching the lecture again from the beginning. I’ve got to believe that Prof Ng would have explained the point you are asking about.

Actually I think you can see what he means from the diagram: note that the ellipses are elongated on the W axis and squashed on the b axis. Those are “contour lines” of equal cost on the cost surface. If you think about the geometric meaning of the shapes of those ellipses, it means that the surface is much steeper in the b direction than it is in the W direction. Think of taking a vertical slice parallel to the b axis and what it means that the contours are closer together in that direction. Think of a topographical map as a good real world analog: when the contour lines are close together, that means the gradient is steep in that area in the direction perpendicular to the contour lines.
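If it helps, the contour argument can be checked numerically. This is only a sketch under an assumed toy cost, J(w, b) = 0.5 * (w^2 + 100 * b^2), which I picked because its contours are ellipses elongated along w and squashed along b. It is not the cost function from the lecture, and the factor 100 is arbitrary.

```python
import numpy as np

def cost(w, b, steepness_b=100.0):
    """Assumed toy cost with elliptical contours (not from the lecture)."""
    return 0.5 * (w**2 + steepness_b * b**2)

def gradient(w, b, steepness_b=100.0):
    """Analytic partial derivatives (dJ/dw, dJ/db) of the toy cost."""
    return w, steepness_b * b

def contour_half_widths(level, steepness_b=100.0):
    """Where the contour J = level crosses the w axis and the b axis."""
    half_w = np.sqrt(2.0 * level)                # solve J(w, 0) = level
    half_b = np.sqrt(2.0 * level / steepness_b)  # solve J(0, b) = level
    return half_w, half_b

dw, db = gradient(1.0, 1.0)
half_w, half_b = contour_half_widths(level=0.5)

print("dJ/dw at (1, 1):", dw)
print("dJ/db at (1, 1):", db)
print("contour J=0.5 reaches w = +/-", half_w, "and b = +/-", half_b)
```

The contour reaches ten times further along w than along b, and at the same point the partial derivative with respect to b is a hundred times larger than the one with respect to w. That is the same relationship as on a topographical map: tightly packed contours in the b direction go with a steep slope, and hence a large db.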