Prof told us that when the weights are very small, the gradient can be even smaller, so it takes a lot of tiny little steps to descend the loss surface.
But Prof didn't cover the other case: what happens if the gradient explodes (when the weights are large)? How does that make training difficult?
In the case of exploding gradients, the accumulation of large derivatives makes the model very unstable and incapable of effective learning. The large changes in the model's weights create a very unstable network; at extreme values the weights become so large that they cause numerical overflow, resulting in NaN weight values that can no longer be updated.
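A rough sketch of why both regimes hurt (my own toy example, not from the lecture): treat the network as a deep chain of scalar weights, so the gradient for the first layer is the product of all the later weights. With per-layer weights below 1 that product vanishes; with weights above 1 it grows exponentially until it overflows to inf, after which later updates turn into NaN. All names and values here are made up for illustration.

```python
import numpy as np

# Toy chain of L scalar "layers": y = w[L-1] * ... * w[1] * w[0] * x.
# The gradient of y with respect to the first weight is
#   dy/dw[0] = x * w[1] * w[2] * ... * w[L-1],
# so its magnitude grows or shrinks exponentially with depth.
def first_layer_gradient(w, x=1.0):
    return x * np.prod(w[1:])

depth = 200
for scale in (0.9, 1.1, 2.0):  # per-layer weight magnitude (illustrative values)
    w = np.full(depth, scale, dtype=np.float32)  # float32, like typical training
    g = first_layer_gradient(w)
    print(f"weight scale {scale}: gradient for the first layer ~ {g:.3e}")

# Approximate output:
#   0.9 -> ~8e-10  (vanishing: tiny steps, painfully slow descent)
#   1.1 -> ~2e+08  (exploding: huge, unstable weight updates)
#   2.0 -> inf     (overflow; inf - inf or 0 * inf in later steps gives NaN)
```

Once a single weight becomes inf, the next forward/backward pass propagates inf and NaN everywhere, which matches the "weights can no longer be updated" situation described above.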