Week 1: increasing the number of iterations with large randomly initialized W does not give better results

In the second case of the lab, where we randomly initialize W with large values, I understand that the cost is high because the activations in each layer saturate near 0 or 1.

But when I run the model for more iterations (up to 100,000), why does the cost not continue to decrease until we get good values of W, as in the third case where we use He initialization?
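To see why the cost barely moves, here is a small sketch (not the lab's code; the layer sizes and the ×10 scale are my own assumptions) comparing how many sigmoid activations saturate under a large random initialization versus He initialization:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical layer: 100 inputs, 100 units, one example.
x = rng.standard_normal((100, 1))

# Large random initialization (like the lab's second case): W scaled by 10.
W_big = rng.standard_normal((100, 100)) * 10.0
a_big = sigmoid(W_big @ x)

# He initialization: W scaled by sqrt(2 / fan_in).
W_he = rng.standard_normal((100, 100)) * np.sqrt(2.0 / 100)
a_he = sigmoid(W_he @ x)

# Fraction of activations saturated (within 0.01 of 0 or 1).
# Where a is near 0 or 1, the sigmoid gradient a * (1 - a) is near 0,
# so gradient descent updates those weights extremely slowly.
sat_big = np.mean((a_big < 0.01) | (a_big > 0.99))
sat_he = np.mean((a_he < 0.01) | (a_he > 0.99))
print(f"saturated (big init): {sat_big:.2f}")
print(f"saturated (He init):  {sat_he:.2f}")
```

With the big initialization nearly every unit is saturated, so the gradients flowing back through those units are close to zero and even 100,000 iterations make little progress.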


Maybe the model gets stuck at a local optimum!

Does that mean the result I got is specific to this case only, and in other cases with a large randomly initialized W the cost can end up as low as with He or Xavier initialization?

I'm not sure, but initializing with those methods is probably better.