Initial values for the gradient descent algorithm

Week 1 and Week 2 Gradient descent algorithm

I have two questions:

  1. How should we choose the initial values for the parameters w and b? Should we always start with w = 0 and b = 0?

  2. How do we decide the number of iterations that gradient descent should run?

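  1. For simple regression models such as the linear regression covered in these weeks, starting with w = 0 and b = 0 is a good choice. The squared-error cost for linear regression is convex (bowl-shaped), so there are no bad local minima to get stuck in: gradient descent heads toward the global minimum from any starting point. We also usually have no way to pick better initial values.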

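  2. By experimentation: find the number of iterations that gives good-enough convergence. A practical check is to plot the cost J(w,b) against the iteration number (the learning curve) and stop adding iterations once the curve has flattened out.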
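To illustrate the answer about initial values, here is a minimal sketch in plain Python, assuming a single feature x and the squared-error cost J(w,b) = (1/(2m)) * sum((w*x_i + b - y_i)^2). The function name gradient_descent, the learning rate, and the toy data are illustrative choices, not taken from the course. It runs gradient descent once from w = 0, b = 0 and once from a deliberately poor start (w = 10, b = -10) to show that, because the cost is convex, both runs reach the same minimum.

```python
def gradient_descent(x, y, w_init=0.0, b_init=0.0, alpha=0.01, num_iters=1000):
    """Batch gradient descent for the model f(x) = w*x + b (illustrative sketch)."""
    m = len(x)
    w, b = w_init, b_init
    for _ in range(num_iters):
        # Prediction errors f(x_i) - y_i for every training example
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        # Partial derivatives of J(w, b) = (1 / (2m)) * sum(errors ** 2)
        dj_dw = sum(e * xi for e, xi in zip(errors, x)) / m
        dj_db = sum(errors) / m
        # Update w and b simultaneously, using gradients computed from the old values
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
    return w, b


# Hypothetical toy data generated from y = 2x + 1
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]

# Compare the usual zero start with a deliberately poor start
for w0, b0 in [(0.0, 0.0), (10.0, -10.0)]:
    w, b = gradient_descent(x_train, y_train, w0, b0, alpha=0.05, num_iters=10000)
    print(f"start (w, b) = ({w0}, {b0}) -> w = {w:.4f}, b = {b:.4f}")
```

With alpha = 0.05 and 10000 iterations, both runs should end up at approximately w = 2, b = 1, the line that generated the toy data. Starting from zeros is simply the most convenient neutral choice; for this convex cost, it does not change where gradient descent ends up.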
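For the question about the number of iterations, one way to automate the experiment is to record the cost after every iteration and stop once an iteration lowers it by less than a small tolerance. This is a minimal sketch under the same single-feature, squared-error assumptions. The names run_until_converged and compute_cost, and the values of tol and max_iters, are hypothetical choices for illustration; a suitable tolerance depends on the scale of your cost.

```python
def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = (1 / (2m)) * sum((w*x_i + b - y_i) ** 2)."""
    m = len(x)
    return sum((w * xi + b - yi) ** 2 for xi, yi in zip(x, y)) / (2 * m)


def run_until_converged(x, y, alpha=0.05, tol=1e-10, max_iters=100_000):
    """Gradient descent from w = b = 0 that stops when the cost stops improving.

    Returns (w, b, iterations_run, cost_history).
    """
    m = len(x)
    w, b = 0.0, 0.0
    cost_history = [compute_cost(x, y, w, b)]
    for i in range(1, max_iters + 1):
        errors = [w * xi + b - yi for xi, yi in zip(x, y)]
        dj_dw = sum(e * xi for e, xi in zip(errors, x)) / m
        dj_db = sum(errors) / m
        # Simultaneous update of both parameters
        w, b = w - alpha * dj_dw, b - alpha * dj_db
        cost_history.append(compute_cost(x, y, w, b))
        # Converged: this iteration lowered the cost by less than tol
        if cost_history[-2] - cost_history[-1] < tol:
            return w, b, i, cost_history
    # Hit the iteration cap without meeting the tolerance
    return w, b, max_iters, cost_history


x_train = [1.0, 2.0, 3.0, 4.0]  # hypothetical toy data from y = 2x + 1
y_train = [3.0, 5.0, 7.0, 9.0]
w, b, iters, cost_history = run_until_converged(x_train, y_train)
print(f"stopped after {iters} iterations: w = {w:.4f}, b = {b:.4f}, J = {cost_history[-1]:.2e}")
```

Plotting cost_history against the iteration index gives the learning curve. If the run stops at max_iters with the curve still falling steeply, increase the number of iterations or revisit the learning rate.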