Hello, I’ve got a couple of questions from the week 1 lectures and would appreciate it if you could help me with them.

- I know from previous statistics courses that linear regression can also be solved with other methods, such as maximum likelihood estimation. Here we learned the gradient descent algorithm to find the best fit. How should I approach such problems in general? How do I know which method is most suitable for my data?
- I don’t quite understand what is meant by “convergence” in the context of the gradient descent algorithm. How is “convergence” formulated mathematically? In the final lab we used 10,000 iterations to make sure the model parameters converge. I was thinking of using a while loop that repeats the update steps until “w” and “b” converge, but I am not sure what the stopping condition should be.
- In one of the lectures it was mentioned that one issue with the gradient descent algorithm is that, depending on our initialization, we might end up at a local minimum instead of the global minimum (for cost functions other than the squared error). But I did not understand what the solution to this issue is.
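To make my second question more concrete, here is a rough sketch of the kind of loop I had in mind for linear regression with the squared-error cost. The stopping condition (checking whether the parameters barely change between iterations) and the tolerance value `tol` are just my guesses, which is exactly what I am unsure about:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, tol=1e-6, max_iters=100_000):
    """Repeat gradient descent updates until w and b stop changing
    (up to a tolerance), instead of using a fixed iteration count."""
    w, b = 0.0, 0.0
    for _ in range(max_iters):  # safety cap in case it never converges
        # gradients of the mean squared error cost
        err = w * x + b - y
        dw = np.mean(err * x)
        db = np.mean(err)
        w_new = w - alpha * dw
        b_new = b - alpha * db
        # my guess at a "convergence" condition: the updates are tiny
        if max(abs(w_new - w), abs(b_new - b)) < tol:
            return w_new, b_new
        w, b = w_new, b_new
    return w, b

# toy data lying on the line y = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1
w, b = gradient_descent(x, y)
```

Is a condition like this (updates smaller than a tolerance) the right way to formalize convergence, or should I be checking the gradient or the cost instead?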

Thank you very much for your time!