Just wondering: for gradient descent with multiple-feature logistic regression, as in the lecture by Prof Ng, to work out w1 (i.e., j=1) we start with some value, e.g. w1 = 1, and then use gradient descent to converge. However, while we are converging on w1, what values do we use for the other weights in order to calculate f_w,b(x^(i))? Do we start with w = 1 for all the weights and then update w_j for j = 1…n simultaneously?
We can start with any values for the weights; even giving them all the same value is fine. Initialization is not much of an issue for logistic regression.
When we get into neural networks, symmetry breaking becomes necessary, so we initialize the weights of different neurons with different random values. You will see this in Course 2.
But for now, you can go ahead and initialize in any way you choose.
Yes, Prof. Ng recommends a simultaneous update of all the weights (and b) on each iteration: compute every partial derivative using the current parameter values, then update all parameters at once.
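Here is a minimal NumPy sketch of what that looks like. This is not the course's exact code; the function names (`sigmoid`, `gradient_descent`) and the hyperparameters `alpha` and `num_iters` are just illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, num_iters=1000):
    """Logistic regression by batch gradient descent.
    X: (m, n) feature matrix, y: (m,) labels in {0, 1}."""
    m, n = X.shape
    w = np.zeros(n)  # any starting values work here; zeros, ones, or random are all fine
    b = 0.0
    for _ in range(num_iters):
        # f_wb is computed from the CURRENT values of ALL the weights at once
        f_wb = sigmoid(X @ w + b)      # shape (m,)
        err = f_wb - y                 # shape (m,)
        dj_dw = (X.T @ err) / m        # gradient w.r.t. every w_j, j = 1..n
        dj_db = err.mean()
        # simultaneous update: every w_j and b are updated in the same step,
        # all using gradients computed from the same (old) parameter values
        w -= alpha * dj_dw
        b -= alpha * dj_db
    return w, b
```

Note that all the gradients are computed first, and only then are `w` and `b` overwritten, so no weight's update ever "sees" another weight's new value within the same iteration. That is exactly the simultaneous update from the lecture.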