In week 1, Andrew explains the GD algorithm and we implement it in the lab. I’m having trouble reconciling Andrew’s theoretical explanation with the implementation of the algorithm in practice:

Andrew’s description of GD is that the algorithm starts in an arbitrary location, and then proceeds towards the location that minimises the cost.

The way the algorithm works, however, is that it runs over all `m` training examples and updates the parameters `w` and `b`.
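To make the two descriptions concrete, here is a minimal sketch of batch gradient descent for simple linear regression with a squared-error cost (my own toy example, not the course lab; function and variable names are assumptions). Note that each step sums the error over all `m` examples before touching `w` and `b`:

```python
import numpy as np

def gradient_step(x, y, w, b, alpha):
    """One batch gradient-descent step for f(x) = w*x + b, squared-error cost."""
    m = x.shape[0]
    err = (w * x + b) - y           # prediction error on all m examples
    dw = (err * x).sum() / m        # dJ/dw, averaged over the whole training set
    db = err.sum() / m              # dJ/db
    return w - alpha * dw, b - alpha * db

# toy data generated by the line y = 2x (so the minimum is at w=2, b=0)
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])

w, b = 0.0, 0.0                     # arbitrary starting location
for _ in range(1000):
    w, b = gradient_step(x, y, w, b, alpha=0.1)
# w, b end up close to 2 and 0
```

Each call to `gradient_step` is one move of the point `(w, b)`; the loop over iterations traces the path Andrew describes.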

My question is:
In Andrew’s description, the advancement towards the minimum is along a path. It doesn’t have to be a straight path, but it is a path. The algorithm, however, runs all over the space, from one training example to another. The training examples are not sorted, so the algorithm computes and updates parameters at one location, then jumps to another training example that is not necessarily alongside the previous one. How can the theoretical explanation and the actual implementation be reconciled?


Hi,
Understanding the theory requires the math, which Professor Ng tends to avoid in order to keep things simple; as he often says, “don’t worry about the math”. Here is a link you can look at to understand the algorithm: the *Gradient descent* article on Khan Academy.

In addition, you get a chance to implement that algorithm from scratch in the programming exercises.

The link you sent gives the same explanation Andrew does. It does not explain why going over all unsorted training examples will produce an advancement in one direction.
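One way to see the reconciliation (a sketch of my own, assuming the batch version of GD taught in the course): the algorithm does not move from training example to training example. The loop over examples only accumulates one gradient, summed over all `m` examples, and then `w` and `b` take a single step. Because a sum is order-independent, shuffling the training set changes nothing; the path exists in `(w, b)` parameter space, not in the space of the examples:

```python
import numpy as np

def gradients(x, y, w, b):
    """Batch gradients of the squared-error cost for f(x) = w*x + b."""
    m = x.shape[0]
    err = (w * x + b) - y
    return (err * x).sum() / m, err.sum() / m

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 3.0 * x + 1.0                 # toy data on the line y = 3x + 1
perm = rng.permutation(50)        # an arbitrary reordering of the examples

dw1, db1 = gradients(x, y, w=0.0, b=0.0)
dw2, db2 = gradients(x[perm], y[perm], w=0.0, b=0.0)
# dw1 == dw2 and db1 == db2 (up to floating-point rounding),
# so the update to (w, b) is identical regardless of example order
```

So each iteration produces exactly one move of the point `(w, b)` downhill, and the sequence of those moves is the path in Andrew’s picture.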