Gradient Descent [Logistic Regression]

I want to understand what happens when we train on m training examples. Does forward propagation followed by backward propagation happen only once? Or does this cycle keep repeating until we get the desired w and b for one particular training example, and only then do we move on to the second training example to do the same?

Also, does one epoch mean training on all m training examples once?

Forward propagation followed by backward propagation happens once per iteration, for as many iterations as you run, and each pass processes all the training examples together (not one by one).


Can you explain in detail?

This was all explained in the lectures, but here’s my summary:

Here in Course 1, we do “full batch” gradient descent. That means we do a number of iterations of the following process:

  1. Compute forward propagation on all training samples with the current weights. This is done in a vectorized way for efficiency.
  2. Do backward propagation on all samples to compute the gradients, which are averaged over all the samples.
  3. Apply the computed gradients to update the weights.
  4. Go to 1) again and repeat for the full number of iterations.
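The loop above can be sketched in NumPy for logistic regression. This is a minimal illustration, not the course's assignment code; the function name `train_logistic` and the toy hyperparameters are my own choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.1, num_iterations=100):
    """Full-batch gradient descent.
    X has shape (n_features, m), y has shape (1, m)."""
    n, m = X.shape
    w = np.zeros((n, 1))
    b = 0.0
    for _ in range(num_iterations):
        # 1) forward propagation on ALL m samples at once (vectorized)
        a = sigmoid(w.T @ X + b)   # shape (1, m)
        # 2) backward propagation: gradients averaged over the m samples
        dz = a - y                 # shape (1, m)
        dw = (X @ dz.T) / m        # shape (n, 1)
        db = np.sum(dz) / m
        # 3) update the parameters with the averaged gradients
        w -= lr * dw
        b -= lr * db
    return w, b
```

Note that the gradients `dw` and `db` are divided by m, which is the "averaged over all the samples" part of step 2: one combined update per iteration, rather than one update per example.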

Steps 1) to 3) are called one “epoch” of training. Later in Course 2 we will learn a more sophisticated technique called “minibatch gradient descent” where we break up the full m training samples into “minibatches” and iterate through those in each “epoch”.
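To preview the minibatch idea in the simplest possible terms: one epoch shuffles the m samples and walks through them in chunks, running steps 1) to 3) on each chunk. A minimal sketch, where the helper name `minibatch_indices` and the batch size are my own illustrative choices:

```python
import numpy as np

def minibatch_indices(m, batch_size, seed=0):
    """Shuffle the m sample indices and split them into minibatches.
    One epoch of minibatch gradient descent runs forward prop,
    backward prop, and a parameter update on each chunk in turn."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(m)
    return [perm[i:i + batch_size] for i in range(0, m, batch_size)]
```

With m = 10 and a batch size of 3 this yields four minibatches (sizes 3, 3, 3, 1), so the weights get updated four times per epoch instead of once. Prof Ng covers the details in Course 2.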

If this still doesn’t make sense to you, my suggestion would be to watch the lectures again with what I said above in mind. Prof Ng said everything I said above in the lectures, other than the “minibatch” issue. He’ll discuss that in Course 2.
