My understanding is that to complete the logistic regression equation we are doing forward and backward propagation. In particular, we are looking to find the optimal values for *w* and *b* so that we have the best model possible, based on our data, to predict whether a given image is a cat or not.

Is my understanding correct? And how does the loss function fit in here?

Yes, in neural networks both forward and backward propagation are performed!

The point of back propagation is that you need to know how to push the parameters in the direction of a better solution. Forward propagation is just computing the predictions of your current model. Then you need a metric function that measures how good your current predictions are. That is exactly the purpose of the cost or “loss” function. We then use the derivatives (gradients) of that cost function w.r.t. the various parameters (the w and b values) to figure out how to push them in the direction of a better solution.
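To make that concrete, here is a minimal sketch (my own example, not code from the course assignments) of one forward/backward pass for logistic regression in NumPy. The names `w`, `b`, `X`, `Y` follow the usual course notation; the toy data is made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Forward pass: predictions and cost. Backward pass: gradients."""
    m = X.shape[1]                       # number of examples
    A = sigmoid(w.T @ X + b)             # forward propagation: predictions
    # cross-entropy cost: how far predictions A are from labels Y
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = (X @ (A - Y).T) / m             # gradient of cost w.r.t. w
    db = np.mean(A - Y)                  # gradient of cost w.r.t. b
    return cost, dw, db

# One gradient-descent step on toy data (4 features, 10 examples)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 10))
Y = (rng.random((1, 10)) > 0.5).astype(float)
w, b = np.zeros((4, 1)), 0.0
cost, dw, db = propagate(w, b, X, Y)
w, b = w - 0.1 * dw, b - 0.1 * db        # push parameters downhill
```

The update at the end is the "push in the direction of a better solution": each parameter moves opposite its gradient, scaled by a learning rate (0.1 here).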

It’s been a few years since I actually sat down and listened to all the lectures in DLS C1, but I’m pretty sure that Prof Ng must have said some version of what I just said above in the lectures. If you missed that, it might be worth watching the relevant lectures again with what I said above in mind.

Just to add to what Paul said:

Keep in mind our 'cost' function is basically a measure of the distance between our 'predictions of Y' (computed from the present weights in the forward propagation) and the actual values of Y, which in the cat case are our labels (cat/not cat).
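A tiny made-up example of that measure, using the cross-entropy loss for a single label y = 1: a confident correct prediction costs almost nothing, while a confident wrong one is heavily penalized.

```python
import numpy as np

def loss(a, y):
    """Cross-entropy loss for one prediction a in (0, 1) and label y."""
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

print(loss(0.9, 1))   # ~0.105 : prediction close to the label, low loss
print(loss(0.1, 1))   # ~2.303 : prediction far from the label, high loss
```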

A lower cost means that, overall, we are getting more accurate predictions, which is what we want in the end. Yet as Paul says, since our cost function with many features now exists in this sort of 'hyperdimensional' space, the question is how we find the set of weights that brings us to the lowest point in that space.

Thus the derivative (which mentally I like to think of as the 'slope' of the function) used in back propagation suggests how we might wander (I only say 'wander' because there is also stochastic gradient descent) around until we get there.

You could kind of think of back prop as the 'map' to where we want to go (lowest cost), and forward prop as taking the next step in that direction. At least IMHO.