Neural network doesn't need to do regression?

In Course 1, I learned linear and logistic regression, and in Course 2, about neural networks. I realized that the neural network only calculates the sigmoid score and does not do any regression. Is that right?

This is a duplicate of your previous question. I answered on this thread.


I re-posted it to the correct course.
Thanks for the explanation. I understand that there should be some kind of regression fitting at the output layer, but I just don’t see it in the code; what I saw is just the sigmoid function call, and then it ends.

The sigmoid value is the output when your function is doing binary classification. The sigmoid value is interpreted as the probability that the answer is “yes” to the binary classification.
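To make that concrete, here is a tiny NumPy sketch (the weights and input are made up for illustration, not from a course lab): the classification is just the sigmoid output thresholded at 0.5.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into (0, 1), read as P(y = 1 | x)
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([0.7, -1.2]), 0.4   # hypothetical trained parameters
x = np.array([2.0, 1.0])            # one input example

p = sigmoid(np.dot(w, x) + b)       # predicted probability that y = 1
prediction = 1 if p >= 0.5 else 0   # "yes" iff the probability is at least 0.5
```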

As I mentioned on the other thread, Logistic Regression is not a “regression” in the sense you mean. It is a binary classifier.

My intuition was that one has to do some kind of gradient descent to define the boundary line by which to make the classification. Here I interpret you as saying that the sigmoid value is enough to make the classification. What part of my intuition is wrong?

I think we are talking at cross purposes here. Gradient descent is required to train whatever model you have, regardless of whether it is a classifier or a regression. But gradient descent uses the derivatives of the cost function, which will be different in the two cases. The cost function measures the output (prediction) of the function versus the label values (the correct answers) that you are using to train the function.
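In symbols, each gradient descent step applies the same update rule in either case; only the cost function J differs between the classifier and the regression:

w := w - \alpha \frac{\partial J}{\partial w}, \qquad b := b - \alpha \frac{\partial J}{\partial b}

where \alpha is the learning rate.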

The sigmoid value is the answer (the output value) of the actual prediction function in the case that your function is a binary classifier.

Isn’t the sigmoid value the intermediate value used to calculate the least squares cost, which enables gradient descent to eventually reach the fit? My original question is: how can the neural network make the classification by the sigmoid value alone, without further steps? The sigmoid value is only the y prediction for x, without comparing to the true y value, right?

Yes, the way the training works is that we calculate the sigmoid values from the training data. Those are the \hat{y} prediction values, meaning the answers provided by the current version of the model. Then we compute the cost or loss value by applying the Cross Entropy loss function to the predicted \hat{y} values versus the correct y values (the labels). Then we can use the gradients (derivatives) of that Loss Function to compute better weight values for the model (whether it is Logistic Regression or a Neural Network).
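Here is a minimal NumPy sketch of one such training step for logistic regression (the variable names are mine, not the course’s; the gradient expressions are the standard ones for cross entropy with a sigmoid output):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(X, y, w, b, alpha):
    """One gradient descent step: predict, measure the loss, update the weights."""
    m = X.shape[0]
    y_hat = sigmoid(X @ w + b)  # predictions of the current model
    loss = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # cross entropy
    dw = X.T @ (y_hat - y) / m  # derivative of the loss w.r.t. w
    db = np.mean(y_hat - y)     # derivative of the loss w.r.t. b
    return w - alpha * dw, b - alpha * db, loss
```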

Note that we don’t use the least squares cost function if the goal of the model is classification. We use MSE (Mean Squared Error) for models that do regression (computing stock prices or temperatures or some other kind of output number). But for either binary or multi-class classifiers, we use the Cross Entropy Loss Function, which is based on logarithms.
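Written out for m training examples, the two cost functions are (the \frac{1}{2} in MSE is a common convention that simplifies the derivative):

J_{MSE} = \frac{1}{2m} \sum_{i=1}^{m} \left(\hat{y}^{(i)} - y^{(i)}\right)^2

J_{CE} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log\left(\hat{y}^{(i)}\right) + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]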

Please note that I am not a mentor for MLS, so I don’t know how these ideas are presented in MLS. But I’m guessing it is similar to how they are presented in DLS, which is a more advanced series you would take after MLS.

My guess is that Professor Ng explained all this in the lectures and it might be a good idea to either wait and watch those or watch them again, if you’ve already watched them and feel that these questions were not addressed.

Hello all,

Let me provide some inputs regarding the content of MLS.

Yes, we covered linear regression and logistic regression in Course 1.

Some terminology here before I continue.

Even though the word “regression” appears in both, we actually call “logistic regression” a classification task, to distinguish it from “linear regression”, which is a regression task.

Now, for Course 2: yes, we mostly, if not exclusively, use classification tasks to exemplify the idea of a neural network. We used a binary classification example (which uses sigmoid), and a multi-class classification example (which uses softmax).
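For reference, a small NumPy sketch of those two output activations (my own illustration, not course code):

```python
import numpy as np

def sigmoid(z):
    # Binary classification: a single probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Multi-class classification: a vector of probabilities summing to 1
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

print(sigmoid(0.8))                        # ~0.69
print(softmax(np.array([2.0, 1.0, 0.1])))  # ~[0.66, 0.24, 0.10]
```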

However, this does not mean that a neural network cannot perform a regression task. It can; I just cannot recall where this course demonstrated one. So please don’t let this course give you the impression that neural networks do not do regression tasks. They do (see the sketch below).
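For instance, here is a minimal Keras sketch (my own, not from a course lab) of a neural network doing regression; structurally, the only change from the classifiers above is a linear output unit paired with an MSE loss:

```python
import numpy as np
import tensorflow as tf

# Toy regression data: y is a noisy linear function of x (made up for illustration)
X = np.random.rand(200, 1).astype(np.float32)
y = 3.0 * X[:, 0] + 0.5 + 0.1 * np.random.randn(200).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="linear"),  # linear output unit -> regression
])
model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer="adam")
model.fit(X, y, epochs=20, verbose=0)  # same gradient descent machinery, MSE loss
```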

Then it is very likely that you were seeing a binary classification example. By the way, if you share which lab you were doing, we can confirm it for you.

I think Paul has covered the overall idea very well. I would just like to confirm some of the ideas in your posts and share some references to the course.

Yes, you are right to say that, for a binary classification task, the sigmoid value is the prediction for x. Now we need to ask ourselves: are we talking about the “prediction mode” or the “training mode”?

If it is the prediction mode, then that’s it: sigmoid value, period.

If it is the training mode, then, as you said, we compare the sigmoid value (the prediction) with the true y value via the loss (which is the binary cross-entropy loss, as Paul pointed out), and then, as you said again, we do gradient descent to update the weights in the neural network.
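In Keras terms (a sketch with placeholder layer sizes and toy data, roughly matching that workflow), the training mode is the compile/fit pair: compile names the loss, and fit runs gradient descent.

```python
import numpy as np
import tensorflow as tf

# Toy binary-classification data (made up for illustration)
X = np.random.rand(100, 2).astype(np.float32)
y = (X[:, 0] + X[:, 1] > 1.0).astype(np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation="sigmoid"),
    tf.keras.layers.Dense(15, activation="sigmoid"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # prediction mode ends here
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy())  # compare prediction vs true y
model.fit(X, y, epochs=100, verbose=0)                    # gradient descent updates weights
```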

I think the video “Training Details” in Course 2, Week 2 summarizes what I said above pretty well. Note that anything red on the slide was added by me for this discussion.

But again, as I pointed out at the beginning, we may only be using classification tasks as examples in these lectures; however, a neural network can also do regression tasks.

Lastly, @wge6729, having followed your posts, and if I have not misunderstood you, I think you have already got the idea pretty well. I would suggest you take Paul’s replies as a model of how we describe neural network training using the common terminology that the lectures and all of us here use. If we all stick to the same terminology, our conversation will be most effective. :wink:

Cheers,
Raymond
