Week 2 Logistic Regression Cost Function

Is gradient descent or maximum likelihood estimation used to find the optimal model parameters when minimising the cost function of logistic regression?

At the end of the Week 2 video, Prof. Ng talks about maximum likelihood, hence I am confused now.

Gradient Descent and Maximum Likelihood Estimation are completely different things. The important point is that you need to choose a “cost” or “loss” function to measure the quality of the answers that your Logistic Regression system (or later your Neural Network) gives you. The choice for Logistic Regression is to use the “cross entropy” loss function which is a mathematical expression of the concept of Maximum Likelihood Estimation from statistics.
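To make that concrete, here is a minimal sketch (not from the course materials) of the cross entropy loss for binary classification, which is exactly the negative average log likelihood of the labels under the model's predicted probabilities:

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Binary cross entropy: the negative mean log-likelihood of the labels."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

# Toy example: confident, mostly-correct predictions give a small loss
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.7])
print(cross_entropy_loss(y_true, y_pred))  # ≈ 0.198
```

Minimizing this loss over the parameters is the same thing as maximizing the likelihood of the training labels, which is why people describe the cross entropy loss as the MLE choice for classification.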

Now that you have chosen a good “loss” function, the question is how you go about minimizing it. The answer is that you use Back Propagation to compute the gradients, which drives Gradient Descent to modify the parameters toward a better solution. You run many iterations of Gradient Descent with a well-chosen “learning rate” to converge to a better and better solution.

So Gradient Descent is a general method for iteratively approximating a good solution to minimizing a particular loss function by using the Gradients (derivatives) of the cost function with respect to the various parameters of your model (the w and b values in the case of Logistic Regression).
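As a rough sketch of what that iteration looks like for logistic regression (my own toy illustration, with arbitrary learning rate and iteration count, not the course's assignment code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, iters=1000):
    """Fit w and b by gradient descent on the cross entropy loss."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        a = sigmoid(X @ w + b)      # predicted probabilities in (0, 1)
        dw = X.T @ (a - y) / m      # dJ/dw for the cross entropy loss
        db = np.mean(a - y)         # dJ/db
        w -= lr * dw                # step opposite the gradient
        b -= lr * db
    return w, b

# Toy 1-D data: label 1 when the feature is positive
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = fit_logistic(X, y)
preds = sigmoid(X @ w + b) > 0.5
```

Each step nudges w and b in the direction that most decreases the loss, which is all "iteratively approximating a good solution" means here.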


@naina_dwivedi hey!

Actually, minimizing the loss (cost) amounts to maximizing the likelihood of the samples, or, in more detail, finding the distribution that best fits them.

cs229-notes1 (3).pdf (227.7 KB)

Here is a good reference. After reading through it, you can understand quite a few of the insights behind ML, and even DL, from the ground up.

Hopefully it helps.

Thank you, Paul, for such a detailed explanation.

Could we say that MSE, RMSE, and cross-entropy are all mathematical representations of the maximum likelihood framework?

Thank you Chris, much appreciated!

No, each of those cost functions measures something different. RMSE and MSE are mathematical expressions of the notion of Euclidean Distance, which is a completely different thing than Maximum Likelihood. For that reason you would never use RMSE or MSE for a classification problem, but they would be good choices for a “regression” problem, meaning that your network is predicting some continuous real-valued number, like a house price or a stock price or the air temperature.
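A small sketch (toy numbers of my own, not from the course) of how RMSE is just a scaled Euclidean distance between the prediction vector and the target vector:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over the samples."""
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))

# Regression-style targets, e.g. house prices in $1000s (arbitrary toy values)
y_true = np.array([250.0, 300.0, 180.0])
y_pred = np.array([240.0, 310.0, 200.0])

# RMSE equals the Euclidean distance divided by sqrt(number of samples)
dist = np.linalg.norm(y_true - y_pred)
print(rmse(y_true, y_pred), dist / np.sqrt(len(y_true)))  # both ≈ 14.14
```

This is why MSE/RMSE fit regression targets naturally but say nothing about likelihoods of class labels.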

Maximum Likelihood Estimation is a very specific method in Statistics that has to do with estimating parameters of probability distributions. In a classification problem, the output of the network looks like a probability distribution, but it does not look that way in a “regression” problem of the type I described above.

Of course you could also observe that Logistic Regression is a misleading name: it’s not a regression problem in the technical sense that I am using “regression” above. I’m not sure what the background of this confusing terminology is.