Is Gradient Descent or Maximum Likelihood Estimation used to find the optimal model parameters for minimizing the cost function of Logistic Regression?

At the end of the Week 2 video, Prof Ng talks about maximum likelihood, hence I am confused now.

Gradient Descent and Maximum Likelihood Estimation are completely different things. The important point is that you need to choose a "cost" or "loss" function to measure the quality of the answers that your Logistic Regression system (or later your Neural Network) gives you. The choice for Logistic Regression is the "cross entropy" loss function, which is a mathematical expression of the concept of Maximum Likelihood Estimation from statistics.
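To make that connection concrete, here is a small sketch (with made-up labels and predictions) showing that summing the per-example cross entropy loss gives exactly the negative log of the Bernoulli likelihood of the dataset, so minimizing one is minimizing the other:

```python
import math

def cross_entropy(y, y_hat):
    # Per-example cross-entropy loss for a binary label y in {0, 1}
    # and a predicted probability y_hat in (0, 1).
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def neg_log_likelihood(ys, y_hats):
    # Negative log of the Bernoulli likelihood of the whole dataset:
    # L = prod_i p_i^y_i * (1 - p_i)^(1 - y_i).
    likelihood = 1.0
    for y, p in zip(ys, y_hats):
        likelihood *= p ** y * (1 - p) ** (1 - y)
    return -math.log(likelihood)

# Hypothetical labels and model outputs, just for illustration.
ys = [1, 0, 1]
y_hats = [0.9, 0.2, 0.7]

total_ce = sum(cross_entropy(y, p) for y, p in zip(ys, y_hats))
# total_ce equals neg_log_likelihood(ys, y_hats) up to floating-point error,
# which is why minimizing cross entropy is Maximum Likelihood Estimation.
```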

Now that you have chosen a good "loss" function, the question is how you go about minimizing that function. The answer is that you use Back Propagation, which drives Gradient Descent to modify the parameters to give a better solution. You run many iterations of Gradient Descent with a well chosen "learning rate" to converge to a better and better solution.

So Gradient Descent is a general method for iteratively approximating a good solution to minimizing a particular loss function by using the Gradients (derivatives) of the cost function with respect to the various parameters of your model (the w and b values in the case of Logistic Regression).
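Here is a minimal sketch of that loop for 1-D Logistic Regression, using a toy dataset and learning rate chosen purely for illustration. The gradients of the average cross-entropy cost are dJ/dw = (1/m) Σ (ŷ - y)·x and dJ/db = (1/m) Σ (ŷ - y):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy dataset (hypothetical numbers): small x values labeled 0, larger labeled 1.
xs = [0.5, 1.5, 2.5, 3.5]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0   # initial parameters
lr = 0.5          # learning rate
m = len(xs)

for _ in range(2000):
    # Gradients of the average cross-entropy cost with respect to w and b.
    dw = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / m
    db = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / m
    # One Gradient Descent step: move parameters against the gradient.
    w -= lr * dw
    b -= lr * db

# After many iterations the model separates the classes:
# sigmoid(w * 0.5 + b) is below 0.5, sigmoid(w * 3.5 + b) is above 0.5.
```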


@naina_dwivedi hey

Actually, minimizing the loss (cost) amounts to maximizing the likelihood of the samples, or, in more detail, to finding the distribution that best fits them.

cs229-notes1 (3).pdf (227.7 KB)

Here is a good reference. After reading through it, you can understand several of the insights behind ML, and even DL, from the ground up.

Hopefully, it helps.

Thank you, Paul, for such a detailed explanation.

Could we say that MSE, RMSE, and cross-entropy are mathematical representations of the maximum likelihood framework?

Thank you Chris, much appreciated.

No, each of those cost functions measures something different. RMSE and MSE are mathematical expressions of the notion of Euclidean Distance, which is a completely different thing than Maximum Likelihood. For that reason you would never use RMSE or MSE for a classification problem, but they would be good choices for a "regression" problem, meaning that your network is predicting some continuous real-valued number, like a house price or a stock price or the air temperature.
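A quick sketch of that distinction, using made-up house-price numbers: MSE is the average squared Euclidean distance between predictions and targets, and RMSE is its square root, so both are natural for continuous targets rather than class probabilities:

```python
import math

# Hypothetical regression targets and predictions (e.g., house prices).
y_true = [200.0, 310.0, 150.0]
y_pred = [210.0, 300.0, 155.0]

# MSE: mean squared (Euclidean) distance between prediction and target.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# RMSE: square root of MSE, in the same units as the target.
rmse = math.sqrt(mse)
```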

Maximum Likelihood Estimation is a very specific method in Statistics that has to do with estimating parameters of probability distributions. In a classification problem, the output of the network looks like a probability distribution, but it does not look that way in a "regression" problem of the type I described above.

Of course you could also observe that Logistic Regression is a misleading name: it's not a regression problem in the technical sense that I am using "regression" above. I'm not sure what the background of this confusing terminology is.