Week 2 Logistic Regression Cost Function

Is gradient descent or maximum likelihood estimation used to find the optimal model parameters when minimising the cost function of logistic regression?

At the end of the Week 2 video, Prof. Ng talks about maximum likelihood, hence I am confused now.

Gradient Descent and Maximum Likelihood Estimation are completely different things. The important point is that you need to choose a “cost” or “loss” function to measure the quality of the answers that your Logistic Regression system (or later your Neural Network) gives you. The choice for Logistic Regression is to use the “cross entropy” loss function which is a mathematical expression of the concept of Maximum Likelihood Estimation from statistics.
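To make that concrete, here is a minimal sketch (not from the course materials) of the cross entropy loss for binary classification, which is exactly the negative average log likelihood of the labels under the model's predicted probabilities:

```python
import numpy as np

def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    """Binary cross entropy: the negative mean log-likelihood of the labels."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))

# Toy example: confident, mostly-correct predictions give a small loss
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.1, 0.8, 0.7])
print(cross_entropy_loss(y_true, y_pred))  # ≈ 0.198
```

Minimizing this loss over the parameters is the same thing as maximizing the likelihood of the training labels, which is why people describe the cross entropy loss as the MLE choice for classification.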

Now that you have chosen a good “loss” function, the question is how you go about minimizing it. The answer is that you use Back Propagation to compute the gradients, which drives Gradient Descent to modify the parameters toward a better solution. You run many iterations of Gradient Descent with a well-chosen “learning rate” to converge to a better and better solution.

So Gradient Descent is a general method for iteratively approximating a good solution to minimizing a particular loss function by using the Gradients (derivatives) of the cost function with respect to the various parameters of your model (the w and b values in the case of Logistic Regression).
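As a rough sketch of what that iteration looks like for logistic regression (my own toy illustration, with arbitrary learning rate and iteration count, not the course's assignment code):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, iters=1000):
    """Fit w and b by gradient descent on the cross entropy loss."""
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    for _ in range(iters):
        a = sigmoid(X @ w + b)      # predicted probabilities in (0, 1)
        dw = X.T @ (a - y) / m      # dJ/dw for the cross entropy loss
        db = np.mean(a - y)         # dJ/db
        w -= lr * dw                # step opposite the gradient
        b -= lr * db
    return w, b

# Toy 1-D data: label 1 when the feature is positive
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b = fit_logistic(X, y)
preds = sigmoid(X @ w + b) > 0.5
```

Each step nudges w and b in the direction that most decreases the loss, which is all "iteratively approximating a good solution" means here.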


@naina_dwivedi hey!

Actually, minimizing the loss (cost) amounts to maximizing the likelihood of the samples, or, in more detail, finding the distribution that best fits them.

cs229-notes1 (3).pdf (227.7 KB)

Here is a good reference. After reading through it, you can understand quite a few of the insights behind ML, and even DL, from the ground up.

Hopefully it helps.

Thank you, Paul, for such a detailed explanation.

Could we say that MSE, RMSE, and cross-entropy are all mathematical representations of the maximum likelihood framework?

Thank you Chris, much appreciated!

No, each of those cost functions measures something different. RMSE and MSE are mathematical expressions of the notion of Euclidean Distance, which is a completely different thing than Maximum Likelihood. For that reason you would never use RMSE or MSE for a classification problem, but they would be good choices for a “regression” problem, meaning that your network is predicting some continuous real-valued number, like a house price or a stock price or the air temperature.
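A small sketch (toy numbers of my own, not from the course) of how RMSE is just a scaled Euclidean distance between the prediction vector and the target vector:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over the samples."""
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    return np.sqrt(mse(y_true, y_pred))

# Regression-style targets, e.g. house prices in $1000s (arbitrary toy values)
y_true = np.array([250.0, 300.0, 180.0])
y_pred = np.array([240.0, 310.0, 200.0])

# RMSE equals the Euclidean distance divided by sqrt(number of samples)
dist = np.linalg.norm(y_true - y_pred)
print(rmse(y_true, y_pred), dist / np.sqrt(len(y_true)))  # both ≈ 14.14
```

This is why MSE/RMSE fit regression targets naturally but say nothing about likelihoods of class labels.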

Maximum Likelihood Estimation is a very specific method in Statistics that has to do with estimating parameters of probability distributions. In a classification problem, the output of the network looks like a probability distribution, but it does not look that way in a “regression” problem of the type I described above.

Of course you could also observe that Logistic Regression is a misleading name: it’s not a regression problem in the technical sense that I am using “regression” above. I’m not sure what the background of this confusing terminology is.