I was going through “Optional Logistic Regression: Cost Function”. When it comes to modeling the cost function for the entire dataset, it introduces the formula:
My question is: why do we need to multiply (the product symbol)? Why can't we just sum up the result for each item and divide by the number of items?
That expression is not the final one. Note that the logarithm hasn't been taken yet, and the log of a product is the sum of the logs.
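Concretely, here is a sketch of that step in the usual notation, where $\hat{y}^{(i)}$ is the model's predicted probability and $y^{(i)} \in \{0, 1\}$ is the label for sample $i$. The likelihood of the whole dataset is the product

$$L = \prod_{i=1}^{m} \left(\hat{y}^{(i)}\right)^{y^{(i)}} \left(1 - \hat{y}^{(i)}\right)^{1 - y^{(i)}}$$

and taking the log turns it into a sum:

$$\log L = \sum_{i=1}^{m} \left[\, y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$$

Negating and dividing by $m$ gives the familiar average cross-entropy cost:

$$J = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$$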
Thank you, sir. But why don't we write the expression directly as a sum, instead of starting from a product and using logs to convert it into a sum?
They explain that in the writeup that you mention. Here's the link. This is an implementation of the concept of maximum likelihood estimation in statistics: the product gives the overall "likelihood" that all of the predictions are correct.

The problem with expressing it that way is that as m goes up, the likelihood gets smaller and smaller, because a product of numbers between 0 and 1 shrinks with every additional factor. Taking the log of the product converts it into a sum, in which it is much clearer which terms are the biggest contributors to the error. Averaging those values then gives an overall error metric that is independent of the number of samples.
If you want to know more, here’s the Wikipedia page for Maximum Likelihood Estimation.