Logistic regression loss function

You’ve thrown a lot of ideas at us there. For starters, where does it say that log(\hat{y}) “has to be a large number”? If we are using something as a “cost” or “loss”, then we want it to be a positive number, and our goal is to make it small rather than large. The first step in all this is to be clear that all the y and \hat{y} values are between 0 and 1 (inclusive). So what is the graph of the log function between 0 and 1? Here’s a thread which shows that.
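If you want to check this numerically rather than from a graph, here is a minimal sketch. It assumes the usual binary cross-entropy form of the loss for a single example, -(y log(\hat{y}) + (1 - y) log(1 - \hat{y})), and the function name `log_loss` is just an illustrative choice, not something from the course code. Note that it keeps \hat{y} strictly inside (0, 1), since log(0) is undefined.

```python
import math

def log_loss(y, y_hat):
    """Binary cross-entropy for one example.

    Assumes y is the label (0 or 1) and 0 < y_hat < 1.
    """
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# For a positive example (y = 1), compare log(y_hat) with the loss.
for y_hat in (0.01, 0.5, 0.99):
    print(f"y_hat={y_hat}: log(y_hat)={math.log(y_hat):.4f}, "
          f"loss={log_loss(1, y_hat):.4f}")
```

The log of any value strictly between 0 and 1 is negative, so the leading minus sign is what makes the loss positive. For y = 1 the loss shrinks toward 0 as \hat{y} approaches 1, and it grows without bound as \hat{y} approaches 0.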

Actually, here’s another pre-existing thread that gives a more complete explanation of loss functions, and of “log loss” in particular.

Please have a look at those two threads, and then feel free to ask more follow-up questions based on what you learn there.