Intuition for Log Loss Function

In the topic "Classification with Perceptron - Gradient Descent", the log loss is given as:

$$-y\,\ln(\hat{y}) \;-\; (1-y)\,\ln(1-\hat{y})$$

How did we arrive at this formula?

By design. It's a function that rewards correct predictions with a loss near zero and penalizes confident wrong predictions with a loss that grows without bound: when $y = 1$ the loss is just $-\ln(\hat{y})$, which goes to 0 as $\hat{y} \to 1$ and blows up as $\hat{y} \to 0$ (and symmetrically for $y = 0$).
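As a rough numerical illustration (my own minimal sketch in NumPy, not code from the course), you can watch the loss explode as a prediction for a positive example gets more confidently wrong:

```python
import numpy as np

def log_loss(y, y_hat, eps=1e-15):
    """Binary cross-entropy for a single example.
    eps keeps y_hat away from 0 and 1 so log() never sees exactly 0."""
    y_hat = np.clip(y_hat, eps, 1 - eps)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

# True label is 1: a confident correct prediction costs almost nothing,
# a confident wrong prediction is penalized heavily.
for y_hat in [0.99, 0.9, 0.5, 0.1, 0.01]:
    print(f"y=1, y_hat={y_hat:.2f} -> loss={log_loss(1, y_hat):.3f}")
```

The printed losses climb from about 0.01 (for $\hat{y}=0.99$) up to about 4.6 (for $\hat{y}=0.01$), which is exactly the "greatly penalizes incorrect predictions" behavior.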

It also has the nice property of combining cleanly with the derivative of the sigmoid() function when you compute the gradients.
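To make that concrete (this is the standard textbook derivation, not quoted from the course): if $\hat{y} = \sigma(z)$, so that $\sigma'(z) = \sigma(z)\,(1-\sigma(z))$, the chain rule collapses the gradient of the log loss into a very simple form:

```latex
% With \hat{y} = \sigma(z) and \sigma'(z) = \sigma(z)\,(1 - \sigma(z)):
\frac{\partial L}{\partial z}
  = \left(-\frac{y}{\hat{y}} + \frac{1-y}{1-\hat{y}}\right)\hat{y}\,(1-\hat{y})
  = -y\,(1-\hat{y}) + (1-y)\,\hat{y}
  = \hat{y} - y
```

That $\hat{y} - y$ term is why the gradient descent update for logistic regression looks so much like the one for linear regression.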

Here’s a thread about loss functions and the “cross entropy” loss function that you are asking about. That thread is from DLS, which is a more advanced series that you may want to take after MLS, so the discussion may mention things that Professor Andrew has not yet mentioned here. But at least it’s worth looking at the graphs of the natural log function between 0 and 1. As the old saying goes, sometimes a picture is worth a thousand words. 😄
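If you'd rather generate that picture yourself, here is a quick matplotlib sketch (my own, not from that thread) of the two branches of the loss, $-\ln(\hat{y})$ for $y=1$ and $-\ln(1-\hat{y})$ for $y=0$, over $(0, 1)$:

```python
import numpy as np
import matplotlib.pyplot as plt

y_hat = np.linspace(0.001, 0.999, 500)

# Loss when the true label is 1 vs. when it is 0.
plt.plot(y_hat, -np.log(y_hat), label=r"$-\ln(\hat{y})$  (y = 1)")
plt.plot(y_hat, -np.log(1 - y_hat), label=r"$-\ln(1-\hat{y})$  (y = 0)")
plt.xlabel(r"predicted probability $\hat{y}$")
plt.ylabel("loss")
plt.legend()
plt.show()
```

Each curve is near zero at the correct end of the interval and shoots toward infinity at the wrong end, which is the whole intuition in one picture.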
