Week 2 Logistic Regression Gradient Descent video: derivative mistake

In the video "Logistic Regression Gradient Descent" there is a mistake in the derivative of the cost function ("da"). Since the loss function uses the log function (not the ln function), the derivative must be divided by ln(10): "da" = -( y/a - (1-y)/(1-a) )/ln(10)

The formulas shown are correct. What you need to realize is that the notational conventions are different in the ML/DL world than they are in the math world. Whenever they say "log" here, they mean natural logs, not logs base 10. I'm not sure where this convention came from, but one conjecture is that it's based on the way things work in MATLAB, which was pretty commonly used in the early days of ML. Of course Python has subsequently taken over the world. Note that np.log is also the natural log. In fact, Python uses the same naming as MATLAB: if you want logs base 10, the function is np.log10.
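
A quick sanity check you can run yourself (just a sketch, assuming NumPy is imported as np):

```python
import numpy as np

# np.log is the natural logarithm (base e), same naming convention as MATLAB's log
print(np.log(np.e))       # ≈ 1.0
print(np.log(np.exp(3)))  # ≈ 3.0

# Base-10 and base-2 logs get their own explicitly named functions
print(np.log10(1000.0))   # ≈ 3.0
print(np.log2(8.0))       # 3.0
```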

Of course the point you raise also illustrates why you'd be nuts to use logarithms to any base other than e if you're going to be taking derivatives or integrating. You just get inundated with bogus constant factors, and since the shapes of the curves are the same in any case, you'd only get pain from using base 10 and no actual advantage.
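
To make those constant factors concrete, here is the difference written out (standard calculus, with the loss as given in the lectures and "log" read as the natural log):

```latex
% Derivatives of logs in different bases:
\[
  \frac{d}{da}\ln(a) = \frac{1}{a}
  \qquad\text{vs.}\qquad
  \frac{d}{da}\log_{10}(a) = \frac{1}{a\,\ln 10}
\]

% With the loss L(a,y) = -( y log(a) + (1-y) log(1-a) ) and log meaning ln:
\[
  \frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1-y}{1-a}
\]

% If log meant base 10 instead, every term would carry the extra 1/ln(10)
% factor from the original question: same curve shape, messier constants.
```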


Thanks for the good answer. It would be nice to see this mentioned during the course.

I withdraw my point; there is a link in the course to the full derivation where this is noted.

It seems interesting that people continue to write it this way. "ln()" is fewer characters to write/type than "log()". There must be a reason why people don't just start using the more mathematically correct notation. Do you have any insight into why that is?

No, sorry, I don’t know the history of why the notation is different in the ML world than in the math world. You could argue that the math world is the one that’s backwards. Who really cares about logs base 10, once you start doing calculus? They make no sense: you get no behavioral advantage and it just makes a hideous mess with bogus constant factors at every turn.

One possible theory is that in MATLAB, which was the most common language used for ML work in the early days, the function names are log for natural log, log10 for base-10 logs, and log2 for base-2 logs. And note that MATLAB was created by real mathematicians for doing programming that includes serious mathematics.