Calculation of partial derivative of the cost function for logistic regression

Why shouldn’t Andrew be using ln(x)?

Unless he uses ln(x), there will be a factor of \frac{1}{\ln(10)} missing.
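To spell out where that factor comes from, by the change-of-base identity:

\log_{10}(x) = \frac{\ln(x)}{\ln(10)}, \quad \text{so} \quad \frac{d}{dx}\log_{10}(x) = \frac{1}{\ln(10)} \cdot \frac{1}{x}

Every derivative of a base-10 log therefore carries a constant \frac{1}{\ln(10)} that a natural log avoids.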

I tend to focus on the mathematics being correct and not intuition.

1 Like

No, the derivative will always be less than 0 for an output label y = 1 even when the error is 0.

Think of the function -\frac{1}{\ln(10)\,m}\log(f(\vec w, b)), which is the Loss function for predicting an output label y = 1. When there is zero error, f(\vec w, b) will equal the value of the output label y, and the Loss function passes through the horizontal axis at (1, 0), but its gradient at that point is still less than 0.
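To sketch that last claim in symbols, writing the single-example loss for y = 1 as L(f) = -\log_b(f) for any base b > 1:

\frac{dL}{df} = -\frac{1}{f \ln(b)}, \quad \text{so} \quad \left.\frac{dL}{df}\right|_{f=1} = -\frac{1}{\ln(b)} < 0

The loss itself is zero at (1, 0), but its slope there is strictly negative.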

I didn’t say he shouldn’t. I am not sure that he should.

As for how to write down log_e(x), I agree that ln(x) is clear, but I don’t think that rules out using log(x) to implicitly mean log_e(x).

I didn’t say it will not be less than 0.

I said that as the error tends to zero, the first derivative of the cost w.r.t. the weight tends to zero. Your result shows this.

As for the sign of the derivative, as your result shows, it will also depend on the x_j^{(i)} term.
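For reference, the standard result (with a natural-log loss, and writing f^{(i)} for the model’s prediction on example i) is:

\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left( f^{(i)} - y^{(i)} \right) x_j^{(i)}

Each term vanishes as the error f^{(i)} - y^{(i)} tends to zero, and its sign flips with the sign of x_j^{(i)}.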

Since you mentioned “…passes through the horizontal axis…”, it seems that you also focus on the geometrical argument :wink:

Hi Raymond,

Not so much geometry as evaluating the results of the derivative and the error.

Please explain why you think Andrew shouldn’t be using ln(x)?

1 Like

No, that is not a mistake on Andrew’s part. You have to realize that in the ML world, the notation is different than in the math world. Here log always means natural log, not log base 10.

Anytime you’re going to be taking derivatives, it would be nuts to use base 10 logs, exactly because you get all the useless constant multipliers propagating everywhere: it just creates a mess and adds no value in terms of behavior.
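As a concrete sketch of that mess: with f = \sigma(z) and z = \vec w \cdot \vec x + b, the natural-log loss differentiates cleanly,

\frac{\partial}{\partial z} \left[ -y \ln(f) - (1 - y) \ln(1 - f) \right] = f - y

whereas the base-10 version of the same loss yields \frac{f - y}{\ln(10)}, and that constant propagates into every downstream gradient.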

2 Likes

Hey Stephen, I cannot explain that, because I do not think he shouldn’t use it. As I said, he could. Besides, I do not know how he made that decision.

Trust me, when I first learned about logarithms, log meant log_{10}. Even though I am not a math expert, I can see why, if I were you, I might choose to insist on that. However, in practice, I also always see people stick to the other convention.

I have just tried to look for some reference and found this Encyclopedia of Mathematics page:

I just think the whole world has no consensus that excludes log(x) from meaning log_e(x).

Raymond

1 Like

This has been discussed many times. Here and here and more …

1 Like

I think that, to make things clear and avoid ambiguity, the language of mathematics should be used universally and consistently in all application domains.

If Andrew meant that log(f(\vec w, b)) was base-e, then it would have been better to use the notation ln(f(\vec w, b)), as this notation is universally understood to mean a logarithm to base e.

1 Like

Well, I’m sorry, but mathematicians don’t control the world, as much as they really should, since math is the basis of everything in science. Different fields that use math don’t always use the same notation. Econometrics uses math, psych uses math, ML uses math. Prof Ng uses the notation that is common in the ML community. In terms of these courses, he’s the boss, so he gets to choose the notation and we just have to understand it. When you publish your own course, you can do it your way.

Don’t get me wrong: my academic background is all in math, so I don’t disagree that math should define everything. But the ML folks didn’t ask my opinion.

So the short answer is “now you know, so deal with it”. :joy:

3 Likes

I’m not sure what you mean by “…deal with it”.

Just move on. You can’t change the conventions of the whole ML field, so it’s not worth spending any more mental energy on it.

I foresee a future of endless frustration for you.

Actually I am learning a lot about machine learning and the application of mathematics to solving real world problems in this domain and I really enjoy it.

I’m glad you’re enjoying it and not getting too frustrated!

Just to prepare you, with respect to the log function, it’s not just Prof. Ng and the ML community that uses base e as the default. If you look up the Python documentation for the log function, you’ll notice that base e is the default there, too.
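For example, a quick check with the standard library (assuming Python 3):

```python
import math

# With one argument, math.log computes the natural (base-e) logarithm.
print(math.log(math.e))   # ≈ 1.0
# An optional second argument selects a different base.
print(math.log(100, 10))  # ≈ 2.0
# There is also a dedicated base-10 function.
print(math.log10(100))    # ≈ 2.0
```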

3 Likes

That’s a great point. The numpy log functions are:

numpy.log for natural log
numpy.log10 for base 10 log
numpy.log2 for base 2 log
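A minimal sketch of all three in action:

```python
import numpy as np

print(np.log(np.e))      # ≈ 1.0 : natural log (base e)
print(np.log10(1000.0))  # ≈ 3.0 : base-10 log
print(np.log2(8.0))      # ≈ 3.0 : base-2 log
```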

And MATLAB does it exactly the same way: log, log10 and log2. They probably did it first so we can blame Cleve Moler. :nerd_face:

So maybe that’s where the ML conventions come from …

2 Likes

Hi Paul,

Thanks for that; however, I do think the people who created the numpy.log function could have named it numpy.ln or numpy.loge to make it clearer that base e is being used to compute the logarithm.

Actually, I’m not too frustrated.

It’s just a bit disappointing that Prof. Ng doesn’t make it explicit that he is using ln(x) when he writes the loss function with log(x).

Perhaps someone could reach out to him and ask if he could update his course to note that when he uses log(x) he actually means ln(x).

That would be useful for future students who have no knowledge of the convention in ML of using log(x) to mean ln(x).

Stephen.

With all the ML/AI companies and projects Prof. Ng is involved in, it’s not really practical to get him back to re-record for something like this. I expect it was a deliberate choice anyway. He puts a lot of thought into how to present concepts so that they are intuitive for the broad range of students who take this course. The way it is now gets the main intuition across and is consistent with the terminology students will be seeing in code and in the ML community in general.

It’s a hard balance to strike to keep the main intuitions clear and to provide enough depth to give a basic understanding without going so deep that you lose a large chunk of the students. You’ll probably find other places where you would have liked more mathematical rigor or more depth up-front. Just keep in mind the balance he’s trying to strike. Often, you’ll get more depth later as the course proceeds and in future courses.

1 Like