Calculation of partial derivative of the cost function for logistic regression

Why shouldn’t Andrew be using ln(x)?

Unless he uses ln(x), there will be a factor of \frac{1}{\ln(10)} missing.
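To spell out where that factor comes from, by the change-of-base identity:

\log_{10}(x) = \frac{\ln(x)}{\ln(10)}, \quad \text{so} \quad \frac{d}{dx}\log_{10}(x) = \frac{1}{\ln(10)} \cdot \frac{1}{x}

Every derivative of a base-10 log therefore carries a constant \frac{1}{\ln(10)} that a natural log avoids.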

I tend to focus on the mathematics being correct and not intuition.

1 Like

No, the derivative will always be less than 0 for an output label y = 1 even when the error is 0.

Think of the function -\frac{1}{\ln(10)\,m}\log(f(\vec w, b)), which is the Loss function for predicting an output label y = 1. When there is zero error, f(\vec w, b) will equal the value of the output label y, and the Loss function passes through the horizontal axis at (1, 0), but its gradient at that point is still less than 0.
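To sketch that last claim in symbols, writing the single-example loss for y = 1 as L(f) = -\log_b(f) for any base b > 1:

\frac{dL}{df} = -\frac{1}{f \ln(b)}, \quad \text{so} \quad \left.\frac{dL}{df}\right|_{f=1} = -\frac{1}{\ln(b)} < 0

The loss itself is zero at (1, 0), but its slope there is strictly negative.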

I didn’t say he shouldn’t. I am not sure that he should.

As for how to write down log_e(x), I agree that ln(x) is clear, but I don’t think that rules out using log(x) to implicitly mean log_e(x).

I didn’t say it will not be less than 0.

I said that as the error tends to zero, the first derivative of the cost w.r.t. the weight tends to zero. Your result shows this.

As for the sign of the derivative, as your result shows, it will also depend on the x_j^{(i)} term.
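For reference, the standard result (with a natural-log loss, and writing f^{(i)} for the model’s prediction on example i) is:

\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left( f^{(i)} - y^{(i)} \right) x_j^{(i)}

Each term vanishes as the error f^{(i)} - y^{(i)} tends to zero, and its sign flips with the sign of x_j^{(i)}.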

Since you mentioned “…passes through the horizontal axis…”, it seems that you also focus on the geometrical argument :wink:

Hi Raymond,

Not so much geometry as evaluating the results of the derivative and the error.

Please explain why you think Andrew shouldn’t be using ln(x)?

1 Like

No, that is not a mistake on Andrew’s part. You have to realize that in the ML world, the notation is different than in the math world. Here log always means natural log, not log base 10.

Anytime you’re going to be taking derivatives, it would be nuts to use base 10 logs, exactly because you get all the useless constant multipliers propagating everywhere: it just creates a mess and adds no value in terms of behavior.
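As a concrete sketch of that mess: with f = \sigma(z) and z = \vec w \cdot \vec x + b, the natural-log loss differentiates cleanly,

\frac{\partial}{\partial z} \left[ -y \ln(f) - (1 - y) \ln(1 - f) \right] = f - y

whereas the base-10 version of the same loss yields \frac{f - y}{\ln(10)}, and that constant propagates into every downstream gradient.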

2 Likes

Hey Stephen, I cannot explain that, because I do not think he shouldn’t use it. As I said, he could. Besides, I do not know how he made that decision.

Trust me, when I first learned about logarithms, log meant log_{10}. Even though I am not a math expert, I can see why, if I were you, I might choose to insist on that. However, in practice, I also always see people stick to the other convention.

I have just tried to look for some reference and found this Encyclopedia of Mathematics page:

I just think the whole world has no consensus that excludes log(x) from meaning log_e(x).

Raymond

1 Like

This has been discussed many times. Here and here and more …

1 Like

I think that, to make things clear and avoid ambiguity, the language of mathematics should be used universally and consistently in all application domains.

If Andrew meant that log(f(\vec w, b)) was base-e, then it would have been better to use the notation ln(f(\vec w, b)), as this notation is universally understood to mean a logarithm to base e.

1 Like

Well, I’m sorry, but mathematicians don’t control the world, as much as they really should, since math is the basis of everything in science. Different fields that use math don’t always use the same notation. Econometrics uses math, psych uses math, ML uses math. Prof Ng uses the notation that is common in the ML community. In terms of these courses, he’s the boss, so he gets to choose the notation and we just have to understand it. When you publish your own course, you can do it your way.

Don’t get me wrong: my academic background is all in math, so I don’t disagree that math should define everything. But the ML folks didn’t ask my opinion.

So the short answer is “now you know, so deal with it”. :joy:

3 Likes

I’m not sure what you mean by “…deal with it”.

Just move on. You can’t change the conventions of the whole ML field, so it’s not worth spending any more mental energy on it.

I foresee a future of endless frustration for you.

Actually I am learning a lot about machine learning and the application of mathematics to solving real world problems in this domain and I really enjoy it.

I’m glad you’re enjoying it and not getting too frustrated!

Just to prepare you, with respect to the log function, it’s not just Prof. Ng and the ML community that uses base e as the default. If you look up the Python documentation for the log function, you’ll notice that base e is the default there, too.
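For example, a quick check with the standard library (assuming Python 3):

```python
import math

# With one argument, math.log computes the natural (base-e) logarithm.
print(math.log(math.e))   # ≈ 1.0
# An optional second argument selects a different base.
print(math.log(100, 10))  # ≈ 2.0
# There is also a dedicated base-10 function.
print(math.log10(100))    # ≈ 2.0
```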

3 Likes

That’s a great point. The numpy log functions are:

numpy.log for natural log
numpy.log10 for base 10 log
numpy.log2 for base 2 log
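A minimal sketch of all three in action:

```python
import numpy as np

print(np.log(np.e))      # ≈ 1.0 : natural log (base e)
print(np.log10(1000.0))  # ≈ 3.0 : base-10 log
print(np.log2(8.0))      # ≈ 3.0 : base-2 log
```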

And MATLAB does it exactly the same way: log, log10 and log2. They probably did it first so we can blame Cleve Moler. :nerd_face:

So maybe that’s where the ML conventions come from …

2 Likes

Hi Paul,

Thanks for that; however, I do think the people who created the numpy.log function could have named it numpy.ln or numpy.loge to make it clearer that base e is being used to compute the logarithm.

Actually, I’m not too frustrated.

It’s just a bit disappointing that Prof. Ng doesn’t make it explicit that he is using ln(x) when he writes the loss function with log(x).

Perhaps someone could reach out to him and ask if he could update his course to note that when he uses log(x) he actually means ln(x).

That would be useful for future students who have no knowledge of the convention in ML of using log(x) to mean ln(x).

Stephen.

With all the ML/AI companies and projects Prof. Ng is involved in, it’s not really practical to get him back to re-record for something like this. I expect it was a deliberate choice anyway. He puts a lot of thought into how to present concepts so that they are intuitive for the broad range of students who take this course. The way it is now gets the main intuition across and is consistent with the terminology students will be seeing in code and in the ML community in general.

It’s a hard balance to strike to keep the main intuitions clear and to provide enough depth to give a basic understanding without going so deep that you lose a large chunk of the students. You’ll probably find other places where you would have liked more mathematical rigor or more depth up-front. Just keep in mind the balance he’s trying to strike. Often, you’ll get more depth later as the course proceeds and in future courses.

1 Like