Calculation of partial derivative of the cost function for logistic regression

I’m not familiar with the MLS material. This does get mentioned in DLS C1 Week 2, although you have to follow an offered link to a Discourse thread that covers it.

As Wendy says, Prof Ng is focussed on new material, so it’s not likely that he would revise the lectures to mention this in MLS. But perhaps they could just add a Reading Item to the appropriate week of MLS similar to this one in DLS C1 W2:

Hi @ai_is_cool,

Well spotted! However, in the machine learning community it is common practice to assume that \log x = \ln x = \log_e x.

Hi @conscell,

I think it would be useful to make it clear in Prof. Ng’s course that this is a convention in ML circles before presenting it for the first time.

@ai_is_cool I fear I am much less ‘pure maths’ enabled than the other guys, but I highly suspect this comes from Claude Shannon’s classic works on Information Theory (from which the ‘entropy loss’ concept arrives). I am still learning, but his and Warren Weaver’s volume is, I think, worth revisiting.

Hi @nevermind,

Thanks for your reply however it’s not clear to me how your reply addresses my issue.

Also, I don’t recall Warren Weaver being mentioned by Prof. Ng so far in Course 1.

I usually like to provide site-agnostic links to texts, but in this case I can’t find one. It is Shannon’s ‘The Mathematical Theory of Communication’. I am not sure of Warren Weaver’s role, and I am not surprised by Prof. Ng’s lack of mention of this text.

A modern economics class mentions supply/demand, but it doesn’t require you to read about the ‘butcher, brewer and baker’ of Adam Smith (or Marx on ‘labour-power’, for that matter). It is kind of ‘assumed’, though I have read both.

In any case, I thought you’d really like this text. As far as I know, it gives an outline of where ‘cross-entropy’ (i.e. log loss) ‘comes from’, and there is plenty of mathematical detail there, so I felt it should help.

Thanks but I am still having difficulty understanding how your contributions to this thread address my logarithm issue in Prof Ng’s video lesson.

Hi @ai_is_cool,

Some parts of the mathematics are not explicitly mentioned by Prof. Ng, probably because he assumes people would understand the differentiation of the natural logarithm and the common logarithm,

where the derivative of ln(x) = 1/x,
and the derivative of log_{10}(x) = 1/(x ln(10)).
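
A quick numerical check of both formulas (a minimal sketch using NumPy and central differences; the value of x is just illustrative):

```python
import numpy as np

x, h = 2.5, 1e-6

# Central-difference approximations of the two derivatives
d_ln  = (np.log(x + h)   - np.log(x - h))   / (2 * h)
d_log = (np.log10(x + h) - np.log10(x - h)) / (2 * h)

print(d_ln,  1 / x)                 # d/dx ln(x)      = 1/x
print(d_log, 1 / (x * np.log(10)))  # d/dx log_10(x)  = 1/(x ln 10)
```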

I don’t remember whether he mentions it in MLS or not, but we will surely forward your feedback to the staff. Please remember that deep learning and machine learning assume some basic math understanding on the learner’s part. I am surely not defending it by stating this, as I can understand it can confuse some. Whenever I had such an issue, I used to look for more explanation and answers even outside the course.

I really appreciate your deep insight in looking at machine learning from a mathematical perspective. Discussions are a good way of improving our knowledge.

Hope my handwriting and explanation in the pic aren’t confusing :crazy_face:

Regards
DP

Thanks, DP, for your explanation and for working it out by hand.

However, I think the point I was trying to make is that the common logarithm log(x) is base-10, which is the logarithm Prof. Ng uses in his video lesson. However, others here have said that what he actually means is the natural logarithm ln(x), which is confusing for a beginner seeing his expression for the first time without an explanation that the natural logarithm is meant here rather than the common logarithm.

As I said earlier, some parts of the mathematics Prof. Ng probably assumes people will understand: that when he is using ln(x), the derivative works out the same as for log(x), given the understanding of the relationship between the natural logarithm and the common logarithm.

The probable reason behind using the natural logarithm is that its mathematical properties give simpler and more elegant solutions when dealing with exponential relationships and derivative calculations, especially in the context of optimization algorithms.

So base e aligns better with the natural logarithm and the inherent exponential relationships.
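
For instance, with the sigmoid f = 1/(1 + e^{-z}) and the natural-log loss, the derivative with respect to z collapses to f - y, with no stray 1/ln(10) factor. A minimal symbolic sketch (using SymPy; the symbol names are just illustrative):

```python
import sympy as sp

z, y = sp.symbols('z y')
f = 1 / (1 + sp.exp(-z))                           # sigmoid
loss = -(y * sp.log(f) + (1 - y) * sp.log(1 - f))  # natural-log (cross-entropy) loss

dloss = sp.diff(loss, z)
print(sp.simplify(dloss - (f - y)))                # prints 0, i.e. dloss/dz == f - y
```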

Remember, as I said earlier, this part of the derivative calculation is explained in more detail in DLS, which I am sure will tickle your mind more in case you take the Deep Learning Specialization.

Your suggestions are appreciated and will be passed on to the concerned staff to at least add some basic explanation about this to the course.

Regards
DP

Hi DP,

I’m not really understanding what you are saying.

The partial derivatives are not the same when using log(x) versus ln(x), if by log(x) you mean log_{10}(x).

You are again missing the point: Prof. Ng is using the natural logarithm instead of the common logarithm, as the derivatives of both tend to be similar.

Last thing I’m going to say on this matter:

Andrew uses log() to indicate the natural log.
All of machine learning for classification uses natural logs.
Machine learning notation does not use ln() for the natural log.
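
For example, NumPy’s np.log is the natural logarithm, and a typical logistic-loss computation uses it directly (a minimal sketch; the arrays are just illustrative):

```python
import numpy as np

# np.log is the natural log (base e), matching the log(...) used in the lectures.
y = np.array([1, 0, 1, 1])          # labels
f = np.array([0.9, 0.2, 0.7, 0.6])  # predicted probabilities

loss = -np.mean(y * np.log(f) + (1 - y) * np.log(1 - f))
print(loss)
```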

Insisting that Andrew is wrong will not change these facts.

Hi @TMosh,

Thank you for your reply in this matter.

I’m sorry to hear that you think I am “…insisting that Andrew is wrong…”.

Nothing could be further from the truth. I just think Andrew could have made it a bit clearer in his video presentations that when he uses the terminology of the common logarithm log(x), he actually means the natural logarithm ln(x).

It can be a little confusing for people like me coming into machine learning for the first time and seeing log(x) expressions and thinking “…ok, so this is a base-10 logarithm…” when in fact Andrew means a base-e logarithm.

I just think it would enhance the educational value of his presentations if he made a comment that when equations in machine learning use the common logarithm nomenclature log(x), he actually means the natural logarithm.

Best wishes,

Stephen.

Hi @Deepti_Prasad,

Please explain what the point is that I am “missing” again.

Thanks for your reply.

The partial derivatives are similar but not the same, owing to the factor \frac{1}{\ln(10)}.
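
To see that factor numerically, here is a minimal sketch assuming NumPy (the values are just illustrative):

```python
import numpy as np

def dloss_ln(f, y):
    # d/df of -( y*ln(f) + (1 - y)*ln(1 - f) )
    return -(y / f) + (1 - y) / (1 - f)

def dloss_log10(f, y):
    # d/df of -( y*log10(f) + (1 - y)*log10(1 - f) )
    return (-(y / f) + (1 - y) / (1 - f)) / np.log(10)

f, y = 0.7, 1.0
print(dloss_ln(f, y))                  # natural-log version
print(dloss_log10(f, y) * np.log(10))  # identical after scaling by ln(10)
```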

I explained the same in the handwritten explanation in my previous response: why we could use ln(x) or log(x) for the partial derivative.

Also, the derivative of log(x) = 1/(x ln(10)), and not 1/ln(10).

Sorry but I’m still a little confused over the point you are trying to make.

Perhaps you can try re-phrasing your point?

@ai_is_cool

ln(x) is the natural logarithm, meaning it has a base of e; this is denoted by ln(x) = log_e(x).

log(x) usually refers to the common logarithm, meaning it has a base of 10; this is denoted by log(x) = log_{10}(x).

Conversion between ln and log

  1. To convert “ln(x)” to “log(x)”: Use the change of base formula:
    ln(x) = log(x) / log(e)

  2. To convert “log(x)” to “ln(x)”: Use the change of base formula:
    log(x) = ln(x) / ln(10)
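
These identities are easy to verify numerically (a minimal sketch with NumPy, where np.log is the natural log and np.log10 the common log):

```python
import numpy as np

x = 42.0

print(np.log(x),   np.log10(x) / np.log10(np.e))  # ln(x)  = log(x) / log(e)
print(np.log10(x), np.log(x) / np.log(10))        # log(x) = ln(x) / ln(10)
```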

When to use ln(x) instead of log(x)

  1. Usually, when there is a natural connection with the exponential function e^x, using ln(x) often simplifies calculations and leads to cleaner results.
  2. If a problem clearly mentions a base of e, then use ln(x).

Remember also that log(x) is not always base 10; in general the base could be any positive number other than 1, as in my handwritten note where I write it as a, and in @conscell’s comment where it is written as e, i.e. log_e(x).

All,
I think we can be done with this topic now. I submitted a request to staff to add a note in the first lab where log is used and they’ve already made the change. There is now a nice bullet point in Optional Lab: Logistic Loss that explains that the notational convention is that log means the natural log.
