In the first question of the graded quiz we are asked what is true about gradient descent. One of the possible answers is “*It only works for differentiable functions*”, which is accepted as true. However, in the videos and practice lab we have the function *e^x - log(x)*, whose derivative is *e^x - 1/x*. We can see that the derivative **does not** exist at *x = 0*, yet we still **do** apply gradient descent.

Why is it so? Does gradient descent work for non-differentiable functions or do we apply gradient descent where we should not?

The function e^x-\log(x) is not defined at x=0. The function is differentiable everywhere it is defined.
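A quick numerical check makes this concrete (plain Python with only the standard `math` module; the names `f` and `df` are my own, not from the course): log(x) requires x > 0, so f and its derivative share the same domain.

```python
import math

def f(x):
    # f(x) = e^x - log(x); log(x) requires x > 0, so f is undefined for x <= 0
    return math.exp(x) - math.log(x)

def df(x):
    # f'(x) = e^x - 1/x, which exists everywhere f itself is defined (x > 0)
    return math.exp(x) - 1.0 / x

print(f(1.0))  # e^1 - log(1) = e
# f(0.0) raises ValueError ("math domain error"): x = 0 is not in the domain,
# so there is no point where f exists but its derivative does not.
```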

Can you provide a link to the videos or lab work where we apply gradient descent despite an undefined gradient? My access to the course material is now limited, but perhaps with the appropriate context I can still help.

As for your original question: technically speaking, the gradient descent update formula depends by definition on the derivative, so if the (partial) derivative does not exist at a point, the update itself is undefined there in theory.
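In symbols, one update is x ← x − α f′(x). A minimal sketch of that step (the names `gradient_descent_step`, `grad`, and `lr` are my own, not from the course):

```python
def gradient_descent_step(x, grad, lr=0.1):
    # One iteration of gradient descent: x_new = x - lr * grad(x).
    # The step is only defined where grad(x) exists, which is the point above.
    return x - lr * grad(x)

# e.g. for f(x) = x^2, whose derivative is 2x:
x_new = gradient_descent_step(3.0, lambda v: 2 * v)  # 3.0 - 0.1 * 6.0 = 2.4
```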

Again, I don't have access to the material any longer, but if, for instance, you are referring to the ReLU activation function, whose derivative is technically undefined precisely at x=0, then in practice treating the derivative as either 0 (as is the case when x<0) or 1 (as is the case when x>0) will be fine in either of the following two scenarios:

A) You find yourself in the practically impossible scenario where x is exactly zero. If you do, arbitrarily pick a side (i.e. choose either 0 or 1 as the slope of ReLU at x=0). After an iteration of gradient descent, it is almost guaranteed that you will not face the same x=0 scenario the following iteration.

B) x is not exactly 0, so you don't need to worry about the undefined derivative at x=0.

The point is, whether you treat the derivative at x=0 as 1 or 0, you will still end up converging to the same value.
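Here's a sketch of how this arbitrary choice can look in code (the `at_zero` parameter is my own name for the choice, not an API from the course; most frameworks hard-code 0):

```python
def relu(x):
    # ReLU activation: max(0, x)
    return max(0.0, x)

def relu_grad(x, at_zero=0.0):
    # The derivative is 0 for x < 0 and 1 for x > 0; at exactly x = 0 it is
    # undefined, so we arbitrarily pick a side via at_zero.
    if x > 0:
        return 1.0
    if x < 0:
        return 0.0
    return at_zero

print(relu_grad(0.0))               # 0.0, the conventional choice
print(relu_grad(0.0, at_zero=1.0))  # 1.0 also works fine in practice
```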

I may have missed some points trying to explain this, but I’m open to hearing what you all have to say. And I absolutely encourage anyone who understands this stuff better than I do to chime in, especially if my explanation is off, because I’d love to learn from you as well!

So basically a function can be undefined at some points, or even over an interval, yet still be differentiable where it is defined, and that’s enough for gradient descent? On the other hand, if a function is defined on some interval but its derivative is not defined on that same interval, then the function is not differentiable there and gradient descent wouldn’t work. Is this correct?

Here is a video: https://www.coursera.org/learn/machine-learning-calculus/lecture/daSiv/optimization-using-gradient-descent-in-one-variable-part-2

It uses the formula I mentioned in the original post, but @Titus_Teodorescu’s hints suggest I might have gotten it wrong. See my understanding so far in the comment right above this one.

Ok, thanks for that link. For some reason I was under the impression that you were talking about a function that is defined everywhere but whose derivative is undefined at a specific point, such as the ReLU activation function.

As @Titus_Teodorescu pointed out, f’(x) is defined everywhere that f(x) is defined. Therefore, gradient descent will work on the parts of the domain where the function is defined. Gradient descent needs a derivative value in order to iterate to the next step, so yes, the function needs to be differentiable over a region in order to use gradient descent over that region.

I came here with the same question. Now that I’ve read the thread, I understand I was confusing a function’s differentiability with “difficult to optimize.” To summarize:

- A differentiable function is one whose derivative exists at every point in its domain.
- The function e^x - log(x) IS differentiable, and its derivative is e^x - 1/x.
- Minimizing e^x - log(x) analytically is hard (e^x - 1/x = 0 has no closed-form solution), so we use gradient descent.
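To tie it together, here is a short run of gradient descent on f(x) = e^x - log(x) (the starting point and learning rate are my own choices, not the course's): setting the derivative to zero gives e^x = 1/x, whose solution is roughly x ≈ 0.567, and the iterates converge there.

```python
import math

def df(x):
    # f'(x) = e^x - 1/x for f(x) = e^x - log(x)
    return math.exp(x) - 1.0 / x

x, lr = 1.0, 0.05  # start inside the domain (x > 0); step size picked by hand
for _ in range(1000):
    x -= lr * df(x)

print(round(x, 4))  # ≈ 0.5671, the point where e^x = 1/x
```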