Cost Function of Machine Learning Algorithm

WajidHassan · November 25, 2022, 6:29am

What if the weights (w) are initialized at a point where the cost function J has a vertical line? That would mean there is possibly no tangent, and the slope is undefined because its denominator is zero. Is this case even possible? If yes, how is it going to converge?

TMosh · November 25, 2022, 8:40am

Real cost functions would not have a vertical slope.

Moaz_Elesawey · November 25, 2022, 3:46pm

Hi, @WajidHassan

As @TMosh said, cost functions will not have this vertical slope in real situations. Also remember that computers do not work with continuous variables as in pure mathematics; every value ends up represented by one of a finite set of floating-point numbers. So what is undefined in mathematics will often come out as just a very large number on the computer. That is still bad, but the algorithm will keep running.

import numpy as np
from math import tan, pi

# Both calls return a huge but finite value instead of failing
print(np.tan(np.pi / 2))  # 1.633123935319537e+16
print(tan(pi / 2))        # 1.633123935319537e+16
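As a small follow-up sketch (my own illustration, not from the course material): the result is finite because the floating-point value of pi/2 is only the nearest representable number to the true π/2, so its cosine is tiny but not zero. The variable names here are just for illustration.

```python
import math

# math.pi / 2 is the closest double to pi/2, not pi/2 itself,
# so tan is never evaluated exactly on the asymptote.
x = math.pi / 2
value = math.tan(x)

print(math.isfinite(value))  # the result is a finite float
print(math.cos(x))           # tiny, but not exactly zero
```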

aachandler · November 26, 2022, 8:27pm

Hi Wajid,

This is a great line of thinking and considering such things is very much worthwhile.

Situations similar to the one you pointed out are possible in principle (not a literal vertical line, since by definition that would not be a function, but a vertical asymptote could occur). However, keep in mind that you, the engineer, are the one who chooses how to define your cost function. So to avoid problems like this, you would define a cost function that does not have such a property. For example, something like J(x) = \tan(y - f(x)) would be a bad choice because the tangent function has a vertical asymptote (e.g. it goes to infinity as y - f(x) approaches \pi/2). This is part of the reason the common cost functions are defined the way they are. For example, the squared error cost function J(X,y) = \sum_{i=1}^m (f(x^{(i)}) - y^{(i)})^2 does not have asymptotes. Cost functions like the squared error, the cross entropy, or the binary cross entropy are constructed the way they are because they behave well and do not have issues like the one you pointed out. We like to use cost functions that satisfy nice properties like convexity, or at least Lipschitz continuity. Lipschitz continuity implies there are no vertical asymptotes.
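To make the contrast concrete, here is a minimal numerical sketch. It assumes a one-parameter linear model f(x) = w*x and a single made-up data point; the helper names `squared_error_grad` and `tan_cost_grad` are hypothetical and only for illustration. It compares the gradient of the squared error cost with the gradient of the tangent-based cost as w approaches the asymptote.

```python
import numpy as np

def squared_error_grad(w, x, y):
    # d/dw of sum((w*x - y)^2) for the assumed model f(x) = w*x
    return float(np.sum(2.0 * (w * x - y) * x))

def tan_cost_grad(w, x, y):
    # d/dw of sum(tan(y - w*x)); the derivative of tan(u) is 1/cos(u)^2
    u = y - w * x
    return float(np.sum(-x / np.cos(u) ** 2))

# One made-up training example (an assumption for this sketch)
x = np.array([1.0])
y = np.array([2.0])

# The tangent cost has its asymptote where y - w*x = pi/2
w_asymptote = 2.0 - np.pi / 2

offsets = [1e-1, 1e-3, 1e-5]
sq_grads = [squared_error_grad(w_asymptote + d, x, y) for d in offsets]
tan_grads = [tan_cost_grad(w_asymptote + d, x, y) for d in offsets]

for d, gs, gt in zip(offsets, sq_grads, tan_grads):
    print(f"offset={d:g}  squared-error grad={gs:.4f}  tan-cost grad={gt:.4e}")
```

The squared error gradient stays close to the same modest value, while the magnitude of the tangent-cost gradient grows roughly like 1/offset^2, so a gradient descent step near the asymptote would be enormous.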

I hope this helps!

Best,
Alex

WajidHassan · November 28, 2022, 2:47pm

Got it. Thank you @aachandler @Moaz_Elesawey @TMosh
