Cost Function of a Machine Learning Algorithm

What if the weights (w) are initialized at a point where the cost function J has a vertical tangent line? That would mean there is no tangent, and the slope is undefined (division by zero in the denominator). Is this case even possible? If so, how is the algorithm going to converge?

Real cost functions would not have a vertical slope.

Hi, @WajidHassan

As @TMosh said, cost functions will not have this vertical slope in real situations. Also remember that computers don't work with continuous variables as in pure mathematics: you always end up discretizing your variable into a finite set of numbers. So what is undefined in mathematics ends up being just a very large number in the computer, which is still bad, but the algorithm will keep running.

import numpy as np
from math import tan, pi

# pi/2 cannot be represented exactly in floating point, so tan(pi/2)
# returns a very large finite number instead of being undefined
np.tan(np.pi / 2)  # 1.633123935319537e+16
tan(pi / 2)        # 1.633123935319537e+16

Hi Wajid,

This is a great line of thinking and considering such things is very much worthwhile.

Situations similar to the one you pointed out are possible in principle (not a literal vertical line, since by definition that would not be a function, but an asymptote could occur in principle). However, keep in mind that you, the engineer, are the one who gets to choose how to define the cost function. To avoid problems like this, you would want to define a cost function that does not have such a property. For example, something like J(x) = \mathrm{tan}(y - f(x)) would be a bad choice, because the tangent function has a vertical asymptote (it goes to infinity when y - f(x) = \pi/2).

This is part of the reason the common cost functions are defined the way they are. For example, the squared error cost function J(X, y) = \sum_{i=1}^m (f(x^{(i)}) - y^{(i)})^2 has no asymptotes. Cost functions like the squared error, cross entropy, or binary cross entropy are constructed so that they behave well and do not have issues like the one you pointed out. We like to use cost functions that satisfy nice properties such as convexity, or at least Lipschitz continuity; Lipschitz continuity implies there are no asymptotes.
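To make the contrast concrete, here is a small sketch (the function names are mine, purely illustrative): a squared-error-style cost stays finite for any residual, while a tan-based cost blows up as the residual approaches its asymptote at \pi/2.

import numpy as np

def j_square(residual):
    # squared error term: smooth and finite for every residual
    return residual ** 2

def j_tan(residual):
    # tan-based cost: vertical asymptote at residual = pi/2
    return np.tan(residual)

residuals = np.array([0.1, 1.0, np.pi / 2 - 1e-8])
print(j_square(residuals))  # stays modest everywhere
print(j_tan(residuals))     # last entry is astronomically large

Gradient descent on the tan-based cost would take an enormous step near the asymptote and diverge, which is exactly why such functions are avoided in practice.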

I hope this helps!



Got it. Thank you @aachandler @Moaz_Elesawey @TMosh