In every learning algorithm I have seen, whether simple machine learning or neural nets, the computation starts with the linear equation $y = \vec W \cdot \vec X + b$ (a line, or a hyperplane in higher dimensions), and then a second function is applied to it. For example, in linear regression there is effectively no second function (the identity), which can be written mathematically as $f(y) = 1 \cdot y$; in logistic regression the second function is the sigmoid, which can be written as $f(y) = \frac{1}{1+e^{-y}}$.
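To make the two-step pattern concrete, here is a small pure-Python sketch: a linear score $y = \vec W \cdot \vec X + b$ followed by either the identity (linear regression) or the sigmoid (logistic regression). The function names and the weight, bias and input values are hypothetical, chosen only for illustration.

```python
import math

def linear_score(w, x, b):
    """Compute y = w . x + b for plain Python lists w and x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def identity(y):
    """Linear regression: f(y) = 1 * y, i.e. no extra nonlinearity."""
    return 1 * y

def sigmoid(y):
    """Logistic regression: f(y) = 1 / (1 + e^(-y)), squashes y into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))

# Hypothetical weights, bias and input (illustration only).
w = [0.5, -1.0]
b = 0.25
x = [2.0, 1.0]

y = linear_score(w, x, b)   # 0.5*2.0 + (-1.0)*1.0 + 0.25 = 0.25
print(identity(y))          # linear regression output
print(sigmoid(y))           # logistic regression output
```

Both models share the same first step; only the function applied to $y$ differs.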
I always wonder what is so special about this formula: why always start by fitting a line? Is it because it is easy to apply a function to a line to make it nonlinear? In other words, is it easier to take a line and bend it with functions into any curved decision boundary than to start with a curved boundary and then use tools to make it straight?
Also, consider the following distribution:
Here we clearly need a circular decision boundary, so could we start with the circle equation $(x - h)^2 + (y - k)^2 = r^2$ instead? In this case there would be three parameters to learn: $h$, $k$ and $r$. And if we center the data around the origin, the model would only need to learn $r$, since the equation becomes $x^2 + y^2 = r^2$.
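A minimal sketch of the circle-based classifier proposed above, assuming 2-D points given as `(x, y)` tuples, label 1 meaning "inside the circle", and data already centered at the origin so that only $r$ is fitted. The names `circle_score`, `predict`, `fit_radius` and the toy data are hypothetical. Fitting $r$ by brute-force search over candidate radii is just one simple choice that works because there is a single parameter; it is not a standard library method.

```python
import math

def circle_score(point, h=0.0, k=0.0, r=1.0):
    """Signed score r^2 - ((x - h)^2 + (y - k)^2): positive inside the circle."""
    x, y = point
    return r ** 2 - ((x - h) ** 2 + (y - k) ** 2)

def predict(point, h=0.0, k=0.0, r=1.0):
    """Label 1 if the point lies strictly inside the circle, else 0."""
    return 1 if circle_score(point, h, k, r) > 0 else 0

def fit_radius(points, labels):
    """Fit only r, assuming h = k = 0 (data centered at the origin).

    Tries every midpoint between consecutive sorted radii and keeps the one
    with the fewest misclassifications. Needs at least two points.
    """
    radii = sorted(math.hypot(x, y) for x, y in points)
    candidates = [(a + b) / 2 for a, b in zip(radii, radii[1:])]
    best_r, best_errors = None, len(points) + 1
    for r in candidates:
        errors = sum(predict(p, r=r) != t for p, t in zip(points, labels))
        if errors < best_errors:
            best_r, best_errors = r, errors
    return best_r, best_errors

# Hypothetical toy data: label 1 = inner class, 0 = outer ring.
points = [(0.5, 0.0), (0.0, -0.8), (-0.6, 0.6), (2.0, 0.0), (0.0, 2.5), (-1.8, -1.8)]
labels = [1, 1, 1, 0, 0, 0]

r, errors = fit_radius(points, labels)
print(r, errors)
```

Learning $h$ and $k$ as well would turn this into a three-parameter search, for which an iterative optimizer such as gradient descent on a smooth loss would be a more practical choice than exhaustive search.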