Why do we need the sigmoid function, and why do we feed the boundary function z(x) into it? If the sigmoid is only used to check whether z(x) is greater or less than 0, we could just use {output = 1, if z(x) > 0; output = 0, if z(x) < 0}.
Using the sigmoid function introduces a non-linearity into the model, which allows the network to learn more complex decision boundaries.
In image classification tasks, for example, the sigmoid converts the output of the linear model into a probability. Forcing a hard output of 1 introduces a bias, because what we actually want is that probability.
Take a look at this question: even though the threshold is as low as 0.2, we flag tumor suspicion over the lower range of probabilities because we do not want to miss a tiny pathology. If we forced the output to a hard 1, we would build that bias into the algorithm.
Whereas in the image below, the sigmoid function turns a regression line into a decision boundary for binary classification: if x1 + x2 is less than 3 (for this image only, since the decision boundary depends on the trained model) then y = 0, and if x1 + x2 is greater than 3 then y = 1.
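To illustrate that boundary, here is a small sketch of my own (the weights and points are made up for this example, not taken from the course): the sigmoid maps the linear score z = x1 + x2 - 3 to a probability, and thresholding that probability at 0.5 reproduces the boundary x1 + x2 = 3.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(x1, x2):
    # Linear score for the boundary x1 + x2 = 3 (weights chosen for this example only).
    z = x1 + x2 - 3.0
    p = sigmoid(z)             # probability that y = 1
    return p, int(p >= 0.5)    # thresholding p at 0.5 is the same as thresholding z at 0

for point in [(0.5, 1.0), (1.5, 1.5), (2.0, 2.5)]:
    p, label = predict(*point)
    print(point, f"p={p:.3f}", f"y_hat={label}")
```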
Regards
DP
If you’re just talking about making predictions after you have trained a logistic classifier, then yes, you don’t really need the sigmoid for that. It’s a monotonically increasing function, so thresholding sigmoid(z) at 0.5 is equivalent to thresholding z at 0.
But for training, you need the sigmoid() when computing the gradients of the logistic cost function.
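To make that concrete, here is a minimal sketch (my own variable names, not the assignment code) of the gradients of the logistic (cross-entropy) cost, where the sigmoid appears directly in the term (a - y):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradients(X, y, w, b):
    """Gradients of J = -(1/m) * sum(y*log(a) + (1-y)*log(1-a))."""
    m = X.shape[0]
    a = sigmoid(X @ w + b)        # sigmoid is needed here: a is the predicted probability
    dw = (X.T @ (a - y)) / m      # dJ/dw
    db = np.sum(a - y) / m        # dJ/db
    return dw, db
```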
Hi, I think the gradients and the loss function could be computed without the sigmoid function. For example, if the model’s prediction under {output = 1, if z(x) > 0; output = 0, if z(x) < 0} is 1 and the actual label is 0, we could just use the summed squared error as the loss and then compute and update the gradients (w and b). I think Deeptl_Prasad might be right that the sigmoid introduces a non-linear property: the loss becomes smoother (a value like 0.2 is possible, rather than jumping abruptly between 0 and 1).
Hi, I tried the bare hard-threshold rule {output = 1, if z(x) > 0; output = 0, if z(x) < 0} without the sigmoid function, using the summed squared error for gradient descent. In theory it might work, but in practice the loss surface is very steep and hard to converge on.
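For anyone who wants to reproduce something like that, here is a rough toy sketch of my own (not the assignment code): with the hard 0/1 threshold, the squared error is piecewise constant in w, so the numerical gradient is zero almost everywhere and gradient descent stalls even when every prediction is wrong, while the sigmoid version gives a usable slope.

```python
import numpy as np

def hard_predict(x, w, b):
    return (x * w + b > 0).astype(float)   # step function: output is 0 or 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b, eps = -0.1, 0.0, 1e-4   # w = -0.1 misclassifies every point

# Numerical gradient of the squared-error loss w.r.t. w for both predictors.
for name, f in [("step", hard_predict),
                ("sigmoid", lambda x, w, b: sigmoid(x * w + b))]:
    loss = lambda w_: np.mean((f(x, w_, b) - y) ** 2)
    grad = (loss(w + eps) - loss(w - eps)) / (2 * eps)
    print(name, "loss =", loss(w), "numerical dL/dw =", grad)
```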
It has several non-convex regions - that is why we don’t use the squared error cost function for classification.
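Just to illustrate that with a small sketch of my own (one training example, scanning the weight): the squared-error cost of a sigmoid unit has a numerical second derivative that changes sign, which a convex function never does.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One example: x = 1, true label y = 1.
x, y = 1.0, 1.0
w = np.linspace(-10, 10, 2001)
cost = (sigmoid(w * x) - y) ** 2       # squared-error cost as a function of w

second_diff = np.diff(cost, 2)         # numerical second derivative
print("convex everywhere?", bool(np.all(second_diff >= -1e-12)))  # prints False
```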