Hi,
I struggle to understand how the decision boundary is different from the sigmoid function.
In the example f_{\vec{w},b}(\vec{x}) = g(w_1 x_1 + w_2 x_2 + b) where w_1 = 1, w_2 = 1, b = -3,
the boundary is x_1 + x_2 = 3, so the decision boundary is a straight line on a graph of x_2 plotted against x_1: x_2 = -x_1 + 3. But g(z) is a sigmoid function. So how the heck are we talking about a linear equation? I am utterly missing the abstraction transition here.
If I am to calculate the probability that a point is within the boundary of f_{\vec{w},b}(\vec{x}), and this probability is expressed with the sigmoid function 1/(1+e^{-(wx+b)}), then why does that formula apply here? Where does the 1/(1+e^{-z}) come from? I don't know how I would calculate the probability, but I suppose that if I did, I'd end up with this formula.
Does the sigmoid function, as the probability that a point is within a certain decision boundary, apply to any prediction f_{\vec{w},b}, no matter the vectors w and x?
The lack of calculus in the course leaves me confused.
Hi @neural_ghost! Cheer up! It can be confusing. Let me try to explain this:
First of all, the easy starting point: the decision boundary is one thing, and the sigmoid is another thing.
How are they different?
To put it in simple words: the decision boundary is the "fence" that separates one class from the other (imagine a physical fence that separates the 2 classes of samples, class A and not-class-A, one on each side of the fence), while the sigmoid function is the "tool" that tells you whether one sample is on one side of the fence or the other (this tool takes each sample and puts it on one side or the other of this imaginary fence).
On another note, the good news is: this is not calculus but just algebra, so don't despair.
@neural_ghost, if you'd like to dig deeper on this after understanding in simple terms how these 2 concepts are different, please don't hesitate to reach out. I do recommend watching the video on the decision boundary again and following Dr Ng very closely, now that you know what he's talking about.
"Linear equation" doesn't refer to its shape when you plot it.
It refers to the equation using the linear combination of the features and weights.
The sigmoid() is only an activation function, which re-scales (non-linearly) the output to a range-limited set of values between 0 and 1.
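A minimal sketch of that re-scaling, assuming plain Python with NumPy (the sample z values are chosen just for illustration):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Unbounded inputs in, range-limited outputs out:
for z in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(f"z = {z:6.1f}  ->  g(z) = {sigmoid(z):.5f}")
# g(-10) ~ 0.00005, g(0) = 0.5, g(10) ~ 0.99995
```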
Calculus is only used in computing the equation for the gradients of a cost function (via the partial derivative). As the math is too advanced for the target audience of this course, the derivations aren't included. They are easy to find online, though.
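For reference, the end results of that derivation for the logistic cost function J (stated here without proof, matching the gradient descent updates the course uses) are:

\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)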
You can draw the line x_1 + x_2 - 3 = 0. We call the line a boundary line because it separates the space into two sides, right? And the two sides are actually
anything above the line, or x_1 + x_2 - 3 > 0, and
anything below the line, or x_1 + x_2 - 3 < 0
Equivalently, we say it is a boundary line that separates the (x_1, x_2) points that satisfy z > 0 from the points that satisfy z < 0.
Since these are inequalities, they describe not two lines but two "spaces".
We know that:
when z>0, g(z) > 0.5
when z<0, g(z) < 0.5
when z=0, g(z) = 0.5
Therefore, if a point (x_1, x_2) stands on the boundary line, its g(z) evaluates to 0.5. If another point stands right next to the boundary line but above it, then its g(z) is slightly larger than 0.5. Furthermore, if a point stands far above the boundary line, then its g(z) is much closer to 1.
g(z) = \frac{1}{1+e^{-z}} provides a good functional form to convert an unbounded z to a range between 0 and 1. For example, as z becomes very, very large, g(z) approaches 1 and can never exceed it. To bound an unbounded z, we need a non-linear g(z) to do the job.
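If it helps to see the boundary and the sigmoid working together, here is a small sketch with the thread's example w_1 = w_2 = 1, b = -3 (Python with NumPy assumed; the sample points are chosen just for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([1.0, 1.0])   # w1 = 1, w2 = 1
b = -3.0                   # boundary line: x1 + x2 - 3 = 0

points = {
    "on the line  (1.5, 1.5)": np.array([1.5, 1.5]),
    "just above   (1.6, 1.5)": np.array([1.6, 1.5]),
    "far above    (4.0, 4.0)": np.array([4.0, 4.0]),
    "far below    (0.0, 0.0)": np.array([0.0, 0.0]),
}
for label, x in points.items():
    z = np.dot(w, x) + b
    print(f"{label}:  z = {z:+.2f},  g(z) = {sigmoid(z):.4f}")
# on the line: g = 0.5000; just above: g = 0.5250;
# far above: g = 0.9933; far below: g = 0.0474
```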
Hmm, alright. I thought that the equation of the sigmoid function is derived analytically by calculating the integral of z.
Ok, so my question then would be: why the sigmoid function? Yes, it fits and fulfils the need of rescaling z in a way that is useful. But how did the mathematicians come up with it for logistic regression? As in, what is the analytical explanation of the exact formula 1/(1+e^{-(wx+b)})? It is a huge deal to me to understand the basics well.
The sigmoid function outputs a value between 0 and 1 that can be interpreted as a probability. Logistic regression is a binary classifier; it cares about whether something is "true or false". For example, if your model is a cat classifier, using sigmoid as the activation function in the output layer would help to identify whether an image is "cat or not-cat".
Raymond gave a very good explanation of g(z) = 1/(1+e^{-z}). The formula 1/(1+e^{-(wx+b)}) is just setting z = wx + b.
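Written out, the substitution is simply:

f_{\vec{w},b}(\vec{x}) = g(z)\Big|_{z = \vec{w}\cdot\vec{x} + b} = \frac{1}{1+e^{-(\vec{w}\cdot\vec{x}+b)}}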
Alright, but a similar effect could be achieved with different functions; I would suppose some combination of logarithmic and root functions. And the sigmoid itself comes in many variants with similar properties that differ in their formulas. So I am wondering if the exact formula g(z) = 1/(1+e^{-z}) is the most precise one in all circumstances, analytically.
Ok, but the properties suit the purpose and the formula is very simple, so no doubt it's useful.
As an additional remark to the very good answers: the popularity of the logistic function among practitioners lies in its effectiveness and simplicity for classification purposes, since it is (see the small numeric check after this list):
differentiable (w/ non-negative derivative)
bounded
defined for all real numbers as input
serving w/ numerical benefits in NN layers, …
a nice way to interpret the chosen threshold (corresponding to a probability) in combination with some other metrics: ROC curve – Wikipedia
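A small numeric check of these properties, assuming Python with NumPy (the grid of z values only samples the real line, of course):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-20.0, 20.0, 100_001)  # a wide sample of real inputs
g = sigmoid(z)

# Bounded: every output stays strictly inside (0, 1).
assert np.all((g > 0.0) & (g < 1.0))

# Non-negative derivative: the closed form is g'(z) = g(z) * (1 - g(z)).
assert np.all(g * (1.0 - g) >= 0.0)

# Hence monotonically non-decreasing on the sampled grid:
assert np.all(np.diff(g) >= 0.0)
print("bounded and non-decreasing on the sampled real line: OK")
```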
As a side note: when fitted well, the logistic function can also serve as an "easy to compute" approximation of the integrated Gaussian probability density function, i.e. the normal CDF (which describes a normally distributed feature). You might find this older article worth a read: A Sigmoid Approximation of the Standard Normal Integral:
Most probability and statistics books, […], present the normal density function with the standard normal transformation and give a tabulation of cumulative standard normal probabilities. Reference is commonly made to the fact that the probabilities are obtained by integrating the normal density function. However, because the integration of the normal density function cannot be done by elementary methods, various approximations are used to determine cumulative standard normal probabilities.
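As a sketch of that idea (Python with NumPy/SciPy assumed): the scaling constant 1.702 below is the classic choice from the item response theory literature, not necessarily the exact fit proposed in the article above.

```python
import numpy as np
from scipy.stats import norm

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-4.0, 4.0, 801)
approx = sigmoid(1.702 * x)   # logistic approximation
exact = norm.cdf(x)           # integrated Gaussian density (standard normal CDF)

print(f"max |error| on [-4, 4]: {np.max(np.abs(approx - exact)):.5f}")
# stays below about 0.01 over the whole range
```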
I don't disagree that a similar effect could be achieved with different functions, and therefore I also wouldn't say sigmoid has to be the most precise one. Speaking of preciseness in the absence of prior knowledge about the problem is not quite wise. However, if you have a background in statistical mechanics or information theory, you may want to google the keywords "Lagrange multiplier" and "principle of maximum entropy", which will lead you to a discussion showing that softmax (which reduces to sigmoid in the two-class case, as the sketch below illustrates) is the distribution that "maximizes the system's entropy", i.e. the solution that assumes no additional prior information. This does not mean softmax (or sigmoid) is the most precise answer, but it means softmax is a good default to use if you know nothing more about the system.
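A tiny sketch of the "softmax is deducible to sigmoid" step (Python with NumPy assumed): a two-class softmax with logits (z, 0) gives exactly sigmoid(z).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    e = np.exp(logits - np.max(logits))  # shift for numerical stability
    return e / e.sum()

# e^z / (e^z + e^0) = 1 / (1 + e^-z), so the two agree:
for z in [-3.0, 0.0, 2.5]:
    p = softmax(np.array([z, 0.0]))[0]
    print(f"z = {z:+.1f}:  softmax = {p:.6f},  sigmoid = {sigmoid(z):.6f}")
```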
Some references for you; please google more for yourself: