The sigmoid is \sigma(z) = \frac{1}{1 + e^{-z}} with z = w \cdot x + b, so subtracting 0.5 is unnecessary: the function already outputs 0.5 when z = 0, and adjusting b is enough to align the curve with a feature like tumor size.
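A quick sanity check (a minimal NumPy sketch with made-up values for w and b, not anything from the course) shows both points: \sigma(0) is already 0.5, and changing b alone moves the tumor-size value at which the output crosses 0.5:

```python
import numpy as np

def sigmoid(z):
    # Standard logistic function: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))  # 0.5 exactly -- no extra shift needed

w = 2.0  # illustrative weight, not a value from the course
for b in (-1.0, 0.0, 1.0):
    # The output crosses 0.5 where w*x + b = 0, i.e. at x = -b/w,
    # so tuning b alone relocates the decision boundary along tumor size.
    x_boundary = -b / w
    print(b, x_boundary, sigmoid(w * x_boundary + b))  # last value is always 0.5
```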
Also, why not use tanh, which is centered at zero and is a symmetric alternative?
All I wanted to say is that shifting the activation function is unnecessary, because the sigmoid already outputs values that can be read directly as probabilities. Adjustments to the decision boundary should be handled by tuning the weights and bias, not by changing the inherent properties of the activation function.
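On the tanh suggestion: tanh is just a rescaled, recentered sigmoid (tanh(z) = 2\sigma(2z) - 1), so switching to it mostly amounts to a reparameterization; it moves the output range from (0, 1) to (-1, 1), which is less convenient when the output should be read as a probability. A quick numeric check, assuming nothing beyond NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-5.0, 5.0, 11)
# tanh is a shifted and scaled sigmoid: tanh(z) = 2*sigmoid(2z) - 1
print(np.allclose(np.tanh(z), 2.0 * sigmoid(2.0 * z) - 1.0))  # True
```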
If you still feel your question hasn't been answered, just remove the solution label from my reply so that others can explain it differently!
Why? What if b = 0.5, so that z = w \cdot x + 0.5? Is your statement about the curve passing through (0, 0.5) still valid? If not, what condition(s) must hold for your statement to be valid?
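For concreteness, this is the check the question is pointing at: evaluate the model at x = 0,

\sigma(w \cdot 0 + b) = \frac{1}{1 + e^{-b}}, \quad \text{which equals } 0.5 \text{ only when } b = 0.

So with b = 0.5 the curve passes through (0, \sigma(0.5)) \approx (0, 0.62) rather than (0, 0.5).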
The expression z = w \cdot x + b is a linear regression prediction model, where w and b are determined by iterating until a best-fit straight line is found for the tumor size data set.
So the prediction model must attain a value of 0.5 as the output values of the training set are either 0 or 1.
The sigmoid function maps the weighted sum of inputs, z = w \cdot x + b, to a value between 0 and 1, which is then interpreted as a probability. That is all that's needed for classification, because the sigmoid already outputs 0.5 when z = 0.
Adding or subtracting constants like 0.5 from z isn't necessary, because any such shift can be absorbed by tuning the weights and the bias. Tuning w and b is enough to align the sigmoid's output with the data and to define the decision boundary.
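To make the "absorbed by tuning" point concrete, here is a minimal sketch (NumPy only, with arbitrary illustrative parameters): subtracting 0.5 from z is indistinguishable from simply training a slightly different bias.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 1.3, -0.7                          # arbitrary illustrative parameters
x = np.linspace(-3.0, 3.0, 7)

shifted_z  = sigmoid((w * x + b) - 0.5)   # "subtract 0.5 from z"
refit_bias = sigmoid(w * x + (b - 0.5))   # the same effect via the bias alone
print(np.allclose(shifted_z, refit_bias))  # True
```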
As I said, the sigmoid function maps z to a value between 0 and 1, and this output is interpreted as a probability. For example (see the quick numeric check after this list):
- When z = 0, \sigma(z) = 0.5.
- When z is large and positive, \sigma(z) approaches 1.
- When z is large in magnitude and negative, \sigma(z) approaches 0.
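A quick numeric check of those three cases (nothing assumed beyond NumPy):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5
print(sigmoid(10.0))   # ~0.99995, approaching 1
print(sigmoid(-10.0))  # ~0.00005, approaching 0
```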
Here, z itself doesn't need to attain a value of 0.5, because the sigmoid handles this mapping naturally. The values of w and b are optimized during training so that the predictions match the labels in the dataset.
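Here is a minimal sketch of that optimization: plain gradient descent on the cross-entropy loss, with made-up tumor-size numbers purely for illustration (the course uses the same idea, but its own notation and data):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up illustrative data: tumor sizes (cm) and labels (1 = malignant)
x = np.array([1.0, 1.5, 2.0, 3.0, 3.5, 4.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w, b = 0.0, 0.0
alpha = 0.1                       # learning rate
for _ in range(5000):
    p = sigmoid(w * x + b)        # predicted probabilities for each example
    dw = np.mean((p - y) * x)     # gradient of the average cross-entropy loss w.r.t. w
    db = np.mean(p - y)           # ... and w.r.t. b
    w -= alpha * dw
    b -= alpha * db

print(w, b)                       # learned parameters
print(sigmoid(w * 2.5 + b))       # estimated probability that a 2.5 cm tumor is malignant
```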
As you might know, logistic regression models the probability of a binary outcome using the sigmoid function. The raw prediction is computed as:
z = w \cdot x + b
where x is the input feature (for example: tumor size), w is the weight, and b is the bias. However, instead of using z directly, logistic regression applies the sigmoid function:
\sigma(z) = \frac{1}{1 + e^{-z}}
This function maps any real number into the range (0, 1), which makes it better suited for probability estimation.
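Putting the two steps together, a prediction looks roughly like this (a sketch only; the parameter values and the input are invented, and the 0.5 threshold is just the usual default for turning the probability into a class label):

```python
import numpy as np

def predict(x, w, b, threshold=0.5):
    z = w * x + b                      # raw prediction (the logit)
    p = 1.0 / (1.0 + np.exp(-z))       # sigmoid: probability of the positive class
    return z, p, int(p >= threshold)   # logit, probability, predicted label

# Illustrative parameters and input, not values from the course
z, p, label = predict(x=3.0, w=1.2, b=-3.0)
print(z, p, label)   # z = 0.6, p ~ 0.65, label = 1
```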
It means the linear combination of input features in a model. Mathematically, this is expressed as:
z = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b
or:
z = \sum_{i=1}^{n} w_i x_i + b
This is a linear function that transforms the input data into a single scalar value z, which is then passed through a non-linear function (in this case, the sigmoid) to produce a probability estimate.
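In code, that linear combination is just a dot product (a sketch with arbitrary illustrative numbers):

```python
import numpy as np

# Illustrative feature vector (e.g. several measurements) and parameters
x = np.array([2.0, 0.5, 1.0])    # x_1 ... x_n
w = np.array([0.4, -1.0, 0.3])   # w_1 ... w_n
b = 0.1

z = np.dot(w, x) + b             # z = sum_i w_i * x_i + b, a single scalar
p = 1.0 / (1.0 + np.exp(-z))     # the non-linearity turns that scalar into a probability
print(z, p)
```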
Raw Prediction is the value z, which is computed before applying the sigmoid function. In the course, Andrew Ng describes this as the linear combination of inputs plus the bias term. It is not a probability but rather an intermediate step in computing the final classification output.
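The distinction is easy to see if you print both quantities side by side (again with invented parameters): the raw prediction z can be any real number, while the sigmoid output always lands strictly inside (0, 1):

```python
import numpy as np

w, b = 1.5, -4.0                        # invented parameters for illustration
for x in (0.0, 2.0, 4.0, 6.0):
    z = w * x + b                       # raw prediction: can be any real number
    p = 1.0 / (1.0 + np.exp(-z))        # probability: always strictly between 0 and 1
    print(f"x={x}  z={z:+.1f}  p={p:.3f}")
```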