Why is the continuous sigmoid value passed to the next layer, and not the step value?

I see that a value in [0, 1] is passed to the next layer, but it is treated as a binary on-off. That would only be a valid description if we passed a value in {0, 1}:

```
z = x @ w + b
a = g(z)
a = step(a, thres=0.5)
```

$$\text{step}(a) = \begin{cases} 1, & a \ge 0.5 \\ 0, & a < 0.5 \end{cases}$$

The threshold comparison with >= 0.5 is only used when you're producing a prediction from the last layer.

For the hidden layer outputs, no threshold is used.
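
For illustration, here is a minimal NumPy sketch of that idea (the layer sizes, the random weights, and sigmoid as `g` are assumptions for the example, not something fixed by the course): the hidden activation `a1` is passed on unchanged, and the 0.5 threshold appears only when turning the output activation into a class label.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up shapes: 3 input features, 4 hidden units, 1 output unit.
rng = np.random.default_rng(0)
x = rng.normal(size=(1, 3))              # one training example
w1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
w2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

a1 = sigmoid(x @ w1 + b1)   # hidden layer: continuous values in (0, 1), no threshold
a2 = sigmoid(a1 @ w2 + b2)  # output layer: still a continuous value

y_hat = (a2 >= 0.5).astype(int)  # threshold applied only here, for the final prediction
print(a1, a2, y_hat)
```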


Hi @tbhaxor, I just want to add to the previous answer that for the hidden layers we pass the raw activation along without any threshold, so it can be 0.3, 0.4, 0.12, and so on. Only on the last layer do we apply a threshold to make the final prediction, and how you do that depends on your problem and the output you need.

In addition to the clear explanations by @TMosh and @pastorsoto focusing on the output layer, I think that in any layer, if a unit should learn to produce a high value of a, then a = 0.999 could imply a better-performing model than a = 0.501. Rounding both to 1 with the step function removes that difference, which is a piece of useful information.
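
A quick sketch of what I mean (the two activation values are made up for illustration):

```python
import numpy as np

a = np.array([0.501, 0.999])   # two hypothetical hidden activations
step = (a >= 0.5).astype(int)  # both map to 1
print(a, "->", step)           # [0.501 0.999] -> [1 1]: the confidence gap is erased
```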

Raymond


Yes, it makes sense now: we need the extent of activation in the hidden layer. It is as if the model asks each unit to say, on a number line, how much it thinks its feature contributes to the actual labels.