Real world scenario using sigmoid as an activation function

Hello Kosmetsas,

If you want your output to be between 0 and 1, that alone is a sufficient reason to use sigmoid at the last layer (this echoes the statement "The sigmoid is best for on/off or binary situation"). If you want to transform your input to be between 0 and 1, it is enough to add a sigmoid right after your input layer; however, it's better to do it as a feature-engineering step, because then you only need to apply the sigmoid transformation once.
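To make the two options concrete, here is a minimal sketch in Keras. The data, layer sizes, and training settings are all made up for illustration; the point is only where the sigmoid sits: on the last layer (to squash the output) versus applied once to the inputs as preprocessing.

```python
import numpy as np
import tensorflow as tf

# Assumption: a toy setup with 8 input features and targets already in [0, 1].
X = np.random.rand(100, 8).astype("float32")
y = np.random.rand(100, 1).astype("float32")

# Option 1: sigmoid only at the last layer, so the *output* lands in (0, 1).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # squashes the output to (0, 1)
])

# Option 2: squash the *inputs* to (0, 1) as a feature-engineering step,
# done once before training instead of inside the network.
X_squashed = 1.0 / (1.0 + np.exp(-X))  # element-wise sigmoid

model.compile(optimizer="adam", loss="mse")
model.fit(X_squashed, y, epochs=2, verbose=0)
```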

If your features are already binary, that by itself, in my opinion, isn't a sufficient reason to use sigmoid in any of the layers. Otherwise, we could just min-max normalize every continuous feature into the range 0 to 1, stick with sigmoid forever, and perhaps never have needed to invent ReLU or other activations.
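For reference, min-max normalization is just a per-feature rescaling into [0, 1]; a quick NumPy sketch with made-up numbers:

```python
import numpy as np

# Assumption: X is a toy matrix of continuous features, one column per feature.
X = np.array([[10.0, 200.0],
              [20.0, 400.0],
              [30.0, 800.0]])

# Min-max normalization: rescale each feature (column) into [0, 1].
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)

print(X_scaled)
# [[0.         0.        ]
#  [0.5        0.33333333]
#  [1.         1.        ]]
```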

Bringing in non-linearity is an important reason to use ReLU, sigmoid, or any activation function other than the linear one.
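A small NumPy sketch of why this matters: without a non-linear activation, stacking layers buys you nothing, because two linear layers compose into a single linear layer. The weight shapes here are arbitrary, just for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "linear" layers with no activation in between: W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))

x = rng.normal(size=(3, 1))
two_layer_out = W2 @ (W1 @ x + b1) + b2

# ...is exactly one linear layer with W = W2 @ W1 and b = W2 @ b1 + b2,
# so depth alone adds no expressive power until a non-linearity is inserted.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer_out = W @ x + b

print(np.allclose(two_layer_out, one_layer_out))  # True
```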

Bringing in ReLU has its own significance, and I compared ReLU and sigmoid in this thread, including a reference to Professor Ng's DLS video on activation functions. Let me know if any of the points there need more clarification.

Raymond