As an additional remark to the very good answers: the popularity of the logistic function among practitioners lies in its effectiveness and simplicity for classification purposes, since it is:
- differentiable (with a strictly positive derivative, so it is monotonically increasing)
- bounded
- defined for all real numbers as input
- numerically well-behaved in NN layers, …
- a nice way to interpret the decision threshold (which corresponds to a probability) in combination with some other metrics:
ROC curve – Wikipedia
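The properties listed above can be checked with a few lines of Python. This is only a minimal sketch: the function names (`sigmoid`, `sigmoid_derivative`, `classify`) and the default threshold of 0.5 are my own choices for illustration, not something taken from the other answers.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function 1 / (1 + exp(-x)), defined for every real x."""
    # Branch on the sign so that exp() never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_derivative(x: float) -> float:
    """Derivative s(x) * (1 - s(x)); positive in exact arithmetic."""
    s = sigmoid(x)
    return s * (1.0 - s)

def classify(score: float, threshold: float = 0.5) -> int:
    """Read sigmoid(score) as a probability and compare it to a threshold.

    The threshold value is an assumption; in practice it would be tuned,
    for example by looking at the ROC curve.
    """
    return 1 if sigmoid(score) >= threshold else 0
```

For example, `classify(2.0)` gives class 1 with the default threshold, while raising the threshold makes the classifier more conservative about predicting class 1.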
As a side note: when fitted well, the logistic function can also serve as an "easy to compute" approximation of the Gaussian cumulative distribution function, i.e. the integrated Gaussian probability density function (which describes a normally distributed feature). You might find this older article worth a read: A Sigmoid Approximation of the Standard Normal Integral:
Most probability and statistics books, […], present the normal density function with the standard normal transformation and give a tabulation of cumulative standard normal probabilities. Reference is commonly made to the fact that the probabilities are obtained by integrating the normal density function. However, because the integration of the normal density function cannot be done by elementary methods, various approximations are used to determine cumulative standard normal probabilities.
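To get a feeling for how close a sigmoid can come to the standard normal integral, here is a small sketch. Note that it does not reproduce the formula from the article; it uses the simpler, commonly quoted rescaling of the logistic function with a factor of about 1.702, and the function names are my own. The reference values come from `math.erf` in the Python standard library.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def logistic_cdf_approx(x: float, scale: float = 1.702) -> float:
    """Logistic approximation of the standard normal CDF.

    scale = 1.702 is the commonly quoted fitting constant (an assumption
    here, not the article's polynomial-in-the-exponent formula).
    """
    return 1.0 / (1.0 + math.exp(-scale * x))

def max_abs_error(lo: float = -5.0, hi: float = 5.0, steps: int = 1000) -> float:
    """Largest absolute gap between the two curves on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(normal_cdf(x) - logistic_cdf_approx(x)) for x in xs)

if __name__ == "__main__":
    print(f"max |error| on [-5, 5]: {max_abs_error():.4f}")
```

On this grid the largest gap stays below 0.01, which is often good enough when a cheap closed-form expression matters more than the last digits.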