A Dense layer uses a linear (identity) activation when no activation is specified, i.e. its output is wx+b. To get the output of a dense unit as a probability, the activation should be sigmoid.
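As a rough illustration of the difference, here is a minimal pure-Python sketch of a single dense unit with and without a sigmoid. The helper names `dense_unit` and `sigmoid`, and the weights, bias, and input values, are made up for this example and are not taken from any library.

```python
import math

def dense_unit(x, w, b):
    """Linear output of a single dense unit: w.x + b (no activation)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights, bias, and input, chosen only for illustration.
w = [0.5, -1.2, 2.0]
b = 0.1
x = [1.0, 2.0, 0.5]

linear_out = dense_unit(x, w, b)  # unbounded real number, can be negative
prob = sigmoid(linear_out)        # squashed into (0, 1)
print(linear_out, prob)
```

The linear output can be any real number, so it cannot be read as a probability directly; the sigmoid is what brings it into the (0, 1) range.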
Here’s the derivation:
Start with the logit definition.
Let p = the probability that the output is 1,
and L = logit(p), i.e. the log-odds of p. This is the raw wx+b value that the dense unit predicts.
L=ln(\frac{p}{1-p})
\implies e^L = \frac{p}{1-p} , after exponentiating both sides (raising e to the power of each side)
\implies (1 - p) * e^L = p
\implies e^L - p * e^L = p
\implies e^L = (e^L + 1) * p
\implies p = \frac{e^L}{e^L + 1}
\implies p = \frac{1}{1+\frac{1}{e^L}} , after dividing both numerator and denominator by e^L
\implies p = \frac{1}{1+e^{-L}} , since \frac{1}{e^L} = e^{-L}
This is exactly the sigmoid function applied to L.
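The algebra can also be checked numerically. The sketch below, using only the standard library, converts a few probabilities to logits and back. The function names `logit`, `sigmoid`, and `sigmoid_alt` are illustrative choices for this example, and the sample probabilities are arbitrary.

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))

def sigmoid(L):
    """Final form of the derivation: 1 / (1 + e^-L)."""
    return 1.0 / (1.0 + math.exp(-L))

def sigmoid_alt(L):
    """Intermediate form of the derivation: e^L / (e^L + 1)."""
    return math.exp(L) / (math.exp(L) + 1.0)

# Arbitrary sample probabilities, chosen only for illustration.
probs = [0.01, 0.25, 0.5, 0.75, 0.99]
recovered = [sigmoid(logit(p)) for p in probs]
recovered_alt = [sigmoid_alt(logit(p)) for p in probs]

for p, r, r_alt in zip(probs, recovered, recovered_alt):
    # Both forms should return the original probability up to rounding error.
    print(p, r, r_alt)
```

Both forms give back the starting probability up to floating-point rounding, which matches the claim that sigmoid is the inverse of logit.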