When no activation is specified, a Dense layer uses a linear (identity) activation, i.e. its output is just wx+b. To interpret the output of a dense unit as a probability, the activation should be sigmoid.
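As a rough illustration of the difference, here is a minimal sketch in plain Python (no Keras) of a single dense unit. The helper names `dense_linear` and `sigmoid`, and the example weights and inputs, are made up for illustration and are not part of any library.

```python
import math

def dense_linear(x, w, b):
    """Single dense unit with no activation: returns w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical inputs, weights and bias, chosen only for this example.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.2]
b = 0.1

raw = dense_linear(x, w, b)   # unbounded real number (can be negative)
prob = sigmoid(raw)           # value strictly between 0 and 1

print("linear output:", raw)
print("sigmoid output:", prob)
```

The linear output can be any real number, so it cannot be read as a probability directly; passing it through the sigmoid gives a value between 0 and 1.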

Here’s the derivation:

Start with the logit definition.

Let L = logit(p), i.e. the raw linear output wx+b of the unit, interpreted as the log-odds,

and p = the probability that the output is 1.

L = \ln\left(\frac{p}{1-p}\right)

\implies e^L = \frac{p}{1-p} , after exponentiating both sides

\implies (1 - p) * e^L = p

\implies e^L - p * e^L = p

\implies e^L = (e^L + 1) * p

\implies p = \frac{e^L}{e^L + 1}

\implies p = \frac{1}{1+\frac{1}{e^L}} = \frac{1}{1+e^{-L}} , after dividing both numerator and denominator by e^L

This is exactly the sigmoid function, \sigma(L) = \frac{1}{1+e^{-L}}.
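The derivation can also be checked numerically. The sketch below uses only Python's standard `math` module; the function names `logit`, `sigmoid` and `sigmoid_alt` are chosen here for illustration. It confirms that applying the sigmoid to logit(p) recovers p, and that the two forms of the result agree.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def sigmoid(L):
    """Final form of the derivation: 1 / (1 + e^(-L))."""
    return 1.0 / (1.0 + math.exp(-L))

def sigmoid_alt(L):
    """Intermediate form of the derivation: e^L / (e^L + 1)."""
    return math.exp(L) / (math.exp(L) + 1.0)

# Arbitrary probabilities picked for the check.
for p in (0.1, 0.5, 0.9):
    L = logit(p)
    # sigmoid undoes logit, up to floating-point error
    assert math.isclose(sigmoid(L), p)
    # both algebraic forms give the same value
    assert math.isclose(sigmoid(L), sigmoid_alt(L))
    print(p, L, sigmoid(L))
```

Both assertions passing for each p is consistent with the sigmoid being the inverse of the logit.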