Well, it may not be intuitive, but you just have to work out the math, remembering that we’re dealing with the output layer here and the activation function is `sigmoid`.
Prof Ng shows in the lectures, and it is also given in the notebook, that:

$$dA^{[L]} = - \left( \frac{Y}{A^{[L]}} - \frac{1 - Y}{1 - A^{[L]}} \right)$$

Now substitute that into your second formula, remembering that because the output activation is the aforementioned `sigmoid`, we have:

$$g^{[L]'}(Z^{[L]}) = A^{[L]} (1 - A^{[L]})$$
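Assuming your second formula is the usual chain-rule step $dZ^{[L]} = dA^{[L]} * g^{[L]'}(Z^{[L]})$, with `*` as the element-wise product, the substitution collapses like this:

$$
\begin{aligned}
dZ^{[L]} &= -\left( \frac{Y}{A^{[L]}} - \frac{1 - Y}{1 - A^{[L]}} \right) A^{[L]} (1 - A^{[L]}) \\
&= -\left( Y (1 - A^{[L]}) - (1 - Y) A^{[L]} \right) \\
&= -\left( Y - A^{[L]} \right) \\
&= A^{[L]} - Y
\end{aligned}
$$

The $Y A^{[L]}$ terms cancel in the third line, which is why the output layer ends up with such a simple expression.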

So you can start from the fully general formula that we use in the hidden layers (as Phuc has shown), or you can use the special simplification that you get from the specifics of the output layer (a `sigmoid` activation combined with the cross-entropy loss).
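If you want to convince yourself numerically that both routes agree, here is a small sketch in plain Python. It works on scalar values rather than the notebook's vectorized arrays, and the helper names `dZ_general` and `dZ_simplified` are made up for illustration, not taken from the assignment:

```python
import math

def sigmoid(z):
    """Logistic sigmoid, the output-layer activation g^[L]."""
    return 1.0 / (1.0 + math.exp(-z))

def dZ_general(z, y):
    """General route (hypothetical helper): dZ = dA * g'(Z)."""
    a = sigmoid(z)
    dA = -(y / a - (1 - y) / (1 - a))  # dA^[L] from the cross-entropy loss
    g_prime = a * (1 - a)              # sigmoid'(Z) = A(1 - A)
    return dA * g_prime

def dZ_simplified(z, y):
    """Output-layer shortcut (hypothetical helper): dZ = A - Y."""
    return sigmoid(z) - y

# Compare both routes on a few (Z, Y) pairs, where Y is a 0/1 label.
for z, y in [(-2.0, 0), (-0.5, 1), (0.0, 1), (1.5, 0), (3.0, 1)]:
    g, s = dZ_general(z, y), dZ_simplified(z, y)
    print(f"z={z:5.1f} y={y}  general={g:.6f}  simplified={s:.6f}")
    assert math.isclose(g, s, rel_tol=1e-9, abs_tol=1e-12)
```

The two columns match to floating-point precision, so in practice you can compute `A - Y` directly for the output layer and save yourself the division by `A` and `1 - A`.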