Your implementation of the derivative is correct, but the s value that you feed into it is wrong. Why reimplement sigmoid manually here? You already built the function to compute that earlier, so why not save yourself some work by just calling it? But if you insist on building it again, at least compare your current implementation to the previous one. I hope the previous one is different.

Your implementation of s is:

s = \displaystyle \frac {1}{e^{-x}}

But note that this is equivalent to:

s = e^x

right? That’s a little different from sigmoid, which has 1 + e^{-x} in the denominator, no?
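If it helps to see the difference numerically, here is a minimal sketch. The function names `sigmoid` and `buggy_s` are my own placeholders, not the names from your notebook, and I am assuming NumPy is available.

```python
import numpy as np

def sigmoid(x):
    # Standard sigmoid: 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def buggy_s(x):
    # The expression from the post: 1 / e^{-x}, which simplifies to e^x
    return 1.0 / np.exp(-x)

x = np.array([-2.0, 0.0, 2.0])

print("buggy s:", buggy_s(x))
print("e^x:    ", np.exp(x))
print("sigmoid:", sigmoid(x))
```

The first two printed rows should match each other, and neither should match the third.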

One quick way to debug these is to evaluate the function at a few inputs where you already know the answer. We know that the value of sigmoid at infinity (or at a very large input) is 1 and that its derivative there is 0.

By evaluating your function at a large input, you can see that neither of these holds.
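That check can be sketched as follows. The names `sigmoid` and `sigmoid_derivative` are assumptions for illustration, and I am assuming the derivative is computed as s(1 - s), which is the standard form.

```python
import numpy as np

def sigmoid(x):
    # Standard sigmoid: 1 / (1 + e^{-x})
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # Derivative of sigmoid expressed through its own value: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

big = 30.0  # stands in for "a very high value"

# Correct s: value close to 1, derivative close to 0
print(sigmoid(big), sigmoid_derivative(big))

# s = 1 / e^{-x} instead: the value blows up, and so does s * (1 - s)
wrong_s = 1.0 / np.exp(-big)
print(wrong_s, wrong_s * (1.0 - wrong_s))
```

With the correct s the two numbers land near 1 and 0. With the other expression they are nowhere close, which points at s rather than at the derivative formula.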

Sure, it is our goal to compute \frac {\partial J}{\partial w} and \frac {\partial J}{\partial b}, so the question is how you do that. It requires the Chain Rule, and if you work that out (as Eddy shows on the link that Saif gave us), the derivative of sigmoid appears as one of the factors.
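As a rough sketch of where that factor shows up, assume the usual logistic regression setup for a single example, with z = w^T x + b and a = \sigma(z). These symbols and the loss L are my notation, not necessarily what the linked derivation uses. The Chain Rule then gives:

\frac {\partial L}{\partial w} = \frac {\partial L}{\partial a} \cdot \frac {\partial a}{\partial z} \cdot \frac {\partial z}{\partial w}

where the middle factor is the derivative of sigmoid:

\frac {\partial a}{\partial z} = \sigma'(z) = a (1 - a)

The expression for \frac {\partial L}{\partial b} has the same first two factors, with \frac {\partial z}{\partial b} = 1 as the last one.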