I believe everything I implemented is fine. The only problem is the cost function: it keeps outputting inf, and I can't tell why. I think this happens when I end up with log(y) == log(0) or log(1-y) == log(0), since log(0) is -inf (numpy warns about a divide by zero), which makes the cost inf. Any idea how to solve this problem?
If you are talking about the test cases for the propagate function, there should be no case in which you get log(0), given the test inputs they give you. But perhaps this indicates a problem with the A values your code is generating. They are the output of sigmoid, so they should all be strictly between 0 and 1, right? It's worth checking that this is true …
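A quick way to run that check is sketched below. It assumes the usual sigmoid(z) = 1 / (1 + np.exp(-z)); the Z values are made-up stand-ins, not the assignment's actual test inputs.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, computed as 1 / (1 + exp(-z))."""
    return 1 / (1 + np.exp(-z))

# Hypothetical pre-activation values standing in for w.T @ X + b
Z = np.array([[-3.0, -0.5, 0.0, 0.5, 3.0]])
A = sigmoid(Z)

# Every activation should lie strictly inside (0, 1);
# otherwise log(A) or log(1 - A) in the cost can blow up.
strictly_inside = bool(np.all((A > 0) & (A < 1)))
print(strictly_inside)  # True
```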
Let me double-check something. I have y, a 1D np array that contains zeros and ones. There are two ways to calculate the cost function: I can split the examples into a subset with class 0 and another with class 1, calculate the cost for each subset, and then add the two values together. Or I can use the combined formula to do it in one step. So if I am using the combined formula, I will end up getting log(0), either from log(y) or from log(1-y). Am I understanding this correctly, or is there something I am missing?
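The two routes should give the same number, as long as the log is applied to the sigmoid outputs A rather than to the labels Y; Y then only selects which term counts for each example, so no log(0) appears. A minimal sketch with made-up A and Y values:

```python
import numpy as np

# Hypothetical labels Y (exactly 0 or 1) and sigmoid outputs A (strictly in (0, 1))
Y = np.array([[1, 0, 1, 1, 0]])
A = np.array([[0.9, 0.2, 0.7, 0.6, 0.1]])
m = Y.shape[1]

# One step: the combined cross-entropy formula.
# When Y == 0 the first term is 0 * log(A), which is just 0 because log(A) is finite.
cost_combined = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Two subsets: class-1 examples use log(A), class-0 examples use log(1 - A).
pos = Y == 1
cost_split = -(np.sum(np.log(A[pos])) + np.sum(np.log(1 - A[~pos]))) / m

print(np.isclose(cost_combined, cost_split))  # True
```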
Edit: I’ve just checked the sigmoid function and it is working as expected.
Figured it out
The problem was in the cost formula itself: I did log(Y) * A instead of log(A) * Y. A silly mistake, actually. But thanks for your efforts <3
Yes, that would do it! The point is that the A values are never exactly 0 or 1 (at least in mathematical terms), but of course the Y values are exactly 0 and 1 by definition.
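Here is a small sketch of how the swapped version produces inf while the fixed one stays finite. The A and Y values are made up, and it assumes the (1 - Y) term was swapped in the same way as the first term.

```python
import numpy as np

Y = np.array([[1, 0, 1, 0]])
A = np.array([[0.8, 0.3, 0.6, 0.1]])
m = Y.shape[1]

# Fixed: logs of A, weighted by Y
cost_fixed = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Buggy (roles swapped): logs of Y, weighted by A.
# np.log(0) is -inf (numpy emits a divide-by-zero warning, silenced here),
# so every example contributes -inf and the cost becomes +inf.
with np.errstate(divide="ignore"):
    cost_buggy = -np.sum(A * np.log(Y) + (1 - A) * np.log(1 - Y)) / m

print(np.isfinite(cost_fixed), cost_buggy)  # True inf
```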
You can run into the Inf or NaN problem if the sigmoid values "saturate" to exactly 1 or 0. In 64-bit floating point, sigmoid(z) rounds to exactly 1 once z exceeds roughly 36.7. But that won't happen with the test cases here.
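A sketch of the saturation effect, assuming sigmoid is computed as 1 / (1 + np.exp(-z)) in float64. The cutoff comes from 1 + exp(-z) rounding to exactly 1.0 once exp(-z) falls below 2**-53, which happens for z above about 53 * ln(2) ≈ 36.7.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Check where sigmoid(z) becomes exactly 1.0 in float64
for z in (35.0, 36.0, 37.0):
    print(z, sigmoid(z) == 1.0)
# 35.0 False
# 36.0 False
# 37.0 True

# Once A saturates to exactly 1, log(1 - A) is -inf,
# so an example with y = 0 contributes an infinite cost.
A = sigmoid(np.array([40.0]))
with np.errstate(divide="ignore"):
    print(-np.log(1 - A))  # [inf]
```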