I’m losing 5 points because one of the test cases fails with an error in the 17th decimal place. I tried changing the sigmoid function definition, and the matrix multiplication in the forward propagation, but nothing seems to correct this one accuracy mistake.

I have received the same error; the last decimal is rounded differently.

We are doing floating point here, so there can be several ways to express a given series of operations that are mathematically equivalent but have different rounding behavior. Here’s one contrived example:

```
import numpy as np

np.random.seed(42)
m = 10000
a = np.random.randn(1, m)
z1 = np.sum(np.exp(a) / m)   # divide each element by m, then sum
z2 = 1 / m * np.sum(np.exp(a))   # sum first, then divide once
z3 = np.mean(np.exp(a))
print("z1 = {:0.17f}\nz2 = {:0.17f}\nz3 = {:0.17f}".format(z1, z2, z3))
assert(z1 == z2)
```

Running that produces:

```
z1 = 1.65292081707434368
z2 = 1.65292081707434346
z3 = 1.65292081707434346
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-11-3a94d76d7d93> in <module>
      6 z3 = np.mean(np.exp(a))
      7 print("z1 = {:0.17f}\nz2 = {:0.17f}\nz3 = {:0.17f}".format(z1, z2, z3))
----> 8 assert(z1 == z2)

AssertionError:
```

So the answers differ in the 16th decimal place, but notice that I needed to make m pretty large in order to see that effect. In 64-bit floating point the resolution is approximately 10^{-16} or 10^{-17}, so rounding errors start out at that scale, but they can accumulate if you are doing a serial computation that is not “stable”. A “stable” computation is one in which the rounding errors tend to cancel each other out rather than accumulate, and not all serial computations have that property. There is a field of mathematics called Numerical Analysis that studies this type of phenomenon, among others, and there are precise ways to analyze and characterize this kind of behavior.
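Here’s a sketch of that “stable vs. unstable” distinction, assuming NumPy is available. The `kahan_sum` function below is just an illustrative helper I wrote for this post (compensated/Kahan summation), not part of any library:

```
import math
import numpy as np

def kahan_sum(values):
    # Compensated (Kahan) summation: carries a running correction term
    # so rounding errors largely cancel instead of accumulating.
    total = 0.0
    c = 0.0  # compensation for lost low-order bits
    for v in values:
        y = v - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

# Plain left-to-right accumulation: every addition rounds,
# and those per-step errors can pile up over a long series.
naive = 0.0
for v in x:
    naive += v

print("machine epsilon:", np.finfo(np.float64).eps)  # about 2.22e-16
print("naive:", format(naive, ".17f"))
print("kahan:", format(kahan_sum(x), ".17f"))
print("fsum :", format(math.fsum(x), ".17f"))  # effectively exact reference
```

`math.fsum` tracks every intermediate rounding error and is effectively exact for double-precision inputs, so it makes a good reference: the compensated sum typically agrees with it to the last bit or two, while the naive loop drifts further away as the series gets longer.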

I am not a student or mentor for M4ML, so I’m not familiar with this assignment. None of the M4ML mentors have responded yet, but all I can suggest is to examine the algorithm and consider whether you can think of different equivalent ways to perform the required operations.

Actually, here’s an older thread with perhaps a simpler example of rounding differences and a bit more explanation.