I implemented the conv_forward function in the Week 1 programming assignment in two ways: looping over examples and filters (as instructed), and a vectorized version that only loops over rows and columns.
My loop version got 100%. The vectorized version does not. I investigated and discovered that the values in the Z array are sometimes identical, but often differ by a tiny amount, up to about 2 × 10^(-14).
My question is: why would a vectorized implementation and a loop implementation produce very slightly different numbers? In both cases I do an elementwise multiplication of W and a slice of A_prev_padded, sum it, and add b. The only difference is that the vectorized version broadcasts the multiplication over all examples and filters, while in the loop version I loop over all of them.
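To make the comparison concrete, here is a minimal sketch of the two computations for a single output position. This is not the assignment's actual code; the names and shapes (a_slice, W, b) are assumptions based on the assignment's conventions.

```python
import numpy as np

# Assumed shapes, following the assignment's conventions:
# a_slice: (m, f, f, n_C_prev) -- one window of A_prev_padded, all examples
# W:       (f, f, n_C_prev, n_C)
# b:       (1, 1, 1, n_C)
m, f, n_C_prev, n_C = 4, 3, 2, 5
rng = np.random.default_rng(0)
a_slice = rng.standard_normal((m, f, f, n_C_prev))
W = rng.standard_normal((f, f, n_C_prev, n_C))
b = rng.standard_normal((1, 1, 1, n_C))

# Loop version: one (example, filter) pair at a time, as in conv_single_step.
Z_loop = np.empty((m, n_C))
for i in range(m):
    for c in range(n_C):
        Z_loop[i, c] = np.sum(a_slice[i] * W[:, :, :, c]) + float(b[0, 0, 0, c])

# Vectorized version: broadcast over all examples and filters at once.
Z_vec = np.sum(a_slice[..., np.newaxis] * W[np.newaxis, ...],
               axis=(1, 2, 3)) + b.reshape(1, n_C)

print(np.max(np.abs(Z_loop - Z_vec)))  # often 0, sometimes ~1e-14
```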
Hello @akePete, I can’t tell you the exact reason just from your problem description, but I can suggest some checks:
First, identify all pairs of intermediate points at which the loop version and the vectorized version should produce the same result, and see which pair first shows a slight difference. To do so, you might focus just on the steps that produce Z[0, 0, 0, 0] (if you know that this value differs between the two versions).
Second, examine all the operations involved. The loop version used +, *, np.sum, and float() (in conv_single_step). I suppose you didn't use float()? What other operations are used only by the vectorized version? This comparison may give you some hints.
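For the first check, something like this sketch can locate the earliest disagreement. Z_loop and Z_vec are stand-ins for the two versions' outputs; the data here is fabricated just to make the snippet runnable.

```python
import numpy as np

# Stand-ins for the two implementations' outputs; in practice these
# would come from your two conv_forward versions.
rng = np.random.default_rng(1)
Z_loop = rng.standard_normal((2, 4, 4, 8))
Z_vec = Z_loop.copy()
Z_vec[0, 1, 2, 3] += 2e-14  # simulate a tiny rounding discrepancy

diff = np.abs(Z_loop - Z_vec)
print("max abs difference:", diff.max())

# First entry (in row-major order) where the two versions disagree.
# Note: if the arrays are identical, argmax returns index 0.
idx = np.unravel_index(np.argmax(diff > 0), diff.shape)
print("first differing entry:", idx, Z_loop[idx], Z_vec[idx])
```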
The discrepancies are in the last digits of double-precision floating point, so they are very small.
My vectorization took advantage of broadcasting along two axes: a) the Weight array broadcasts along the examples dimension of the Activation array, and b) the Activation array broadcasts along the number-of-filters dimension of the Weight array.
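In NumPy terms, that double broadcast looks roughly like this (shapes are assumptions following the assignment's conventions):

```python
import numpy as np

A = np.zeros((10, 3, 3, 4))  # activation window: (m, f, f, n_C_prev)
W = np.zeros((3, 3, 4, 8))   # weights:           (f, f, n_C_prev, n_C)

# a) W gains a leading axis and broadcasts over the m examples;
# b) A gains a trailing axis and broadcasts over the n_C filters.
prod = A[..., np.newaxis] * W[np.newaxis, ...]
print(prod.shape)  # (10, 3, 3, 4, 8) -> then sum over axes (1, 2, 3)
```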
When I simplified my inputs so that there was one example and one filter, all calculations matched precisely.
When I added multiple examples to my Activations array, all calculations matched precisely.
When I added multiple filters to the Weight array, that introduced the tiny discrepancies mentioned above. So it seems that something is behaving oddly when the Activation array broadcasts along the filters dimension of the Weight array. I solved the problem by reducing the amount of vectorization: I still calculate all examples at the same time, but I loop through the filters.
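For reference, the reduced-vectorization version looks roughly like this sketch (assumed names and shapes, not the graded code):

```python
import numpy as np

def conv_step_hybrid(a_slice, W, b):
    """One output position: vectorized over examples, looped over filters.

    Assumed shapes: a_slice (m, f, f, n_C_prev), W (f, f, n_C_prev, n_C),
    b (1, 1, 1, n_C).
    """
    m, n_C = a_slice.shape[0], W.shape[-1]
    Z = np.empty((m, n_C))
    for c in range(n_C):
        # Broadcasting now happens only along the examples axis.
        Z[:, c] = (np.sum(a_slice * W[:, :, :, c], axis=(1, 2, 3))
                   + float(b[0, 0, 0, c]))
    return Z
```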
Do you have any idea why this sort of discrepancy arose in one direction of broadcasting, but not in the other?
This is quite common. When dealing with limited-precision real numbers, any change in the order of operations or in the number of rounding and truncation steps can create differences on the order of the machine epsilon (eps), the smallest relative spacing between representable values.
Everything we do in floating point is an approximation, and any change in the order of the operations can result in slightly different values. Here's a thread that gives another example in which mathematically equivalent algorithms can yield different floating-point results.
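A quick way to see this for yourself (a trivial demo, not from that thread):

```python
import numpy as np

# Mathematically, 0.1 + 0.1 + 0.1 == 0.3, but rounding says otherwise.
print(0.1 + 0.1 + 0.1 - 0.3)        # ~5.55e-17, not 0

# Summing the same numbers in a different order can also change the result.
a = np.random.default_rng(2).standard_normal(10_000)
print(a.sum() - np.sort(a).sum())   # often a tiny nonzero value

# The scale of these effects: double-precision machine epsilon.
print(np.finfo(np.float64).eps)     # 2.220446049250313e-16
```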