I don’t know what is wrong with the code. I checked it several times, yet it gives me the same error.
Here are the results I get for that test cell:
With training=False
[[[ 0. 0. 0. 0. ]
[ 0. 0. 0. 0. ]]
[[192.71234 192.71234 192.71234 96.85617]
[ 96.85617 96.85617 96.85617 48.92808]]
[[578.1371 578.1371 578.1371 290.5685 ]
[290.5685 290.5685 290.5685 146.78426]]]
96.85617
With training=True
[[[0. 0. 0. 0. ]
[0. 0. 0. 0. ]]
[[0.40739 0.40739 0.40739 0.40739]
[0.40739 0.40739 0.40739 0.40739]]
[[4.99991 4.99991 4.99991 3.25948]
[3.25948 3.25948 3.25948 2.40739]]]
So you can see that the training=False values agree with yours, but the training=True values do not. That is a pretty good clue about where to look.
The only place where the training argument has any effect is in the BatchNorm calls. The template code gives you one example:
X = BatchNormalization(axis = 3)(X, training = training) # Default axis
Your calls to BatchNorm should look the same as that example. Do they?
The test passed. But I don't see why training (False or True) has to be passed to every batch normalization call. Isn't one enough?
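Apparently not. Each instance of batch normalization is independent. The coefficients (the learned scale and offset, plus the moving mean and variance) are specific to that instance, and training is an argument to each individual call. In this function every layer is called directly, so the value you pass to one call is not shared with the others.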
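To make that concrete, here is a minimal pure-Python sketch of the idea. It is not the Keras layer: ToyBatchNorm, forward, and the "3 * v + 5" step standing in for a conv layer are all hypothetical names invented for illustration, and the learned scale and offset are left out. It only assumes the usual behavior of batch normalization, where training mode uses the statistics of the current batch and inference mode uses the layer's own moving statistics.

```python
class ToyBatchNorm:
    """Toy batch norm over a flat list of floats (a stand-in, not the Keras layer)."""

    def __init__(self, momentum=0.9, eps=1e-3):
        self.momentum = momentum
        self.eps = eps
        # Per-instance state: every layer keeps its own running statistics.
        self.moving_mean = 0.0
        self.moving_var = 1.0

    def __call__(self, xs, training=False):
        if training:
            # Training mode: normalize with the current batch statistics
            # and update this instance's running statistics.
            mean = sum(xs) / len(xs)
            var = sum((x - mean) ** 2 for x in xs) / len(xs)
            m = self.momentum
            self.moving_mean = m * self.moving_mean + (1 - m) * mean
            self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: normalize with this instance's running statistics.
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in xs]


def forward(xs, bn_a, bn_b, training, pass_flag_to_second):
    h = bn_a(xs, training=training)
    h = [3.0 * v + 5.0 for v in h]  # hypothetical stand-in for a conv layer
    if pass_flag_to_second:
        return bn_b(h, training=training)
    return bn_b(h)  # flag omitted: this call uses the default, training=False


def mean(xs):
    return sum(xs) / len(xs)


x = [1.0, 2.0, 3.0, 4.0]

# Case 1: the flag is passed to every batch norm call.
bn_a1, bn_b1 = ToyBatchNorm(), ToyBatchNorm()
y_all = forward(x, bn_a1, bn_b1, training=True, pass_flag_to_second=True)

# Case 2: the flag is passed to the first call only.
bn_a2, bn_b2 = ToyBatchNorm(), ToyBatchNorm()
y_partial = forward(x, bn_a2, bn_b2, training=True, pass_flag_to_second=False)

print("flag on every call: mean =", mean(y_all))
print("flag on first call only: mean =", mean(y_partial))
print("second layer's moving mean:", bn_b1.moving_mean, "vs", bn_b2.moving_mean)
```

In this toy version, the second layer only behaves as if it were training when it receives the flag itself. When the flag is left off, its output stays centered near 5 instead of 0 and its moving mean is never updated, even though the first layer was called with training=True.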