Hello all,
Can anyone please explain to me why we should still adjust our Z by Beta and Gamma at test time?
As I understand it, our W[l] and b[l] were learned through gradient descent (which Batch Norm accelerated). But once {W[l], b[l]} are fixed, we should no longer need to compute any Z, dW, or Beta in the test phase, since our model parameters were set during training.
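To make the question concrete, here is a minimal sketch of one layer's forward pass at test time in the course notation. It assumes running averages of the mean and variance were stored during training; the names `bn_forward_test`, `mu_run`, `var_run`, and the ReLU activation are illustrative assumptions, not taken from the course code. No dW or other gradient appears, but Z is still computed for every new input, and the Gamma/Beta step is the one being asked about:

```python
import numpy as np

def bn_forward_test(a_prev, W, b, gamma, beta, mu_run, var_run, eps=1e-8):
    """Forward pass of one Batch Norm layer at test time (no gradients, no updates)."""
    # Z depends on the input, so it is recomputed for every new example;
    # only W and b are fixed after training.
    z = W @ a_prev + b
    # Normalize with statistics stored during training
    # (assumed here to be running averages of the mini-batch mean and variance).
    z_norm = (z - mu_run) / np.sqrt(var_run + eps)
    # Scale and shift with gamma and beta, which are learned parameters like W and b.
    z_tilde = gamma * z_norm + beta
    # ReLU is chosen only for illustration.
    return np.maximum(z_tilde, 0.0)
```

In this sketch Gamma and Beta are used at test time in the same way as during training, just without being updated.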
I’m sure I’m missing something, hence my question.
Thanks a lot, guys, for your kind explanations.