I would like to ask about the mu and var used for the test set.
As I understand it, we use an exponentially weighted average to estimate mu and var (based on the minibatches):
running_mean = momentum * running_mean + (1 - momentum) * sample_mean
running_var = momentum * running_var + (1 - momentum) * sample_var
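To make the question concrete, here is a minimal sketch of the update as I understand it, for a single feature in plain Python. The function name `update_running_stats`, the value `momentum=0.9`, the initial values, and the synthetic Gaussian data are my own assumptions for illustration, not taken from any particular library.

```python
import random
import statistics


def update_running_stats(running_mean, running_var, batch, momentum=0.9):
    """Apply one exponentially weighted average step for a single feature."""
    sample_mean = statistics.fmean(batch)
    # Population (biased) variance of the current minibatch.
    sample_var = statistics.pvariance(batch, mu=sample_mean)
    running_mean = momentum * running_mean + (1 - momentum) * sample_mean
    running_var = momentum * running_var + (1 - momentum) * sample_var
    return running_mean, running_var


rng = random.Random(0)
running_mean, running_var = 0.0, 1.0  # assumed initial values
history = []  # one (mean, var) pair per minibatch

for step in range(200):
    # Hypothetical minibatch: 32 samples with true mean 5 and true variance 4.
    batch = [rng.gauss(5.0, 2.0) for _ in range(32)]
    running_mean, running_var = update_running_stats(running_mean, running_var, batch)
    history.append((running_mean, running_var))
```

Each pass through the loop produces a new (running_mean, running_var) pair, and `history` collects all of them, one per minibatch.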
But this way, we get many (mean, var) pairs, depending on which minibatch we are currently on.
So for the test set, do we just use the last pair, i.e. the one corresponding to the last minibatch? Or which pair should we use?
Thank you, I hope you can help me with this!