In the Week 2 assignment, Batch Norm is applied after a Conv2D layer with 64 filters, a 7x7 filter size, and a (2,2) stride.
The output says the Batch Norm layer has 256 parameters, but where does that number come from? If beta and gamma are the learnable parameters per filter, shouldn’t we get 64*2 = 128 parameters?
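To make the arithmetic concrete, here is a rough sketch of the count. It assumes the layer is Keras's `BatchNormalization` with default settings, which, as far as I understand, keeps four per-channel vectors: the trainable `gamma` and `beta`, plus the non-trainable `moving_mean` and `moving_variance`. The variable names are only for illustration and are not taken from the assignment.

```python
# Assumption: the layer is Keras's BatchNormalization with default settings,
# applied over the 64 output channels of the preceding Conv2D.
channels = 64

# Per-channel vectors held by the layer (keys follow Keras's weight names).
trainable = {"gamma": channels, "beta": channels}
non_trainable = {"moving_mean": channels, "moving_variance": channels}

trainable_count = sum(trainable.values())
non_trainable_count = sum(non_trainable.values())
total_count = trainable_count + non_trainable_count

print(f"trainable: {trainable_count}")
print(f"non-trainable: {non_trainable_count}")
print(f"total: {total_count}")
```

Under that assumption, the 128 I expected would be only the trainable part, and the reported total would also include the running statistics.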