Bias parameter b - Fitting Batch Norm

In batch normalization, why does the parameter b cancel out?

The post “Batch Normalization : B parameter” explains this with a small example: {a+k, b+k} and {a, b} have the same normalization.

But in the “Fitting Batch Norm into a Neural Network” video at 8:41, why does Andrew treat b as a constant? In a mini-batch, at layer L we have Z = W*A + b, where b is a matrix, right? If we write out the first row of Z, we get:
[W[0]*a1+b11, W[0]*a2+b12, W[0]*a3+b13, …], where W[0] is the first row of W, the b’s are the scalars in the first row of b, and the a’s are the outputs of the previous layer. So, component-wise, what gets added is not a constant.

Does this help?

import numpy as np


# 10 observations, each with 2 features
x = np.random.random((10, 2))

# Dense layer with 3 units
weights = np.random.random((2, 3))
# One bias per unit; broadcasting adds the same value to all 10 observations
biases = np.random.random((1, 3))

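# Normalize each unit over the batch axis (axis=0), as batch norm does
get_z_tilda = lambda z: (z - z.mean(axis=0)) / z.std(axis=0)

# The bias shifts every observation of a unit by the same amount, so it cancels
z_tilda1 = get_z_tilda(x @ weights + biases)
z_tilda2 = get_z_tilda(x @ weights)
assert np.allclose(z_tilda1, z_tilda2)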
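
To tie this back to the notation in the question, here is a rough sketch of the same check in the Z = W*A + b layout, assuming columns are examples and rows are units. The names A, W, b_broadcast and normalize are made up for illustration, and the small epsilon that batch norm usually adds to the variance is left out. The point is that b has shape (units, 1), so b11, b12, b13 in the first row of Z are all the same number:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed layout: columns are examples, rows are units.
n_prev, n_units, m = 2, 3, 10
A = rng.random((n_prev, m))        # activations from the previous layer
W = rng.random((n_units, n_prev))  # weights of the current layer
b = rng.random((n_units, 1))       # one bias per unit, not one per example

# Broadcasting copies the single column of b across all m examples.
b_broadcast = np.broadcast_to(b, (n_units, m))
# Within any row (one unit), every example receives the same scalar.
assert np.all(b_broadcast == b_broadcast[:, [0]])


def normalize(Z):
    # Statistics are taken over the mini-batch axis (axis=1 in this layout).
    mu = Z.mean(axis=1, keepdims=True)
    sigma = Z.std(axis=1, keepdims=True)
    return (Z - mu) / sigma


Z_with_b = W @ A + b
Z_without_b = W @ A

# The per-unit mean absorbs b: mean(Z + b) = mean(Z) + b.
mean_shift = (Z_with_b.mean(axis=1, keepdims=True)
              - Z_without_b.mean(axis=1, keepdims=True))
same_mean_shift = np.allclose(mean_shift, b)

# After subtracting the mean, nothing of b is left.
same_normalized = np.allclose(normalize(Z_with_b), normalize(Z_without_b))
```

So within one row of Z the added term is a constant across the mini-batch, which is why subtracting the row mean removes it.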