I have a few questions about input data normalization and batch normalization:

With respect to input data normalization:
Is the entire input dataset normalized once (subtract the mean, divide by the standard deviation) when doing mini-batch optimization, or can each mini-batch be normalized separately using its own statistics?
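To make the two options concrete, here is a minimal NumPy sketch (all data and variable names are hypothetical, just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))  # hypothetical training set

# Option A: normalize once, using statistics of the full training set
mu, sigma = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mu) / sigma  # every later mini-batch would reuse mu, sigma

# Option B: normalize each mini-batch separately, with its own statistics
batch = X[:64]
batch_norm = (batch - batch.mean(axis=0)) / batch.std(axis=0)
```

With Option A the same fixed `mu` and `sigma` are applied to every mini-batch (and at test time), whereas with Option B the normalization statistics change from batch to batch.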
With respect to batch normalization (making \beta^{[l]} and \gamma^{[l]} parameters that are optimized together with W^{[l]}), Prof. Andrew Ng talks about normalizing the hidden layers (normalizing z^{[1]}, z^{[2]}, \ldots).
Can the same be done for z^{[0]}, which is the same as the mini-batch input X, optimizing \beta^{[0]} and \gamma^{[0]} as well?
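To clarify what I mean, here is a NumPy sketch of applying the same batch-norm transform to the input layer, i.e. treating X as z^{[0]} with its own learnable \gamma^{[0]} and \beta^{[0]} (the function and variable names are my own, not from the course):

```python
import numpy as np

def batchnorm(Z, gamma, beta, eps=1e-8):
    """Normalize Z over the mini-batch axis, then rescale and shift."""
    mu = Z.mean(axis=0)
    var = Z.var(axis=0)
    Z_hat = (Z - mu) / np.sqrt(var + eps)
    return gamma * Z_hat + beta

rng = np.random.default_rng(1)
X = rng.normal(3.0, 4.0, size=(64, 5))  # one mini-batch, playing the role of z^[0]
gamma0 = np.ones(5)    # gamma^[0], would be learned by gradient descent
beta0 = np.zeros(5)    # beta^[0], would be learned by gradient descent
Z0_tilde = batchnorm(X, gamma0, beta0)
```

With \gamma^{[0]} = 1 and \beta^{[0]} = 0 this reduces to plain per-mini-batch input normalization; the question is whether it makes sense to learn these two parameters for the input layer the same way as for the hidden layers.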