Hi everyone,
I have a conceptual question about Batch Normalization that I haven’t been able to find a clear answer to, either in the lectures or in common references.
As far as I understand, Batch Normalization during training:
- Computes the mean and variance per batch
- Normalizes activations to zero mean and unit variance
- Then applies learnable scale (γ) and shift (β) parameters
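To make sure I have the mechanics right, here is roughly what I have in mind for the training-time forward pass (a minimal NumPy sketch of my understanding for a 2-D (batch, features) input; running statistics, the backward pass, and the conv/channel case are left out, and the function name is just mine):

```python
import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-5):
    """Training-time BatchNorm for a 2-D (batch, features) input."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta             # learnable scale and shift

# toy usage: 32 samples, 8 features, deliberately far from zero mean / unit variance
x = np.random.randn(32, 8) * 3.0 + 5.0
y = batchnorm_train(x, gamma=np.ones(8), beta=np.zeros(8))
```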
In practice, subtracting the batch mean seems essential for centering activations and improving optimization. However, it made me wonder about an edge case:
Has anyone ever encountered a situation where it was actually beneficial to keep the bias (mean) of the data and normalize only by the variance, i.e., skip mean subtraction? (I sketch what I mean in code after the list below.)
More concretely:
- Are there known tasks, architectures, or data distributions where preserving the mean helped?
- Or is mean subtraction in BatchNorm essentially always beneficial, with γ and β already covering any useful bias information?
- If such cases exist, are they more common in specific settings (e.g., GNNs, time-series, physics-informed models)?
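For clarity, this is the variant I am asking about (same minimal NumPy sketch as above, with only the centering step removed; the function name is just mine, not a standard layer):

```python
import numpy as np

def batchnorm_no_centering(x, gamma, beta, eps=1e-5):
    """Hypothetical variant: divide by the batch standard deviation
    but skip mean subtraction, so the (rescaled) batch mean survives."""
    var = x.var(axis=0)                  # per-feature batch variance
    x_hat = x / np.sqrt(var + eps)       # roughly unit variance, mean not removed
    return gamma * x_hat + beta          # same learnable scale and shift
```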
I’m asking mostly from an intuition and empirical-experience perspective rather than theory alone.
Thanks in advance — I’d be very interested to hear if anyone has seen this work in practice.
Behzad