Batch normalization vs regularization

I’m currently going through the batch normalization lectures and it seems to me a bit like a dirty hack. I saw the note about batch normalization having a regularization effect, but I expect the opposite is true as well: there must be a way to use regularization to achieve something like batch normalization. One approach could be to use the distribution of activations as a diagnostic for sub-optimal regularization. Rather than adding extra parameters to the model, you could auto-tune the regularization hyper-parameter(s) based on the activations, and so actually reduce the number of parameters in the model. I’m fairly sure this would work, although I’m new to NNs. I’m interested in whether this type of approach (using regularization) is used, or whether there is a reason not to do it.
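
Something like this rough sketch is what I have in mind (PyTorch; the thresholds, update rule, and target spread are all made up for illustration, not an established technique):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
weight_decay = 1e-4                      # the hyper-parameter being auto-tuned
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

target_std = 1.0                         # desired spread, akin to batch norm's unit variance

for step in range(100):
    x = torch.randn(32, 20)
    y = torch.randn(32, 1)

    # Diagnostic: look at the spread of the first layer's pre-activation outputs.
    with torch.no_grad():
        std = model[0](x).std().item()
    # If activations spread out too much, strengthen the penalty; if they collapse, relax it.
    if std > target_std * 1.5:
        weight_decay *= 1.1
    elif std < target_std * 0.5:
        weight_decay *= 0.9

    optimizer.zero_grad()
    data_loss = nn.functional.mse_loss(model(x), y)
    # Apply the L2 penalty by hand so the freshly tuned coefficient takes effect immediately.
    l2_penalty = sum((p ** 2).sum() for p in model.parameters())
    (data_loss + weight_decay * l2_penalty).backward()
    optimizer.step()
```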

Hi StuHaze,

Welcome!

Thanks for posting this question, it’s an interesting one. Yes, batch normalization can seem like a bit of a hack, but it does a proper job too :slight_smile: Here’s a thread that should help clarify your question to some extent.
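
In case it helps to see what batch normalization actually computes (and where the extra parameters you mention come from), here is a minimal sketch; the batch size and feature count are arbitrary:

```python
import torch

x = torch.randn(32, 64)                  # a batch of 32 activations, 64 features

# Normalize each feature over the batch...
mean = x.mean(dim=0)
var = x.var(dim=0, unbiased=False)
x_hat = (x - mean) / torch.sqrt(var + 1e-5)

# ...then scale and shift with the learned parameters gamma and beta
# (these are the "extra parameters" batch norm adds).
gamma = torch.ones(64, requires_grad=True)
beta = torch.zeros(64, requires_grad=True)
y = gamma * x_hat + beta

# torch.nn.BatchNorm1d does the same thing in training mode
# (plus tracking running statistics for use at inference time).
bn = torch.nn.BatchNorm1d(64)
print(torch.allclose(y, bn(x), atol=1e-5))  # True with default gamma=1, beta=0
```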

Regularization is the standard answer to overfitting. It improves a model’s ability to generalize to unseen data by adding a penalty on the parameters, which discourages them from growing too large.
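
For example, L2 regularization (weight decay) adds the squared magnitude of the weights to the training loss; a minimal sketch, with an arbitrary penalty strength:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
x, y = torch.randn(16, 10), torch.randn(16, 1)

lam = 1e-3                               # regularization strength
data_loss = nn.functional.mse_loss(model(x), y)
l2_penalty = sum((p ** 2).sum() for p in model.parameters())
loss = data_loss + lam * l2_penalty      # large weights now cost something

# Most optimizers expose the same idea via the `weight_decay` argument,
# applied directly in the gradient update.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)
```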