In the lecture “Neural Style Transfer - Style Cost Function,” Ng states a couple of times that the normalization factor in the correlation doesn’t matter much, because it gets absorbed into a hyperparameter we multiply by anyway. But intuitively, it seems that in the case of lambda, the opposite is true. The choice of lambdas (there is a chosen hyperparameter lambda value for each layer) is intended to dictate the emphasis given to shallower or deeper features in the net when judging style transfer.
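To make the interaction concrete, here is a minimal NumPy sketch of the style cost as presented in the lecture: a Gram (channel-correlation) matrix per layer, a squared-difference cost with the optional normalization factor, and a lambda-weighted sum over layers. The function names and the `normalize` flag are my own illustration, not from the course code.

```python
import numpy as np

def gram(a):
    # a: activations of shape (n_H, n_W, n_C); unroll to (n_C, n_H*n_W)
    n_H, n_W, n_C = a.shape
    f = a.reshape(n_H * n_W, n_C).T
    return f @ f.T  # (n_C, n_C) matrix of channel correlations

def layer_style_cost(aS, aG, normalize=True):
    # Squared Frobenius distance between the style and generated Gram matrices.
    n_H, n_W, n_C = aS.shape
    cost = np.sum((gram(aS) - gram(aG)) ** 2)
    if normalize:
        # The factor Ng says can be folded into lambda; note it depends
        # on the layer's dimensions, so it differs layer to layer.
        cost /= (2.0 * n_H * n_W * n_C) ** 2
    return cost

def style_cost(style_acts, gen_acts, lambdas, normalize=True):
    # One lambda per chosen layer.
    return sum(lam * layer_style_cost(aS, aG, normalize)
               for lam, (aS, aG) in zip(lambdas, zip(style_acts, gen_acts)))
```

Since the dropped factor `(2 * n_H * n_W * n_C)**2` varies per layer, running with `normalize=False` means each lambda must silently absorb a different, dimension-dependent constant, which is exactly the concern raised below.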
If we want a set of lambda values to have intuitive or generalizable meaning, wouldn’t we want them to apply to values that have been “properly” normalized? Otherwise the lambdas end up heavily noised, since they must also compensate for the missing normalization at each layer. Correct?
On the other hand, if the lambdas are not intended or expected to have human-understandable meaning, then normalization truly doesn’t matter. Am I correct in interpreting these lectures as implying that the lambdas should be interpretable?