While training my model I see the weights exploding. I'm using the Leaky ReLU activation function on a regression problem.
I normalised both input and output.
When trying to investigate why, I see that the input is quite small (naturally bounded between -1 and 1) with a small std, usually around 0.2-0.3. So after normalisation the input gets scaled up quite a lot. The variance is clearly still 1, but I'm concerned about the individual "spikes".
Is this a real issue? Is there a rule for normalising very correlated inputs?
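To illustrate what I mean, here's a toy sketch (made-up numbers, not my real data) of how z-scoring a bounded, small-std input magnifies the rare extreme values:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated input: bounded in [-1, 1] with a small std (~0.25),
# roughly matching the situation described above
x = np.clip(rng.normal(0.0, 0.25, size=10_000), -1.0, 1.0)

# Standard z-score normalisation: variance becomes 1,
# but the rare extreme values get stretched to several sigma
z = (x - x.mean()) / x.std()

print(np.abs(x).max())  # bounded by 1 before normalisation
print(np.abs(z).max())  # the "spikes": well above 3 after normalisation
```

So the bulk of the input stays small, but a handful of samples end up several standard deviations out, which is what I'm worried about feeding into the network.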
Hello Riccardo! Interesting post.
- Is the gradient exploding with unnormalized data?
- Is the gradient exploding with normalized input only (without normalizing the output)?
- I tried just removing the mean from the input, with no sigma scaling, and things improve. It doesn't explode anymore. To clarify, it was exploding on a network with 8 hidden layers and about 900 nodes per layer; it was not doing it on smaller NNs. Still, the NN is not learning, i.e. the training loss decreases until a threshold no matter how many iterations or how large the NN. This is why I started investigating this, but it might be another issue.
- Yes it does, even worse.
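For reference, a minimal sketch of the two preprocessing variants I compared (the helper names are just for illustration, not from my actual code):

```python
import numpy as np

def center_only(x):
    """Remove the per-feature mean but keep the original scale
    (the variant that stopped the explosion above)."""
    return x - x.mean(axis=0)

def standardize(x, eps=1e-8):
    """Full z-score normalisation: zero mean, unit variance per feature
    (the variant that was exploding)."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    return (x - mu) / (sigma + eps)

rng = np.random.default_rng(0)
# Toy stand-in for the input: bounded in [-1, 1], small std
x = np.clip(rng.normal(0.0, 0.25, size=(1000, 4)), -1.0, 1.0)

# Centering alone keeps magnitudes near the original bound;
# dividing by the small sigma stretches the extremes well past it
print(np.abs(center_only(x)).max())
print(np.abs(standardize(x)).max())
```

With centering only, the inputs stay roughly within their natural bound, whereas dividing by the small per-feature sigma is what produces the large spikes.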