For the following code, I think `variation` needs to be replaced with `penalty`.
Isn't `a` supposed to be the moving average of the L2 norm of J^T y?
new_a = a * (1 - decay) + variation * decay
Oh! Good catch, @bradd012!
I’ll enter a bug report for this so the developers can fix it. Thanks for pointing this out!
It can also be fixed by returning `cur_grad` instead of `output_var` from the `path_length_regulization_loss()` function.
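For anyone following along, here is a minimal sketch of the update as I understand the fix proposed above. It is not the actual course or library code: the function name `path_length_step`, the plain-list input, and the choice to measure `variation` against the current `a` before updating it are all assumptions made for illustration. `penalties` stands in for the per-sample L2 norms of J^T y.

```python
def path_length_step(penalties, a, decay=0.01):
    """One step of the path length regularizer (illustrative sketch only).

    penalties: per-sample L2 norms of J^T y (hypothetical plain-list input)
    a:         current moving average of those norms
    decay:     weight given to the newest batch in the moving average
    """
    mean_penalty = sum(penalties) / len(penalties)

    # The loss term: how far each path length is from the running average.
    # Assumption: measured against the current `a`, before it is updated.
    variation = sum((p - a) ** 2 for p in penalties) / len(penalties)

    # The moving average tracks the penalty (path length) itself,
    # not the variation computed from it.
    new_a = a * (1 - decay) + mean_penalty * decay
    return variation, new_a


variation, new_a = path_length_step([2.0, 4.0], a=0.0, decay=0.5)
```

With the original line, `a` would track the squared deviation instead of the path length, so it would no longer be the quantity that `variation` is measured against.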