'Sticky' model not retraining after optimal result
Colab code
I tried running through a few parameters for fun using the Colab code provided, and my optimizer behaves differently across a range of thresholds than it does when a single threshold is provided on its own. See the attached screenshot. The model prediction for a threshold of 1.02 is [[18.767128]]. When I loop with arange, the outputs all ‘lock’ onto the best value [[18.999973]] once it is reached, regardless of the threshold passed in.
Just curious why this would be happening - I’m at a loss
Note that the reset_states() doesn’t do anything here; it’s just something I was trying.
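For context, the loop in the second cell is roughly this shape (a simplified sketch, not my exact Colab code; MyHuberLoss is the custom loss quoted further down, and the data/epoch numbers are illustrative):

import numpy as np

# Sketch only: note that the SAME model object is compiled and fitted
# on every pass of the loop.
for threshold in np.arange(1.0, 1.2, 0.02):
    model.reset_states()  # the no-op mentioned above
    model.compile(optimizer='sgd', loss=MyHuberLoss(threshold=threshold))
    model.fit(xs, ys, epochs=500, verbose=0)
    print(threshold, model.predict(np.array([[10.0]]))[0])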
I am trying to find this assignment; which one is it, exactly?
is_small_error = tf.abs(error) <= threshold
small_error_loss = tf.square(error) / 2
big_error_loss = threshold * (tf.abs(error) - (0.5 * threshold))
return tf.where(is_small_error, small_error_loss, big_error_loss)
The output calculation depends on the threshold; maybe you should trace it there to see the difference!
Thanks for looking at it. Yes, perhaps some good debugging is needed. It seems to me that the output for, say, a threshold of 1.02 should be the same regardless of whether it is run one-off or as part of a sequence, but that is clearly not the case. The loss calculation code is very straightforward, so my guess is that the predictions are reaching an optimal maximum; something along the lines of the predictions not being initialized. That doesn’t quite ring true, but it is some kind of unexpected internal behaviour. I shouldn’t be too obsessed with it, but it might be relevant if we start some kind of grid search: the results from a series of compile/fit iterations would not match the results once those optimized parameters are used on their own. For the benefit of others, the lines preceding the is_small_error… above are:
def call(self, y_true, y_pred):
    error = y_true - y_pred
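Putting the pieces together, the full loss class presumably looks something like this (the class name, the tf.keras.losses.Loss subclassing, and the __init__ are my assumptions about the rest of the lab code):

import tensorflow as tf

class MyHuberLoss(tf.keras.losses.Loss):
    def __init__(self, threshold=1.0):
        super().__init__()
        # the bare `threshold` in the snippet above presumably resolves to this
        self.threshold = threshold

    def call(self, y_true, y_pred):
        error = y_true - y_pred
        is_small_error = tf.abs(error) <= self.threshold
        small_error_loss = tf.square(error) / 2
        big_error_loss = self.threshold * (tf.abs(error) - (0.5 * self.threshold))
        return tf.where(is_small_error, small_error_loss, big_error_loss)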
“error” may be a reserved keyword. Try using a different variable name.
Aaah, pretty much found it, if not quite understanding the exact behaviour. Compile and fit did not fully reinitialize the model; I needed to create a new model in each iteration of the loop. Something to keep in mind if manually creating some kind of custom grid search.
Good suggestion. I don’t think that was the root cause here, but it’s a good tip to keep in mind.
So somewhere around 2500 epochs of training, the model maxes out at 18.999973; my loop was in effect just running more epochs of training on the same model.
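For anyone building a manual grid search along these lines, the fix is to construct a fresh model inside the loop. A minimal sketch, assuming the lab’s usual y = 2x − 1 toy data (which would explain the predictions converging towards 19 for an input of 10; the data, optimizer, and epoch count are illustrative, not my exact code):

import numpy as np
import tensorflow as tf

xs = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0], dtype=float)
ys = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0], dtype=float)

def build_model():
    # a brand-new model, and therefore fresh random weights, on every call
    return tf.keras.Sequential([tf.keras.layers.Dense(units=1, input_shape=[1])])

for threshold in np.arange(1.0, 1.2, 0.02):
    model = build_model()  # recreate; compile/fit alone won't reinitialize the weights
    model.compile(optimizer='sgd', loss=MyHuberLoss(threshold=threshold))
    model.fit(xs, ys, epochs=500, verbose=0)
    print(threshold, model.predict(np.array([[10.0]]), verbose=0)[0])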
Did you try using a different loss function? As I can see, your model has a single Dense layer with one unit; using a different optimizer might also lead to a different outcome. Also, you have not mentioned what parameters you are using, though I can see your input shape is 1. Please share some details about the dataset you are using, the model you created, and all the different ways you have tried.
If your labelling and your model output don’t match, that is probably the reason for your error.
Regards
DP
Thanks Deepti. Actually, this is a simple lab to try a custom loss function. I do think the behaviour is caused by not recreating the model, so in effect we just keep training and improving. I note that, pretty much regardless of other parameters, the output in the first loop of the second cell will improve upon the first training cycle done in the first cell before the looping. It was really just a misunderstanding on my part of how to create and test a new model; compile/fit simply do not reinitialize everything.