I played with the IFT hyper-parameters: I set max_steps=5 and epochs=2, and I added a few more examples to the filtered training dataset (499 in total). I noticed that the training loss sometimes goes up between steps (and sometimes down). Also, I never saw epoch 2 run. Can someone explain?
I tried halving the learning rate. That seemed to make the training loss less likely to go up, but it still went up within the 5 steps. I'm trying to understand what's going on.
From the documentation:

* max_steps (int, optional, defaults to -1) — If set to a positive number, the total number of training steps to perform. Overrides num_train_epochs. In case of using a finite iterable dataset the training may stop before reaching the set number of steps when all data is exhausted.
That is likely why your run stopped early: with max_steps=5, training ends after 5 optimizer steps regardless of the epoch setting, so epoch 2 never starts.
You'd better have a look at the documentation for the details.
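If you're using the Hugging Face Trainer, here is a minimal sketch of the two configurations (the output_dir, batch size, and learning rate below are illustrative assumptions, not your actual values):

```python
from transformers import TrainingArguments

# Current setup: max_steps > 0 overrides num_train_epochs,
# so training stops after 5 optimizer steps.
args = TrainingArguments(
    output_dir="./ift-out",          # assumed path
    num_train_epochs=2,              # ignored because max_steps is positive
    max_steps=5,                     # training ends here
    per_device_train_batch_size=8,   # assumed value
    learning_rate=1e-5,              # assumed value
)

# With 499 filtered examples and an effective batch size of 8,
# one epoch is ceil(499 / 8) = 63 optimizer steps, so max_steps=5
# ends training well before epoch 1 finishes, let alone epoch 2.
# To actually train for 2 epochs, leave max_steps at its default (-1):
args = TrainingArguments(
    output_dir="./ift-out",
    num_train_epochs=2,
    max_steps=-1,                    # default: epochs control the run length
    per_device_train_batch_size=8,
    learning_rate=1e-5,
)
```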