Training loss for IFT goes up

I played with the IFT hyper-params: I set max_steps=5 and epochs=2, and added a few more examples to the filtered training dataset (499 total). I noticed that the training loss goes up between steps (it goes down too). Also, I never saw epoch 2 run. Can someone explain?

I tried halving the learning rate. That seemed to make the training loss less likely to go up, but it still went up after 5 steps. I’m trying to understand what’s going on.

Loss will vary from batch to batch; that fluctuation is normal. At the end of each epoch an aggregate loss is calculated.
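To illustrate (with made-up numbers, not from an actual run): per-batch losses can bounce up and down while the epoch-level average still trends downward, which is what matters.

```python
# Synthetic per-batch losses: a downward trend plus batch-to-batch noise.
# Individual batches show the loss going *up* (e.g. 1.85 -> 1.95) even
# though the epoch-level averages are decreasing.
epoch1_batch_losses = [2.10, 1.85, 1.95, 1.70, 1.80]
epoch2_batch_losses = [1.60, 1.75, 1.50, 1.62, 1.45]

epoch1_avg = sum(epoch1_batch_losses) / len(epoch1_batch_losses)
epoch2_avg = sum(epoch2_batch_losses) / len(epoch2_batch_losses)

print(f"epoch 1 average loss: {epoch1_avg:.3f}")  # 1.880
print(f"epoch 2 average loss: {epoch2_avg:.3f}")  # 1.584
print("average decreased:", epoch2_avg < epoch1_avg)
```

So a single step where the loss rises tells you very little; look at the per-epoch aggregate (or a smoothed curve) instead.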

Training these large models needs many epochs and a lot of data, so training for just a few epochs will give lukewarm results.

That makes sense - Thanks!

Any idea why it didn’t seem to get to epoch 2? What is the default batch size? Is max_steps per epoch or overall (across all epochs)?

From the documentation:

> max_steps (int, optional, defaults to -1) — If set to a positive number, the total number of training steps to perform. Overrides num_train_epochs. In case of using a finite iterable dataset the training may stop before reaching the set number of steps when all data is exhausted.
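A quick back-of-the-envelope check of why epoch 2 never ran, assuming the Trainer's default per_device_train_batch_size of 8 (and a single device; your actual batch size may differ):

```python
import math

# Numbers from the thread: 499 filtered training examples, max_steps=5.
# batch_size = 8 is an assumption (the transformers Trainer default).
num_examples = 499
batch_size = 8
max_steps = 5

# Optimizer steps needed to see every example once (one epoch).
steps_per_epoch = math.ceil(num_examples / batch_size)
print("steps per epoch:", steps_per_epoch)  # 63

# max_steps overrides num_train_epochs, so training halts after 5 steps,
# well short of the 63 steps that make up epoch 1 -- epoch 2 never starts.
print(f"fraction of epoch 1 completed: {max_steps / steps_per_epoch:.3f}")
```

In other words, max_steps counts total steps across the whole run, not per epoch, so with max_steps=5 you only ever see the first few batches of epoch 1.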

That might be the reason your training stopped early.
It's worth having a look at the documentation for details: