After training the model for 5 epochs with these values, I must have made a mess of DistilBERT’s original parameters.
Question: What is the right way to recover the model’s original parameters and start training again with a new value for layers_to_train and learning_rate? It seems to me that if I just change layers_to_train and learning_rate and try training again, I am NOT starting with the model’s original parameters but with some corrupted parameters from my 1st attempt.
That is a brute force approach, but I am looking for a way to reset the model’s weights back to the original values. Does it work to just reload the model?
My objective is to try different config values in a loop, but I need to restore the original weights each time at the start of the loop.
In that case, make a copy of the model’s original weights outside the loop, and reset the model’s weights using those.
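A minimal sketch of that idea, assuming a PyTorch model (a small stand-in layer is used here in place of the actual DistilBERT instance): snapshot the state dict once before the loop, then restore it at the top of each iteration.

```python
import copy

import torch
import torch.nn as nn

# Stand-in model; in practice this would be your DistilBERT instance.
model = nn.Linear(4, 2)

# Snapshot the original weights ONCE, outside the loop.
original_state = copy.deepcopy(model.state_dict())

for lr in [0.1, 0.01, 0.001]:
    # Restore the pristine weights before each experiment.
    model.load_state_dict(original_state)
    # ... train with this learning rate ...
```

The `copy.deepcopy` matters: `state_dict()` returns references to the live tensors, so without the deep copy the snapshot would change along with the model during training.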
But each time you call the “load_bert” function, it loads the model with fresh weights from disk, so a snapshot is not strictly necessary. However, if you reload a few times without freeing the old models, you will eventually run out of memory in the workspace.
The better approach would be to do something like this:
import gc

import torch

configs = [0.1, 0.01, 0.001]  # Example learning rates to try

for cfg in configs:
    # Load a fresh model and tokenizer each iteration
    model, tokenizer = load_bert()

    # Run your experiment
    # train_model(model, lr=cfg)

    # Clean up RAM/VRAM before the next iteration
    del model, tokenizer
    gc.collect()
    torch.cuda.empty_cache()
Another recommendation would be to experiment only after you have passed the assignment, so that your experiments don’t interfere with your grade.