For this chatbot assignment, a pretrained weights file is available so we can see the chatbot working, and it works well. My question is: if we were to train the model ourselves, how many steps and what other hyperparameters would be needed to reproduce the model behind the pretrained weights?
Thanks,
Adi
Short answer:
a lot

Long answer:
also a lot. If you are interested and want more details, see: [1911.00536] DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Thanks for the reference, I am looking at it. Is there also a Python notebook or code where I can see the actual training parameters used for this particular dataset (MultiWOZ, with 10k conversations)? One thing that is evident from the current notebook is that the Reformer model has 6 layers for inference; similarly, it would be helpful to know how many steps the model was trained for, along with the parameters below (taken from the training setup in the assignment notebook):
# use the warmup_and_rsqrt_decay learning rate schedule
lr_schedule = trax.lr.warmup_and_rsqrt_decay(
    n_warmup_steps=1000, max_value=0.01)

# define the train task
train_task = training.TrainTask(
    labeled_data=train_gen,                 # the training data generator
    loss_layer=tl.CrossEntropyLoss(),       # loss function
    optimizer=trax.optimizers.Adam(0.01),   # optimizer (don't forget to set LR to 0.01)
    lr_schedule=lr_schedule,
    n_steps_per_checkpoint=10
)
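For intuition about what that schedule does, here is a minimal pure-Python sketch (not the Trax implementation itself; it assumes the usual formula of a linear warmup to max_value followed by reciprocal-square-root decay):

```python
import math

def warmup_and_rsqrt_decay(step, n_warmup_steps=1000, max_value=0.01):
    """Sketch of a warmup + rsqrt-decay schedule (assumed formula:
    linear ramp to max_value over the warmup, then rsqrt decay)."""
    if step < n_warmup_steps:
        return max_value * step / n_warmup_steps         # linear warmup
    return max_value * math.sqrt(n_warmup_steps / step)  # rsqrt decay

# Peak learning rate is reached at the end of warmup:
print(warmup_and_rsqrt_decay(1000))     # 0.01
# 100x more steps -> rate decayed by a factor of 10:
print(warmup_and_rsqrt_decay(100_000))  # 0.001
```

So even without knowing the exact step count, you can see the schedule never returns to 0.01 after warmup; the number of total training steps mainly determines how far down the decay curve training ends.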