For this chatbot assignment, a pretrained weights file is available so we can see the chatbot working, and it works well. My question is: if we were to train the model ourselves, how many steps and what other hyperparameters would be needed to reproduce the model behind the pretrained weights?
Thanks,
Adi
Short answer:
a lot

Long answer:
also a lot. If you are interested and want more details, see: [1911.00536] DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Thanks for the reference, I am looking at it. Is there also a Python notebook or code where I can see the actual training parameters used for this particular dataset (MultiWOZ, with 10k conversations)? One thing that is evident from the current notebook is that the Reformer model has 6 layers for inference; similarly, it would be helpful to know how many steps the model was trained for, along with the parameters below (taken from the training setup in the assignment notebook):
# use the warmup_and_rsqrt_decay learning rate schedule
lr_schedule = trax.lr.warmup_and_rsqrt_decay(
    n_warmup_steps=1000, max_value=0.01)

# define the train task
train_task = training.TrainTask(
    labeled_data=train_gen,                 # the training data generator
    loss_layer=tl.CrossEntropyLoss(),       # loss function
    optimizer=trax.optimizers.Adam(0.01),   # optimizer (don't forget to set LR to 0.01)
    lr_schedule=lr_schedule,
    n_steps_per_checkpoint=10
)
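For intuition about what that schedule does, here is a minimal pure-Python sketch (not the Trax implementation itself; it assumes the usual formula of a linear warmup to max_value followed by reciprocal-square-root decay):

```python
import math

def warmup_and_rsqrt_decay(step, n_warmup_steps=1000, max_value=0.01):
    """Sketch of a warmup + rsqrt-decay schedule (assumed formula:
    linear ramp to max_value over the warmup, then rsqrt decay)."""
    if step < n_warmup_steps:
        return max_value * step / n_warmup_steps         # linear warmup
    return max_value * math.sqrt(n_warmup_steps / step)  # rsqrt decay

# Peak learning rate is reached at the end of warmup:
print(warmup_and_rsqrt_decay(1000))     # 0.01
# 100x more steps -> rate decayed by a factor of 10:
print(warmup_and_rsqrt_decay(100_000))  # 0.001
```

So even without knowing the exact step count, you can see the schedule never returns to 0.01 after warmup; the number of total training steps mainly determines how far down the decay curve training ends.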