Hi guys,
According to the “Scaling laws and compute-optimal models” lecture from week 1:

In this lecture, it was mentioned that “the optimal number of training tokens, according to the Chinchilla research paper, is roughly 20 times the number of model parameters.” What I would like to understand is whether this applies only to pre-training, where you train a model from scratch, or whether it is also applicable to fine-tuning. Sorry if this is a baseless question; I am just trying to understand how I can leverage the Chinchilla research for a translation model that I am building.
Hoping to get a response.
Thanks!!
This is referring to the training phase of the model itself, i.e. pre-training from scratch, before it is deployed to make predictions!
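To make the 20x rule of thumb concrete, here is a minimal sketch (my own illustration of the Chinchilla heuristic, not code from the course):

```python
# Chinchilla rule of thumb: compute-optimal pre-training uses roughly
# 20 training tokens per model parameter (the factor 20 is an approximation).

def chinchilla_optimal_tokens(num_parameters: int, tokens_per_parameter: int = 20) -> int:
    """Approximate compute-optimal number of pre-training tokens."""
    return tokens_per_parameter * num_parameters

# Example: a 1B-parameter model would want roughly 20B pre-training tokens.
print(f"{chinchilla_optimal_tokens(1_000_000_000):,}")  # 20,000,000,000
```

Because fine-tuning starts from a checkpoint that has already been pre-trained on that scale of data, the 20x heuristic is about the pre-training budget rather than about how much fine-tuning data you need.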
Do you have any recommendations for a translation model? I would like to fine-tune an existing model with internal translation data. @gent.spah
Not really, but check on the web or TensorFlow Hub!
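If it helps, one common pattern (not a specific recommendation) is to start from a pre-trained MarianMT checkpoint on the Hugging Face Hub and fine-tune it on your own parallel data. Here is a minimal sketch, where the checkpoint name, the transformers library, and the column names are all assumptions for illustration:

```python
# Minimal sketch: fine-tune an existing translation model on internal parallel data.
# Assumes the Hugging Face transformers library; the en->fr checkpoint below is only
# an example, so pick the language pair you actually need.
from transformers import MarianMTModel, MarianTokenizer

model_name = "Helsinki-NLP/opus-mt-en-fr"  # example checkpoint (assumption)
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def preprocess(batch):
    # Assumes the internal data has "source" and "target" text columns (hypothetical names).
    inputs = tokenizer(batch["source"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["target"], truncation=True, max_length=128)
    inputs["labels"] = labels["input_ids"]
    return inputs

# From here the usual seq2seq training loop applies, e.g. with Seq2SeqTrainer:
#   train_dataset = my_internal_dataset.map(preprocess, batched=True)  # hypothetical dataset
#   trainer = Seq2SeqTrainer(model=model, args=Seq2SeqTrainingArguments(output_dir="out"),
#                            train_dataset=train_dataset,
#                            data_collator=DataCollatorForSeq2Seq(tokenizer, model=model))
#   trainer.train()
```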
@gent.spah Sure, thank you. Also, one other question: could you please tell me what sort of cleaning techniques I would have to apply to clean translation data? Or, if you could point me to a resource, that would be really helpful. I tried searching online but couldn’t find anything reliable.
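For reference, this is the kind of filtering I currently have in mind for the parallel (source, target) pairs; the steps and thresholds below are just my own guesses, so corrections are welcome:

```python
# Minimal sketch of common cleaning heuristics for parallel (source, target) pairs.
# The length and ratio thresholds are illustrative guesses, not fixed rules.

def clean_parallel_pairs(pairs, max_len=200, max_ratio=2.5):
    """Filter a list of (source, target) sentence pairs."""
    seen = set()
    cleaned = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:          # drop empty segments
            continue
        if (src, tgt) in seen:          # drop exact duplicate pairs
            continue
        if src == tgt:                  # likely an untranslated copy
            continue
        src_len, tgt_len = len(src.split()), len(tgt.split())
        if src_len > max_len or tgt_len > max_len:   # drop overly long segments
            continue
        if max(src_len, tgt_len) / min(src_len, tgt_len) > max_ratio:
            continue                    # drop badly mismatched lengths
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned

# Example:
# clean_parallel_pairs([("Hello world", "Bonjour le monde"), ("", "vide")])
# -> [("Hello world", "Bonjour le monde")]
```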