Optimal Number of Tokens

avinash1229 · February 8, 2024, 12:51pm

Hi guys,

This is about the "Scaling laws and compute-optimal models" lecture from week 1.

In this lecture, it was mentioned that “The optimal number of tokens required/preferred according to the Chinchilla research paper is 20 times the model parameters.” What I would like to understand is whether this applies only to pre-training, where you train a model from scratch, or whether it is also applicable to fine-tuning. Sorry if this is a baseless question. I am just trying to understand how I can leverage the Chinchilla research for a translation model that I am building.

Hoping to get a response.

Thanks!!

gent.spah · February 8, 2024, 2:08pm

This refers to the training phase of the model itself (that is, pre-training from scratch), before it is deployed to make predictions!
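
To make the rule of thumb from the lecture concrete, here is a rough sketch of the arithmetic, assuming the ratio is exactly 20 tokens per parameter (the paper's fitted value is approximate). The function name and the example model sizes are made up for illustration, and the result is an estimate for pre-training data, not a guideline for how much fine-tuning data you need.

```python
def chinchilla_optimal_tokens(n_params: int, tokens_per_param: int = 20) -> int:
    """Approximate compute-optimal number of pre-training tokens.

    Assumes the ~20 tokens-per-parameter rule of thumb quoted in the lecture.
    """
    return n_params * tokens_per_param


# Hypothetical model sizes, chosen only to show the scale of the numbers.
for label, n_params in [("1B", 1_000_000_000), ("7B", 7_000_000_000), ("70B", 70_000_000_000)]:
    tokens = chinchilla_optimal_tokens(n_params)
    print(f"{label} parameters -> about {tokens:,} pre-training tokens")
```

For a 70B-parameter model this gives about 1.4 trillion tokens, which is in line with what the Chinchilla model itself was trained on.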

avinash1229 · February 8, 2024, 3:29pm

Thank you @gent.spah

avinash1229 · February 8, 2024, 3:31pm

Do you have any recommendations for a translation model? I would like to take an existing model and fine-tune it on internal translation data. @gent.spah

gent.spah · February 9, 2024, 6:08am

Not really, but check on the web or TensorFlow Hub!

avinash1229 · February 9, 2024, 7:53pm

@gent.spah Sure, thank you. Also, one other question: could you please tell me what sort of cleaning techniques I should apply to translation data? Or if you could point me to a resource, that would be really helpful. I tried searching online but couldn’t find anything reliable.
