Are the token counts reported (13B, 26B) the number of tokens seen during training, i.e. the total number of tokens the model was trained on, or are they the total size of the dataset?