Hi team,
I have a question regarding the following slide:
I got confused about the statement “train tokenizer” in the context of post-training. Is that really a thing? Wouldn’t that alter existing token IDs and thus make the existing embeddings unusable? I know that it’s always possible to manually add new tokens to a tokenizer / LLM, but the term “training” confuses me. If it refers to a specific technique, I would appreciate it if you could provide a brief reference. Thank you!
To treat a new tag like <|THINK|> as a single unit, you must first update the tokenizer’s rules so it recognizes the string as a whole instead of splitting it into subwords or individual characters. In this context, “training” the tokenizer usually means extending it with the tag as a special token rather than retraining it from scratch: the new token is appended to the end of the vocabulary, so existing token IDs (and the embeddings attached to them) are left untouched.
Adding the tag does increase the vocabulary size, which requires you to resize the model’s embedding layer (and tied output layer) to accommodate the new slot. Since the new token starts as a blank slate, you initialize its vector (randomly, or with a heuristic such as the mean of existing embeddings) and then fine-tune the model so it learns the specific meaning behind the tag. Sharon explains it in the final slide of the video.
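To make the mechanics concrete, here is a toy sketch of what “add a special token, then resize the embeddings” does under the hood. The vocabulary, IDs, and 4-dimensional vectors are all made up for illustration; real libraries (e.g. Hugging Face, via `tokenizer.add_special_tokens(...)` and `model.resize_token_embeddings(len(tokenizer))`) wrap the same idea:

```python
import random

# Toy vocabulary standing in for a trained tokenizer (hypothetical IDs).
vocab = {"hello": 0, "world": 1, "<|eos|>": 2}
# One embedding row per token (tiny 4-dim vectors for illustration).
embeddings = [[0.1 * i] * 4 for i in range(len(vocab))]

def add_special_token(token, vocab, embeddings):
    """Append a new token at the end of the vocab and grow the
    embedding table; existing IDs and rows are untouched."""
    if token in vocab:
        return vocab[token]
    new_id = len(vocab)  # IDs are contiguous, so the next free slot
    vocab[token] = new_id
    # The new row starts as a "blank slate" (random init here); it only
    # becomes meaningful after fine-tuning.
    embeddings.append([random.gauss(0.0, 0.02) for _ in range(4)])
    return new_id

old_ids = dict(vocab)
think_id = add_special_token("<|THINK|>", vocab, embeddings)

# Existing token IDs are preserved; only one new slot was appended.
assert all(vocab[t] == i for t, i in old_ids.items())
print(think_id, len(embeddings))  # → 3 4
```

The key point for your question: because the new ID is appended at the end, nothing about the existing ID-to-embedding mapping changes; the “training” only affects the new slot (and whatever the subsequent fine-tuning updates).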