Hi folks! I have a question that came up during my first week of the Generative AI with LLMs course.
Once we have selected a tokenizer to train the model, we must use the same tokenizer when we generate text. Let's assume we have a pre-trained model (e.g., GPT-3) that was already trained with OpenAI Embeddings. Now we have a task to fine-tune this model on our own data, and we want to use HuggingFace Embeddings.
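To illustrate what I mean by keeping the model and tokenizer paired, here is a minimal sketch using the public `gpt2` checkpoint from the Hugging Face Hub as a stand-in (GPT-3 itself isn't available through the `transformers` library), where the tokenizer is loaded from the same checkpoint the model was pre-trained with:

```python
# Minimal sketch: load the model and the tokenizer from the SAME checkpoint,
# so the token IDs fed in at generation time match the vocabulary the model
# learned its embeddings for during pre-training.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "gpt2"  # hypothetical stand-in for the pre-trained model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)  # same vocab as pre-training
model = AutoModelForCausalLM.from_pretrained(checkpoint)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```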
Is it okay to use HuggingFace Embeddings for the fine-tuning process, or should we use OpenAI Embeddings?