Using pre-trained tokenizers and embedding layers

HuggingFace Question
Can I use a tokenizer trained for one model (e.g. BERT, RoBERTa, etc.) together with the pre-trained embeddings learned for GPT-2?

If yes, how can one do that?

Hi @Amit_Bhartia

In general, no, you cannot. Technically you can feed another tokenizer's IDs into the embedding layer, but those IDs would not point at the rows GPT-2 learned for them, so you would have to retrain the entire GPT-2 model on these tokens, which would not make much sense (randomly initialized embeddings would probably be as good as the pre-trained ones). Also, tokenizers are usually optimized for the models they were trained with, so using BERT tokens for GPT-2 even with full retraining would likely be sub-optimal.
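
To make the mismatch concrete, here is a minimal sketch (assuming the standard `bert-base-uncased` and `gpt2` checkpoints from the Hugging Face Hub):

```python
# Minimal sketch: BERT token IDs do not index GPT-2's embedding matrix.
from transformers import AutoTokenizer, AutoModel

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
gpt2 = AutoModel.from_pretrained("gpt2")

ids = bert_tok("tokenizers are model specific", return_tensors="pt")["input_ids"]
gpt2_emb = gpt2.get_input_embeddings()

print(ids)                    # WordPiece IDs from BERT's ~30k vocabulary
print(gpt2_emb.weight.shape)  # torch.Size([50257, 768]) -- GPT-2's BPE vocabulary
# The same integer ID maps to different strings in the two vocabularies,
# so gpt2_emb(ids) would fetch rows unrelated to the original BERT tokens.
```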

Cheers

Hi @arvyzukai,

Thanks for quickly clarifying!

I was thinking of using the GPT-4 tokenizer with the BERT embeddings, since the GPT-4 embeddings are paid, hence my query.

Any ideas for that?

Best Regards,
Amit

Not a good idea, because they are completely different architectures with incompatible vocabularies.
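
For a quick check of the size mismatch alone (a minimal sketch, assuming tiktoken's `cl100k_base` encoding for the GPT-4 tokenizer and the `bert-base-uncased` checkpoint):

```python
# The GPT-4 vocabulary is roughly three times larger than BERT's, so most
# GPT-4 token IDs have no row at all in BERT's embedding matrix.
import tiktoken
from transformers import AutoModel

gpt4_enc = tiktoken.get_encoding("cl100k_base")  # encoding used by GPT-4
bert = AutoModel.from_pretrained("bert-base-uncased")

print(gpt4_enc.n_vocab)                          # ~100k BPE tokens
print(bert.get_input_embeddings().weight.shape)  # torch.Size([30522, 768])
# Even the IDs that do fall inside BERT's range point at unrelated strings.
```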

Why don’t you want to use the BERT tokenizer?

I wanted to try out a pre-trained subword tokenizer, and the one from GPT-4 has been vastly improved. I wanted to see how that impacts the training of my model.

It very much depends on the dataset that you have.

Some tokenizers are better than others out of the box (considering a wide range of use cases), but you would have to retrain the entire model to see whether it actually “improves” things for the model you have.

Retraining the entire BERT model with a different tokenizer would be too costly, and I would guess the improvement (if any) would not be worth it.
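
If you just want a rough sense of the difference before committing to any retraining, you can compare how aggressively each tokenizer splits a sample of your own data (a cheap sketch; the example sentences are placeholders for your corpus):

```python
# Rough comparison: tokens produced per word by each tokenizer on sample text.
from transformers import AutoTokenizer
import tiktoken

bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
gpt4_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 encoding

samples = [
    "Pre-trained subword tokenizers split rare words very differently.",
    "Domain terms like immunohistochemistry often fragment into many pieces.",
]

for text in samples:
    words = len(text.split())
    print(
        f"{words:2d} words | "
        f"BERT: {len(bert_tok.tokenize(text))} tokens | "
        f"cl100k_base: {len(gpt4_enc.encode(text))} tokens"
    )
```

A lower token count per word generally means a longer effective context and fewer lookups, but as noted above, that only pays off if the model's embeddings were trained with that tokenizer.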

Agreed. Thanks for the guidance! 🙌
