How to apply NLP tools like tokenization and embeddings to languages other than English?

My question is: how can I apply NLP tools like tokenization, embeddings, etc. to languages other than English?

Hi @Geetanjali_Srivastav

If you want to use an already trained model like BERT (for transfer learning, fine-tuning, etc.), you have to use the same tokenization procedure it was trained with, because the model's vocabulary and embeddings are tied to that tokenizer.

But if you want to train your own model, you can build your own token vocabulary, specific to your language. There are different ways of doing that, but one of the most popular is byte pair encoding (BPE).
Here is an example of how to do that. And here you can find out more about the impact of tokenization on a Turkish language model.
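
To give a rough idea of what BPE does, here is a minimal pure-Python sketch (not the code from the linked example, and not how any particular library implements it). It starts from single characters and repeatedly merges the most frequent adjacent pair. The function names (`train_bpe`, `merge_pair`, `encode`), the `</w>` end-of-word marker, and the tiny Turkish corpus are all assumptions made up for illustration; for real work you would use an established tokenizer library and a much larger corpus.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of whitespace-separated strings."""
    # Each word becomes a tuple of characters plus an end-of-word marker.
    word_freqs = Counter()
    for line in corpus:
        for word in line.split():
            word_freqs[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, freq in word_freqs.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the chosen merge to every word before the next round.
        new_freqs = Counter()
        for symbols, freq in word_freqs.items():
            new_freqs[merge_pair(symbols, best)] += freq
        word_freqs = new_freqs
    return merges


def encode(word, merges):
    """Split a single word into subword tokens by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus (Turkish word forms sharing stems and suffixes).
corpus = ["evler evlerde evlerden", "kitaplar kitaplarda kitaplardan"]
merges = train_bpe(corpus, num_merges=10)
print(merges)
print(encode("evlerde", merges))
print(encode("kitaplarim", merges))  # a word not seen during training
```

Because the merges are learned from your own corpus, frequent stems and suffixes of your language tend to end up as single tokens, while unseen words still get split into smaller known pieces instead of becoming an unknown token.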

Cheers