Creating a custom tokenizer for romanized Pali text

How do I start creating a custom NLP tokenizer for romanized Pali text?

I'm not very knowledgeable about this, but I think this article could be of help to you:
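
Separately from that article, one possible starting point is a plain regex-based word tokenizer. This is only a minimal sketch, not an established Pali tokenizer. It assumes the text uses precomposed Unicode diacritics (ā, ī, ū, ṅ, ñ, ṭ, ḍ, ṇ, ḷ, ṃ) after NFC normalization. The names `TOKEN_RE` and `tokenize` are made up for illustration.

```python
import re
import unicodedata

# Assumption: romanized Pali written with Unicode diacritics.
# One token is a run of letters, a run of digits, or a single
# punctuation character. Whitespace is discarded.
TOKEN_RE = re.compile(r"[^\W\d_]+|\d+|[^\w\s]")


def tokenize(text):
    """Split romanized Pali text into word, number, and punctuation tokens."""
    # NFC merges a base letter plus a combining mark (e.g. "a" + U+0304)
    # into a single precomposed letter, so words are not split at diacritics.
    text = unicodedata.normalize("NFC", text)
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("Namo tassa bhagavato arahato sammāsambuddhassa."))
```

From there you could add Pali-specific rules on top, such as how to treat the apostrophe in sandhi forms like "’ti", or swap in a subword tokenizer trained on your own corpus.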