Why do we need the recursive splitter to break the text into chunks before using the token splitter? Can't we just use the token splitter directly to split the whole text into context-window-sized chunks? What's the benefit of the two-step method?
I have the same question!
I've been thinking about this. Here's my take: the presenter noted that SentenceTransformersTokenTextSplitter() has a maximum input context window of 256 tokens. He used RecursiveCharacterTextSplitter() to break the text into 1000-character chunks, which average out to roughly 250 tokens (at about 4 characters per token), so most chunks already fit the token splitter's window, and the recursive pass gets to cut on natural boundaries (paragraphs, sentences) rather than arbitrary token positions. That's my understanding until someone corrects us.
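To make the two-stage idea concrete, here's a dependency-free sketch (this is not the LangChain API, and the ~4 characters-per-token figure is just a crude heuristic standing in for a real tokenizer): stage 1 splits on natural boundaries into roughly 1000-character chunks, and stage 2 hard-enforces the 256-token budget on any chunk that still exceeds it.

```python
MAX_CHARS = 1000          # stage-1 target chunk size (characters)
MAX_TOKENS = 256          # embedding model's context window (tokens)
CHARS_PER_TOKEN = 4       # crude average; real splitters count actual tokens

def recursive_split(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Stage 1: split on paragraph, then sentence, then word boundaries."""
    if len(text) <= max_chars:
        return [text]
    for sep in ("\n\n", ". ", " "):
        if sep in text:
            parts, current = [], ""
            for piece in text.split(sep):
                candidate = current + sep + piece if current else piece
                if len(candidate) <= max_chars:
                    current = candidate
                else:
                    if current:
                        parts.append(current)
                    current = piece
            if current:
                parts.append(current)
            # recurse in case a single piece is still too long
            out = []
            for part in parts:
                out.extend(recursive_split(part, max_chars))
            return out
    # no separator found: hard cut
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def token_split(chunk: str, max_tokens: int = MAX_TOKENS) -> list[str]:
    """Stage 2: enforce the token budget on each stage-1 chunk."""
    limit = max_tokens * CHARS_PER_TOKEN
    return [chunk[i:i + limit] for i in range(0, len(chunk), limit)]

def split_document(text: str) -> list[str]:
    return [sub for chunk in recursive_split(text) for sub in token_split(chunk)]
```

The point the two stages illustrate: if you ran only stage 2, every cut would land at an arbitrary character/token position; running stage 1 first means the vast majority of chunks are already under the token limit and were cut at sensible boundaries, with stage 2 acting only as a safety net.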
But since he notes that character-based splitting loses sentence meaning, wouldn't it be better to split the text first with NLTKTextSplitter(), preserving sentence boundaries, and then token-split?