Hey,
first of all, thanks for the amazing course!
It is very informative and super nice to watch.
I have a question regarding some best practices for splitting the documents.
I’d like to use a sentence transformer (right now BERT-based) to find the most similar chunks in a vector store.
Do you recommend a way of splitting text data in this case?
I currently split at the sentence level; however, that does not seem to work well.
The results are especially poor when the user query is short, the sentence is long, and the transformer applies mean pooling at the end.
Do you have any tips on how best to split the data?
Cheers and thanks a lot,
Philipp