I don’t understand the behavior of LangChain's RecursiveCharacterTextSplitter. Here is my code and its output.
from langchain.text_splitter import RecursiveCharacterTextSplitter
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=10,
    chunk_overlap=0,
    # separators=["\n"]#, "\n", " ", ""]
)
test = """a\nbcefg\nhij\nk"""
print(len(test))
tmp = r_splitter.split_text(test)
print(tmp)
Output:
13
['a\nbcefg', 'hij\nk']
As you can see, it outputs two chunks of length 7 and 5 and splits on only one of the three newline characters. I was expecting the output to be ['a', 'bcefg', 'hij', 'k'].
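To make the gap between the two results concrete, here is a minimal sketch in plain Python with no LangChain involved. The first part shows the split I expected. The second part uses `greedy_merge`, a hypothetical helper of my own that is not part of LangChain: it assumes the splitter first splits on the separator and then re-joins neighbouring pieces as long as the result still fits in `chunk_size`. I have not confirmed this against the LangChain source, but that assumption does reproduce the output above.

```python
def greedy_merge(text, separator="\n", chunk_size=10):
    """Hypothetical helper: split on `separator`, then re-join
    neighbouring pieces while the joined chunk fits in `chunk_size`."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        # Try to append the next piece to the chunk being built.
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            # The piece does not fit: close the current chunk, start a new one.
            # (A single piece longer than chunk_size is kept whole in this sketch.)
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


test = "a\nbcefg\nhij\nk"

expected = test.split("\n")   # what I expected the splitter to return
merged = greedy_merge(test)   # split, then merge up to chunk_size

print(expected)
print(merged)
print([len(chunk) for chunk in merged])
```

If this assumption is right, `chunk_size` would act as an upper bound on chunk length rather than a request to split at every separator, which would explain why only the middle newline produces a boundary.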