Document Splitting

While creating document embeddings for a PDF, I tried to split the document, but I am a little confused about the difference between the following two approaches:

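from langchain.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter

# Load one Document per PDF page, then split on newlines into chunks of up to 1000 characters
loader = PyPDFLoader("docs/cs229_lectures/MachineLearning-Lecture01.pdf")
pages = loader.load()
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=150,
    length_function=len,
)
docs = text_splitter.split_documents(pages)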

AND

loader = PyPDFLoader(fileName)
pages = loader.load_and_split()

When implementing QA, I see a big difference in the results between the two.

Any help would be appreciated.

Nobody answered you? I am using DirectoryLoader and noticed that it is doing chunking by default. I can’t figure out how to change the defaults. Any insights?
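
Not a definitive answer, but a possible explanation: as far as I can tell from the LangChain source (this may vary by version), calling load_and_split() with no arguments falls back to a RecursiveCharacterTextSplitter with the library defaults, which I believe are chunk_size=4000 and chunk_overlap=200. The first snippet instead uses a CharacterTextSplitter on "\n" with chunk_size=1000 and chunk_overlap=150, so the two produce very different chunks, and retrieval for QA would differ accordingly. load_and_split() also accepts a text_splitter argument, so passing your own splitter there should make the two approaches comparable. The sketch below only illustrates how much chunk size and overlap change the chunking. chunk_text is a hypothetical helper using a plain sliding window over characters; it is not LangChain's algorithm and ignores separators.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. This is a simplified
    stand-in for illustration, not the splitting logic LangChain uses.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Synthetic 10,000-character "page" (made-up data, not the lecture PDF)
page = "".join(str(i % 10) for i in range(10_000))

# Settings from the first snippet in the question
small = chunk_text(page, chunk_size=1000, chunk_overlap=150)
# Settings assumed to be the library defaults used by load_and_split()
large = chunk_text(page, chunk_size=4000, chunk_overlap=200)

print(len(small), len(large))
```

With the same text, the smaller chunk size yields many more and much shorter chunks, so each retrieved chunk carries less context but is more focused. That alone could account for a noticeable difference in QA answers.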
