Loading markdown from file for splitting

jperedo · February 21, 2024, 7:33pm

In case anyone is loading the markdown from a file instead of from a text variable in the Document splitting section (Context aware splitting):

The Markdown loader in LangChain (UnstructuredMarkdownLoader), with its default settings, removes the markdown characters the example relies on to split the text (e.g. #, ##, ###), so the splitting does not work.
Use the plain text loader instead, which loads the file as is and does not remove anything.

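# On older LangChain releases the import is langchain.document_loaders
from langchain_community.document_loaders import TextLoader

loader = TextLoader(path)  # loads the file as is, markdown markers included
data = loader.load()
markdown_document = data[0].page_content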
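
To see why the missing markers matter, here is a rough sketch in plain Python. It is not LangChain's header splitter, just a simplified stand-in: `split_on_headers` is a hypothetical helper, and the `re.sub` call only imitates a loader that drops the leading `#` characters.

```python
import re

def split_on_headers(text):
    """Split text into (header, body) chunks at lines starting with 1-3 '#' characters."""
    chunks = []
    header, body = None, []
    for line in text.splitlines():
        match = re.match(r"^(#{1,3})\s+(.*)$", line)
        if match:
            # A new header starts a new chunk; flush the previous one if it has content.
            if header is not None or any(l.strip() for l in body):
                chunks.append((header, "\n".join(body).strip()))
            header, body = match.group(2), []
        else:
            body.append(line)
    # Flush whatever is left after the last line.
    if header is not None or any(l.strip() for l in body):
        chunks.append((header, "\n".join(body).strip()))
    return chunks

raw = "# Title\n\nIntro text\n\n## Section\n\nSection text\n"

# Imitate a loader that strips the leading '#' markers (assumption for illustration).
stripped = re.sub(r"^#+\s*", "", raw, flags=re.MULTILINE)

with_markers = split_on_headers(raw)          # one chunk per header
without_markers = split_on_headers(stripped)  # everything ends up in one chunk
```

With the markers intact the text splits into one chunk per header; once they are gone there is nothing left to split on and the whole document comes back as a single chunk with no header.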

elirod · February 21, 2024, 8:59pm

Hi @jperedo

Welcome to the community.

Thanks for reporting this.

adougall · July 14, 2025, 10:28am

In case anyone else comes across this post: the LangChain Markdown loader (UnstructuredMarkdownLoader) can preserve elements if used with the mode="elements" option. It is a case of RTFM.
