Loading markdown from file for splitting

In case anyone is loading the markdown from a file rather than from a text variable in the Document Splitting section (context-aware splitting):

The Markdown loader in LangChain (UnstructuredMarkdownLoader) strips the markdown characters that the example relies on for splitting (e.g. #, ##, ###), so the splitting does not work.
Instead, use the plain text loader, which loads the file as is and removes nothing.

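from langchain.document_loaders import TextLoader  # newer releases: langchain_community.document_loaders

loader = TextLoader(path)  # path points to your markdown file
data = loader.load()
markdown_document = data[0].page_content  # raw text, header markers intact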
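To see why the missing markers matter, here is a rough sketch using only the standard library. It is a simplified stand-in for a header-based splitter, not LangChain's actual implementation; the function name `split_on_headers` and the sample strings are made up for illustration.

```python
import re

# Hypothetical, simplified header splitter (not LangChain's code):
# a new chunk starts at every line beginning with #, ## or ###.
HEADER_RE = re.compile(r"^(#{1,3})\s+(.*)$")

def split_on_headers(text):
    """Return a list of (header, body) tuples; header is None if no marker was seen."""
    chunks = []
    header, body = None, []
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            # Close the previous chunk before starting a new one.
            if header is not None or any(l.strip() for l in body):
                chunks.append((header, "\n".join(body).strip()))
            header, body = match.group(2).strip(), []
        else:
            body.append(line)
    if header is not None or any(l.strip() for l in body):
        chunks.append((header, "\n".join(body).strip()))
    return chunks

# Sample text as a plain text loader would return it (markers kept).
raw = "# Title\n\nIntro text\n\n## Section\n\nSection text\n"
# The same text with the header markers stripped out.
stripped = "Title\n\nIntro text\n\nSection\n\nSection text\n"

raw_chunks = split_on_headers(raw)
stripped_chunks = split_on_headers(stripped)
```

With the markers in place the text splits at each header; once they are stripped there is nothing to split on, so everything ends up in a single chunk with no header metadata.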

Hi @jperedo

Welcome to the community.

Thanks for reporting this.