Auto-Merging Retrieval: How does it work?

David_Hillmann · December 2, 2023, 4:41pm

The course doesn’t go into much detail on how Auto-Merging works.
But I think it might be valuable to understand what’s happening under the hood.

1) Creating the node hierarchy
If I understood correctly, we first define the hierarchy by taking the small chunks and linking them to parent chunks. How are the leaf chunks assigned to parents?
Example: assume we have 8 leaf chunks (= indexed embeddings), leaf1, leaf2, …, leaf8, and only 2 layers, with each parent defined to be 4x the size of a child. Are leaf1 to leaf4 assigned to parent1 and leaf5 to leaf8 to parent2? Or is it more of a sliding window?
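
To make the two options in the question concrete, here is a minimal sketch in plain Python. It is not the course's or any library's implementation: `disjoint_parents`, `sliding_parents`, `fanout` and `stride` are hypothetical names made up for illustration, and the leaves are just labels rather than real text chunks.

```python
def disjoint_parents(leaves, fanout):
    """Option A: each parent owns a fixed, non-overlapping block of `fanout` leaves."""
    return [leaves[i:i + fanout] for i in range(0, len(leaves), fanout)]


def sliding_parents(leaves, fanout, stride=1):
    """Option B: parents are overlapping windows of `fanout` leaves, moved by `stride`."""
    return [leaves[i:i + fanout] for i in range(0, len(leaves) - fanout + 1, stride)]


# The 8 leaves from the example above (labels only, hypothetical data).
leaves = [f"leaf{i}" for i in range(1, 9)]

disjoint = disjoint_parents(leaves, fanout=4)
sliding = sliding_parents(leaves, fanout=4)

for name, groups in (("disjoint", disjoint), ("sliding", sliding)):
    print(name)
    for idx, group in enumerate(groups, start=1):
        print(f"  parent{idx}: {group}")
```

With disjoint blocks every leaf has exactly one parent, so the hierarchy is a tree; with a sliding window a leaf can sit under several parents, which changes what "the parent of a retrieved leaf" means.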

2) The “merge” process
In the course it’s mentioned: “If the set of smaller chunks linking to a parent chunk exceeds some threshold, then ‘merge’ smaller chunks into the bigger parent chunk.” What threshold is applied? Is it related to the top_k or rerank_k we define to filter relevant chunks?
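
One way to read the quoted sentence is as a ratio rule: count how many of a parent's children were retrieved, and swap them for the parent once that share passes a cutoff. The sketch below assumes that reading and a disjoint hierarchy like parent1/parent2 from the example in 1). `auto_merge`, `ratio_threshold` and its value of 0.5 are hypothetical choices for illustration, not values taken from the course.

```python
from collections import defaultdict


def auto_merge(retrieved, parent_of, children_of, ratio_threshold=0.5):
    """Replace retrieved leaves by their parent when enough siblings were retrieved.

    `ratio_threshold` is an assumed knob: the fraction of a parent's children
    that must be retrieved before the parent is returned instead of the leaves.
    """
    # Group the retrieved leaves by the parent they link to.
    hits = defaultdict(list)
    for leaf in retrieved:
        hits[parent_of[leaf]].append(leaf)

    merged = []
    for parent, leaves in hits.items():
        ratio = len(leaves) / len(children_of[parent])
        if ratio > ratio_threshold:
            merged.append(parent)   # enough siblings retrieved: return the parent chunk
        else:
            merged.extend(leaves)   # too few: keep the individual leaves
    return merged


# Hypothetical two-layer hierarchy: 8 leaves, 4 per parent.
children_of = {
    "parent1": ["leaf1", "leaf2", "leaf3", "leaf4"],
    "parent2": ["leaf5", "leaf6", "leaf7", "leaf8"],
}
parent_of = {leaf: parent for parent, kids in children_of.items() for leaf in kids}

# Pretend the retriever returned these leaves (e.g. the top 4 by similarity).
retrieved = ["leaf1", "leaf2", "leaf3", "leaf6"]
result = auto_merge(retrieved, parent_of, children_of)
print(result)
```

Under this reading the threshold would be a separate knob from top_k: top_k only decides how many leaves come back from the index, and the merge rule is applied to those leaves afterwards. Whether the course's retriever behaves this way is not confirmed by the course material quoted here.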

Any insights or thoughts would be appreciated!

DavidSBatista · July 22, 2024, 3:37pm

I’m very confused by the definition of the “merging process”.
