Auto-Merging Retrieval: How does it work?

The course doesn’t go into much detail on how auto-merging works,
but I think it would be valuable to understand what’s happening under the hood.

1) Creating the node hierarchy
If I understood correctly, we first define the hierarchy by taking the small chunks and linking them to parent chunks. How are the leaf chunks assigned to parents?
Example: assume we have 8 leaf chunks (= indexed embeddings), leaf1, leaf2, …, leaf8, and just 2 layers, with each parent defined to be 4x the size of a child. Are leaf1 to leaf4 assigned to parent1 and leaf5 to leaf8 to parent2? Or is it more of a sliding window?
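
To make the question concrete, here is a minimal sketch of the first reading (consecutive, non-overlapping blocks). It is plain Python and not the course's or any library's implementation; `assign_parents` and the `parentN` labels are names I made up for illustration.

```python
def assign_parents(leaf_ids, children_per_parent):
    """Map each leaf to a parent using consecutive, non-overlapping blocks.

    Hypothetical helper: this only illustrates one possible reading of the
    hierarchy, not how any particular library builds it.
    """
    parent_of = {}
    for position, leaf_id in enumerate(leaf_ids):
        # Integer division puts every `children_per_parent` leaves in one block.
        parent_index = position // children_per_parent + 1
        parent_of[leaf_id] = f"parent{parent_index}"
    return parent_of


leaves = [f"leaf{i}" for i in range(1, 9)]  # leaf1 ... leaf8
parent_of = assign_parents(leaves, children_per_parent=4)

for leaf_id in leaves:
    print(leaf_id, "->", parent_of[leaf_id])
```

Under this reading each leaf has exactly one parent. With a sliding window, a leaf could belong to several parents, and the mapping would have to be leaf to a list of parents instead.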

2) The “merge” process
The course says: "If the set of smaller chunks linking to a parent chunk exceeds some threshold, then “merge” smaller chunks into the bigger parent chunk." What threshold is applied? Is it related to the top_k or rerank_k we define to filter relevant chunks?
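
One way I could imagine this working is with a ratio threshold: if more than some fraction of a parent's children are among the retrieved leaves, the parent replaces them. The sketch below is only my guess at the idea; `auto_merge`, `ratio_threshold`, and the value 0.5 are assumptions for illustration, not taken from the course.

```python
from collections import defaultdict


def auto_merge(retrieved_leaves, parent_of, children_count, ratio_threshold=0.5):
    """Replace retrieved leaves by their parent when enough siblings were hit.

    Hypothetical sketch: the ratio rule and the 0.5 default are assumptions.
    """
    # Group the retrieved leaves by the parent they link to.
    hits = defaultdict(list)
    for leaf in retrieved_leaves:
        hits[parent_of[leaf]].append(leaf)

    merged = []
    for parent, leaves in hits.items():
        ratio = len(leaves) / children_count[parent]
        if ratio > ratio_threshold:
            merged.append(parent)   # enough children retrieved: use the parent
        else:
            merged.extend(leaves)   # too few: keep the individual leaves
    return merged


# Two parents with four children each, as in the example from question 1.
parent_of = {f"leaf{i}": "parent1" if i <= 4 else "parent2" for i in range(1, 9)}
children_count = {"parent1": 4, "parent2": 4}

retrieved = ["leaf1", "leaf2", "leaf3", "leaf6"]  # e.g. the top_k = 4 results
result = auto_merge(retrieved, parent_of, children_count)
print(result)
```

Here 3 of parent1's 4 children were retrieved (0.75 > 0.5), so they are merged into parent1, while leaf6 stays on its own because only 1 of parent2's 4 children was retrieved. In this reading, top_k only limits how many leaves can be hit in the first place; the threshold itself is a separate setting.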

Any insights or thoughts would be appreciated!

I’m very confused by the definition of the “merging process”.