Hi everyone,
I’m an independent builder working on a structured “16-problem ProblemMap” for diagnosing common RAG and LLM pipeline failures.
Instead of proposing yet another framework, the goal is to provide a fixed diagnostic lens for where things typically break in real systems. The map categorizes recurring failure modes across:
- ingestion and chunking mismatches
- embedding and vector store inconsistencies
- retriever ranking and recall gaps
- evaluation blind spots
- hallucination and guardrail leakage
- deployment and bootstrap ordering issues
In practice, many issues that appear to be “model problems” are actually structural mismatches between components. Having a stable taxonomy has helped reduce trial-and-error debugging in my own work.
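To make "structural mismatch" concrete, here is a minimal sketch of the kind of cheap pre-flight check that catches these before they masquerade as model failures. Everything here (`Chunk`, `check_pipeline`, the specific thresholds) is illustrative and not part of WFGY:

```python
# Hypothetical diagnostic sketch: surface pipeline mismatches that often
# get misattributed to the model. Names and thresholds are illustrative.
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    embedding: list[float]


def check_pipeline(chunks: list[Chunk], query_dim: int,
                   max_chunk_chars: int) -> list[str]:
    """Return all findings at once instead of raising on the first."""
    findings = []

    # Embedding/vector-store inconsistency: mixed or mismatched dimensions.
    dims = {len(c.embedding) for c in chunks}
    if len(dims) > 1:
        findings.append(f"inconsistent embedding dims in index: {sorted(dims)}")
    if dims and query_dim not in dims:
        findings.append(f"query embedder dim {query_dim} != index dims {sorted(dims)}")

    # Ingestion/chunking mismatch: chunks too large for the embedder's window.
    oversized = sum(1 for c in chunks if len(c.text) > max_chunk_chars)
    if oversized:
        findings.append(f"{oversized} chunks exceed {max_chunk_chars} chars (truncation risk)")

    # Ingestion debris: empty chunks that silently dilute recall.
    empty = sum(1 for c in chunks if not c.text.strip())
    if empty:
        findings.append(f"{empty} empty chunks in index")

    return findings
```

Running a check like this at index-build time turns several of the categories above (chunking mismatches, embedding inconsistencies) into explicit findings rather than silent retrieval degradation.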
The project is maintained under the GitHub account onestardao, under the name WFGY.
The ProblemMap component has been referenced or integrated in several academic labs, RAG infrastructure projects, and curated AI research lists.
I’m sharing this here mainly to get feedback from people building or researching LLM systems:
- Which failure modes do you encounter most often?
- Do you find structured taxonomies useful in production settings?
- What categories feel missing or too rigid?
I'd appreciate thoughtful discussion.