Hello! In a recent client engagement, we built a RAG implementation using product installation and maintenance documents as input, to help field service technicians resolve issues before calling a human "expert" for help. The images in the documents ranged from low to high complexity, including tables, flowcharts, photos, and CAD drawings. For the PoC we decided to exclude image processing entirely and focused on chunking the text and storing embeddings in a vector DB. My question: what effective approaches have you taken to process low-, medium-, and high-complexity images, so that semantic search results passed to the LLM carry document and page-number references, and both text and image hits can be displayed to the user alongside the completion? Thank you for your help!
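For context on the citation part of the question: one common pattern is to attach document, page, and chunk-type metadata to every indexed chunk, so image-derived chunks (captions, OCR'd tables, or VLM-generated descriptions) live in the same index as text chunks and every hit is citable. Below is a minimal sketch of that idea; the bag-of-words "embedding", the chunk contents, and the field names (`doc`, `page`, `type`) are all illustrative assumptions, not part of any particular library.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" standing in for a real embedding model
    # (e.g. a sentence-transformer or a multimodal encoder); illustrative only.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Every chunk carries doc/page metadata so search hits can be cited.
# Image-derived chunks are stored the same way as text, with a "type"
# field so the UI knows to also render the source page or image.
index = [
    {"text": "Torque the mounting bolts to 25 Nm in a star pattern.",
     "doc": "install_guide.pdf", "page": 12, "type": "text"},
    {"text": "Figure 7: wiring diagram for the three-phase power connector.",
     "doc": "install_guide.pdf", "page": 14, "type": "image_caption"},
]

def search(query, k=2):
    # Rank all chunks by similarity to the query; a real system would use
    # a vector DB's ANN search instead of this linear scan.
    q = embed(query)
    scored = sorted(index, key=lambda c: cosine(q, embed(c["text"])),
                    reverse=True)
    return scored[:k]

hits = search("wiring diagram power connector")
top = hits[0]
print(f'{top["doc"]} p.{top["page"]} ({top["type"]}): {top["text"]}')
```

With this layout, the answer-generation step can quote `doc` and `page` from each retrieved chunk, and the front end can fetch and display the referenced page image for any hit whose `type` marks it as image-derived.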