Optimal Document Text and Image Processing

JeffU · March 1, 2024, 11:03pm

Hello, in a recent client engagement we built a RAG implementation that uses product installation and maintenance documents as input, to help field service technicians resolve installation and maintenance issues before calling a human “expert” for help. The images in the documents ranged from low to high complexity, including tables, flowcharts, photos, and CAD drawings. For the PoC we decided to exclude image processing entirely and focused only on chunking the text and storing its embeddings in a vector DB. My question: what approaches have worked for you in processing low-, medium-, and high-complexity images so that semantic search results covering both text and image hits are passed to the LLM with document and page number references, and those hits are then displayed to the user after the completion? Thank you for your help!
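To make the question concrete, here is a minimal sketch of one common pattern, under stated assumptions and not the method used in this engagement: each image is first converted to text upstream (for example, a caption or description from a vision-capable model, or extracted rows for a table), and those image descriptions are indexed alongside the text chunks with the same document and page metadata. One query can then hit both kinds of chunk, every hit carries a citation for the LLM, and image hits can be listed for display after the completion. The `Chunk` type, the `image_ref` paths, and the helper names are hypothetical, and the bag-of-words `embed` function is only a stand-in for a real embedding model and vector DB.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """One retrievable unit: a text passage or a textual stand-in for an image."""
    doc_id: str           # source document, e.g. an installation manual
    page: int             # page number cited back to the user
    kind: str             # "text" or "image"
    content: str          # passage text, or an (assumed upstream) image description
    image_ref: str = ""   # hypothetical pointer to the stored image file
    vector: Counter = field(default_factory=Counter)


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a stand-in for a real embedding model (assumption).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def index(chunks: list[Chunk]) -> list[Chunk]:
    # Text and image chunks share one index, so a single query can match either.
    for c in chunks:
        c.vector = embed(c.content)
    return chunks


def search(query: str, chunks: list[Chunk], k: int = 3) -> list[Chunk]:
    q = embed(query)
    scored = sorted(((cosine(q, c.vector), c) for c in chunks), key=lambda p: p[0], reverse=True)
    # Keep the top-k hits that share at least one term with the query.
    return [c for score, c in scored[:k] if score > 0]


def build_context(hits: list[Chunk]) -> str:
    # Each context line is tagged with document, page, and kind so the LLM can cite it.
    return "\n".join(f"[{c.doc_id} p.{c.page} {c.kind}] {c.content}" for c in hits)


def images_to_display(hits: list[Chunk]) -> list[tuple[str, int, str]]:
    # Image hits (with their references) to show the technician after the completion.
    return [(c.doc_id, c.page, c.image_ref) for c in hits if c.kind == "image"]


# Usage with made-up example chunks.
docs = index([
    Chunk("install_guide", 12, "text", "Mount the bracket before connecting the water line."),
    Chunk("install_guide", 13, "image",
          "Flowchart: troubleshooting a leaking water line connection.",
          "install_guide/p13_fig2.png"),
    Chunk("maint_manual", 4, "text", "Replace the filter every six months."),
])
hits = search("water line leaking", docs)
print(build_context(hits))
print(images_to_display(hits))
```

The design choice here is to keep a single index with a `kind` field rather than separate text and image stores, so ranking across text and image hits stays comparable; how well this works for complex images such as CAD drawings depends almost entirely on the quality of the upstream descriptions.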

Related topics

RAG Methodology (AI Discussions; tags: ai-discussions, data-centric) · 0 replies · 114 views · December 4, 2023
Using RAG to teach a (image + text) scanned PDF doc to the model (Building Multimodal Search and RAGs) · 0 replies · 367 views · May 18, 2024
🌟 New Course! Enroll in Building Multimodal Search and RAG (News and Announcements; tag: short-course) · 5 replies · 308 views · May 15, 2024
docAnalyzer - chat with large PDF dataset (AI Discussions; tags: ai-discussions, careers, project) · 4 replies · 224 views · February 9, 2024
Introduction to RAG (AI Discussions; tags: ai-discussions, langchain, rag) · 2 replies · 63 views · July 10, 2024