RAG with Multimodal Text and Images

I have a product image and title. I'm interested in the estimated price of the product and in where the AI (LLM) extracted that price from. How can I do this?

Does anyone know how I can use a multimodal model?