KIE (key information extraction): extracting key-value pairs

Hello everybody!
I need help extracting data from semi-structured documents.
Which multimodal models exist for this?
I know LayoutLMv3, but could someone point me to a list of models that handle the same tasks?

Separately, is there any link or guidance for doing this locally, for example with RAG and a vector database, without having to fine-tune a model?

Thank you very much in advance for any answers.

