Artwork recognition agent

Good morning, good afternoon and good evening everyone,

As part of a product I’m developing, I need to implement a feature that, given a picture of an artwork (a painting, a building, a sculpture, …), returns some information and context about it.

I’m struggling to find the best tool for this, one that achieves the highest success rate on a limited budget. In other words, we can’t build our own model or host one.

Google Lens would be an alternative, but it doesn’t offer an API, so the only options would be to use one of the existing third-party APIs that act as a bridge or to write my own scraping tool.

Another alternative I was considering is to create a small agent that first calls the Google Cloud Vision API and then passes the returned information to an LLM such as Google Gemini.
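To make that idea concrete, here is a minimal sketch of the two-step pipeline, assuming the public REST endpoints of Cloud Vision (`images:annotate`) and the Gemini API (`generateContent`) with API-key authentication. The function names, the prompt wording and the model name `gemini-1.5-flash` are my own placeholders, not something I have validated for this use case, and a real version would need error handling.

```python
import base64
import json
import urllib.request

VISION_URL = "https://vision.googleapis.com/v1/images:annotate"
GEMINI_URL = ("https://generativelanguage.googleapis.com/v1beta/"
              "models/{model}:generateContent")


def build_vision_request(image_bytes):
    """Request body asking Cloud Vision for web and landmark matches."""
    return {
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "WEB_DETECTION", "maxResults": 5},
                {"type": "LANDMARK_DETECTION", "maxResults": 3},
            ],
        }]
    }


def extract_candidates(vision_response):
    """Collect candidate names from the Vision response, without duplicates."""
    responses = vision_response.get("responses") or [{}]
    result = responses[0]
    web = result.get("webDetection", {})
    names = [g["label"] for g in web.get("bestGuessLabels", []) if g.get("label")]
    names += [e["description"] for e in web.get("webEntities", [])
              if e.get("description")]
    names += [a["description"] for a in result.get("landmarkAnnotations", [])
              if a.get("description")]
    return list(dict.fromkeys(names))  # keeps first-seen order


def build_prompt(candidates):
    """Hypothetical prompt; tune the wording for your product."""
    if not candidates:
        return "No artwork could be identified. Say so briefly."
    return ("An image-recognition service returned these candidate labels for "
            "a photo of an artwork: " + ", ".join(candidates) + ". "
            "Identify the most likely artwork and give a short description "
            "with historical context. Say so if the labels are ambiguous.")


def build_gemini_request(prompt):
    """Request body for a text-only generateContent call."""
    return {"contents": [{"parts": [{"text": prompt}]}]}


def post_json(url, payload, api_key):
    """POST a JSON payload, passing the API key as a query parameter."""
    req = urllib.request.Request(
        f"{url}?key={api_key}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def describe_artwork(image_bytes, vision_key, gemini_key,
                     model="gemini-1.5-flash"):  # model name is an assumption
    """Step 1: Vision labels. Step 2: ask the LLM for context."""
    vision = post_json(VISION_URL, build_vision_request(image_bytes), vision_key)
    prompt = build_prompt(extract_candidates(vision))
    answer = post_json(GEMINI_URL.format(model=model),
                       build_gemini_request(prompt), gemini_key)
    return answer["candidates"][0]["content"]["parts"][0]["text"]
```

Calling `describe_artwork(open("photo.jpg", "rb").read(), VISION_KEY, GEMINI_KEY)` would run both steps, where the file name and the two keys are placeholders. Since Gemini accepts images directly, it may also be worth comparing this against sending the photo straight to the LLM, but I haven’t measured which gives the better success rate.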

But I thought I’d turn to the wisdom of the community to hear what you think would be a good approach.

Thank you in advance!

Hi dmorillas,

You could try using ComfyUI:

https://www.reddit.com/r/comfyui/comments/1pg09qs/i_build_a_comfyui_workflow_image_to_decription/

(If you don’t trust the link, just search for it and you should find it.)