It is a pleasure to be in this community and to see so many AI-related projects. I have completed my PG Diploma in AI/ML from Upgrad, but I am still not confident applying what I learned, and I feel really ashamed about it.
I am very interested in building something amazing, but I am stuck right now. I am trying to build a model that takes an image of a web page and retrieves a list of all the UI elements present in that image, in a specific format that will be useful to me.
This model will be part of a bigger project that I want to build, so any guidance here would be a great help.
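For illustration, the output format could look something like the sketch below. Everything in it is an assumption made up for the example: the `UIElement` fields, the label names such as "button" and "link", the `(x, y, width, height)` pixel box, and the `to_json` helper. It is not what any existing model returns; it only shows the kind of structured list a detector would need to produce.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass
class UIElement:
    """One detected UI element (hypothetical schema, for illustration only)."""
    kind: str                        # assumed label set, e.g. "button", "link"
    text: str                        # visible text, empty string if none
    box: Tuple[int, int, int, int]   # assumed (x, y, width, height) in pixels


def to_json(elements: List[UIElement]) -> str:
    """Serialize elements in reading order: top to bottom, then left to right."""
    ordered = sorted(elements, key=lambda e: (e.box[1], e.box[0]))
    return json.dumps([asdict(e) for e in ordered], indent=2)


if __name__ == "__main__":
    # Hand-written sample data standing in for real model output.
    sample = [
        UIElement("button", "Submit", (40, 300, 120, 32)),
        UIElement("link", "Home", (10, 20, 60, 18)),
    ]
    print(to_json(sample))
```

Fixing a schema like this early makes it easier to swap the detector later (a vision API now, a custom-trained model afterwards) without changing the rest of the project.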
Hello @spotnuru
Are you planning to train the model from scratch? I would suggest checking out, or trying to reverse engineer, this open-source project to build up your confidence, though the project uses the GPT-Vision API:
I took a look, but there's not much to reverse engineer in this case. The project is basically just a frontend; the actual models it uses are Claude 3.7 and GPT-4o. They are vision-capable models, but they are proprietary. I'm building a screenshot app that specializes in identifying UI elements as well, and I am very interested in getting my hands on a model trained specifically for this. I'm surprised there isn't one in the wild yet.