Hey all
I’m trying to build a small model (7B/14B) to handle hotel booking on a specific website via HTML/DOM understanding (not screenshots). I’ve tried a Puppeteer + MCP setup, but the agent keeps drifting out of context mid-task.
Wondering:
- Is this more of a fine-tuning problem (on action traces + HTML) or a post-training one (needs RL/SFT for stability)?
- I've seen the Mind2Web dataset, but it covers general web navigation. Are there any tools or frameworks for generating synthetic datasets for specific web tasks (like hotel booking), or do people usually build their own crawler/data-recorder setup?
- Any model/dataset recs for DOM-based web agents?
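For concreteness, here's a rough sketch of the kind of per-step record I'm picturing if I end up building my own recorder. It strips a page down to its interactive elements and pairs that with the action taken. The tag set, the kept attributes, and every field name (`goal`, `elements`, `action`, `target_id`) are placeholders I made up, not a format from Mind2Web or any framework, and it only uses Python's stdlib `html.parser`:

```python
import json
from html.parser import HTMLParser

# Assumption: these tags are "interactive" enough to keep. Not a standard list.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
# Assumption: attributes worth showing to the model; everything else is dropped.
KEEP_ATTRS = {"id", "name", "type", "placeholder", "aria-label", "href", "value"}


class InteractiveExtractor(HTMLParser):
    """Collects interactive elements and gives each a small integer id."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        kept = {k: v for k, v in attrs if k in KEEP_ATTRS and v is not None}
        self.elements.append({"id": len(self.elements), "tag": tag, "attrs": kept})


def prune_dom(html):
    """Return only the interactive elements of an HTML string."""
    parser = InteractiveExtractor()
    parser.feed(html)
    parser.close()
    return parser.elements


def make_step(goal, html, action, target_id, value=None):
    """Build one training record: task goal + pruned page + action taken."""
    return {
        "goal": goal,
        "elements": prune_dom(html),
        "action": {"type": action, "target_id": target_id, "value": value},
    }


if __name__ == "__main__":
    page = '<div><input name="city" type="text"><button id="search">Search</button></div>'
    step = make_step("Book a hotel in Lisbon", page, "type", 0, "Lisbon")
    # One JSON object per line (JSONL), so traces can be appended as they are recorded.
    print(json.dumps(step))
```

The idea would be to repeat the goal in every step and keep the page representation short, so the model sees the task on each turn instead of relying on a long history. No idea yet whether that is enough on its own, hence the questions above.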
Basically: how do you get a small model to stay on-task through multi-step HTML flows?
Thanks!