Off-topic guardrails can be helpful to block malicious or playful user prompts that intend to use the LLM application in an unintended way.
To train and benchmark such guardrails, I’ve built this dataset using GPT4o
and the structured outputs. Synthetic data generation was done by seeding with real examples and random words.
Any feedback would be appreciated