Guardrails for LLM/Multimodal-based Systems

hodgesz · April 13, 2024, 2:42pm

I’m curious what others are doing on the guardrails front for their applications to prevent harmful or toxic inputs and outputs. This could also be actions like jailbreaking, prompt injections, or other nefarious actions from users or non-safe responses from the system.

Meta’s Llama Guard (Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations | Research - AI at Meta) is pretty interesting allowing you to define your own “safety taxonomy” with it — custom policies for what is safe vs unsafe interactions between humans (prompts) and AI (responses).

MS/Azure recently changed their jailbreak detection service to Prompt Shields (Prompt Shields in Azure AI Content Safety - Azure AI services | Microsoft Learn), which also looks interesting.

Anyone using these or others to protect their apps?

If so, has anyone customized to add additional items to prevent, either through the ‘safety taxonomy’, fine-tuning or other means?

nick_nikolaev · April 23, 2024, 8:15am

I am not aware of guardrails for LLM but what can possibly happen if someone prompt injects you?
Is it dangerous only if you have full RAG pipeline?
What can possibly happen if you only have an API integration with LLM?