Need more clarity on Constitutional AI

I have a couple of queries.

  1. With respect to the Generative AI life cycle, where exactly does Constitutional AI fit in? I am sure it is part of the Adapt and Align stage, but I want to know more specifically where it fits. A process flow diagram right from pretraining, through prompt tuning or the different flavours of fine-tuning, to application integration, incorporating elements like Constitutional AI, PPO, etc., would be very useful.

  2. How do we incorporate constitutional policies - as prompt engineering or few-shot examples?

Request clarification.

As far as my above query is concerned, after the Week 3 lab the PPO part is clear, but the Constitutional AI portion is still not very clear. A comprehensive diagram showing the lifecycle, along with a process flow diagram, would help.

You can use Constitutional AI in two stages:

  • Fine-tuning: use red teaming and the revised constitutional responses to generate data for fine-tuning your LLM.
  • RLAIF: similar to RLHF, but with AI feedback in place of human feedback.

To fine-tune your model you should:

  • First, you prompt your model in ways that will generate harmful content. This process is called red teaming.
  • Create a new prompt with the Constitutional AI rules and the previous step’s harmful content, and ask the LLM to reflect on whether it follows the rules.
  • The LLM will return a new completion pointing out why it failed the Constitutional AI rules.
  • Finally, you take the original prompt, the explanation of why it fails the rules, and ask the LLM to s

Thanks for the info. The response seems to be truncated; I would like to see the missing portion.

I will rephrase my response.

You can use Constitutional AI to solve two problems related to refining a model:

1. Creating Training Data for Fine-Tuning

Training data is not always easy to get. You can synthetically create training data for fine-tuning using red teaming and the revised constitutional responses.
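To make this concrete, here is a minimal Python sketch of that critique-and-revision loop, assuming a placeholder `generate()` function standing in for whatever LLM call you use; the constitution text and red-team prompt are illustrative, not taken from the course:

```python
# Sketch: generating supervised fine-tuning data with Constitutional AI.
# `generate(prompt)` is a placeholder for any LLM completion call.

CONSTITUTION = (
    "Please choose the response that is the most helpful, honest, and harmless."
)

red_team_prompts = [
    "How can I hack into my neighbor's wifi?",  # prompt designed to elicit harmful output
]

def generate(prompt: str) -> str:
    """Placeholder for a call to your LLM (API or local model)."""
    raise NotImplementedError

training_pairs = []
for prompt in red_team_prompts:
    # 1. Red teaming: get the model's (potentially harmful) first answer.
    initial_response = generate(prompt)

    # 2. Critique: ask the model whether its answer follows the constitution.
    critique = generate(
        f"Constitution: {CONSTITUTION}\n"
        f"Prompt: {prompt}\nResponse: {initial_response}\n"
        "Identify any ways the response violates the constitution."
    )

    # 3. Revision: ask the model to rewrite its answer using the critique.
    revised_response = generate(
        f"Constitution: {CONSTITUTION}\n"
        f"Prompt: {prompt}\nResponse: {initial_response}\nCritique: {critique}\n"
        "Rewrite the response so that it fully complies with the constitution."
    )

    # 4. The (prompt, revised response) pair becomes fine-tuning data.
    training_pairs.append({"prompt": prompt, "completion": revised_response})
```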

2. Training the Reward Model for RLAIF

RLAIF is similar in concept to RLHF (Reinforcement Learning from Human Feedback).

Training the reward model needed for RLHF may require massive amounts of human feedback. To solve this scaling problem, you replace the human feedback with Constitutional AI feedback: the constitution is used to select the best answers to red-teaming prompts, and those preferences train your reward model.
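Here is a minimal sketch of that preference-labelling step, again assuming a placeholder `generate()` LLM call and an illustrative constitution; the prompt wording and the `collect_preference` helper are hypothetical, not the course's exact implementation:

```python
# Sketch: collecting AI (rather than human) preference labels for the reward model.

CONSTITUTION = (
    "Please choose the response that is the most helpful, honest, and harmless."
)

def generate(prompt: str) -> str:
    """Placeholder for a call to your LLM."""
    raise NotImplementedError

def collect_preference(prompt: str) -> dict:
    # Sample two candidate completions for the same red-team prompt.
    response_a = generate(prompt)
    response_b = generate(prompt)

    # Ask a feedback model which completion better follows the constitution.
    verdict = generate(
        f"Constitution: {CONSTITUTION}\n"
        f"Prompt: {prompt}\n(A) {response_a}\n(B) {response_b}\n"
        "Which response better follows the constitution? Answer A or B."
    ).strip().upper()

    if verdict.startswith("A"):
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a

    # These (chosen, rejected) pairs replace human rankings when training
    # the reward model used in the RL (e.g. PPO) step.
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
```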
