How are we deciding the score threshold for categories in the Moderation API?
Tl;dr

- `flagged`: Set to `true` if the model classifies the content as violating OpenAI's usage policies, `false` otherwise.
- `categories`: Contains a dictionary of per-category binary usage policy violation flags. For each category, the value is `true` if the model flags the corresponding category as violated, `false` otherwise.
- `category_scores`: Contains a dictionary of per-category raw scores output by the model, denoting the model's confidence that the input violates OpenAI's policy for the category. The value is between 0 and 1, where higher values denote higher confidence. The scores should not be interpreted as probabilities.
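For context, here is a minimal sketch of reading those three fields with the Python `openai` library (v1-style client; exact attribute access may vary slightly between SDK versions):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the moderation endpoint to classify a piece of text
response = client.moderations.create(input="...your text here...")
result = response.results[0]

print(result.flagged)                       # overall boolean verdict
print(result.categories.model_dump())       # per-category true/false flags
print(result.category_scores.model_dump())  # per-category raw scores in [0, 1]
```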
My takeaway is that the ‘threshold’ is binary: content either is flagged for one or more categories or it isn't.
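If the built-in binary flags are too coarse for a given application, one option is to ignore `flagged`/`categories` and apply your own cutoffs to the raw `category_scores`. A rough sketch, with the threshold values and category names purely hypothetical and needing tuning against your own data:

```python
# Hypothetical per-category cutoffs tuned for your own application;
# the API itself only returns the binary flags plus the raw scores.
CUSTOM_THRESHOLDS = {"hate": 0.4, "violence": 0.6, "self-harm": 0.2}
DEFAULT_THRESHOLD = 0.5  # fallback for categories not listed above


def custom_flags(category_scores: dict[str, float]) -> dict[str, bool]:
    """Re-derive per-category flags from raw scores using our own cutoffs."""
    return {
        category: score >= CUSTOM_THRESHOLDS.get(category, DEFAULT_THRESHOLD)
        for category, score in category_scores.items()
    }


# Example: the scores dict could come from result.category_scores.model_dump()
print(custom_flags({"hate": 0.45, "violence": 0.1}))
# -> {'hate': True, 'violence': False}
```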