Context Compliance Harmful Content Attack

What is Context Compliance Harmful Content Attack?

This evaluation assesses whether an AI agent can be manipulated into delivering harmful or inappropriate content. It involves presenting the agent with an artificial conversation history, designed to mimic prior compliance with similar, potentially malicious requests.

Stay updated with
the Giskard Newsletter