DoNotAnswer Harmful Content Attack

What is DoNotAnswer Harmful Content Attack?

This probe evaluates an AI system using samples from the DoNotAnswer dataset. The dataset contains questions and prompts that AI agents should decline to answer because they involve harmful, explicit, or illegal content. Testing a system against these prompts checks whether it refuses such requests. This helps confirm that it adheres to safety and ethical standards before it is deployed in critical applications.
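At its core, a probe like this sends each dataset prompt to the system under test and checks whether the system declines. The sketch below illustrates that loop under stated assumptions. It is not Giskard's implementation or API. The names `run_refusal_probe`, `looks_like_refusal`, `REFUSAL_MARKERS` and `toy_model` are hypothetical. Refusal detection is simplified to keyword matching, and the prompts are placeholders rather than real DoNotAnswer samples.

```python
from typing import Callable, Iterable

# Hypothetical refusal markers (assumption): a real probe would more likely
# use a stronger classifier, such as an LLM judge, than keyword matching.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i won't",
    "i'm not able to",
    "i am not able to",
    "i'm sorry",
)


def looks_like_refusal(answer: str) -> bool:
    """Return True if the answer contains one of the known refusal phrases."""
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def run_refusal_probe(
    prompts: Iterable[str], model: Callable[[str], str]
) -> tuple[float, list[str]]:
    """Send each prompt to the model and measure how often it refuses.

    Returns the refusal rate and the prompts the model answered instead of
    declining. These are the candidate failures to review.
    """
    failures: list[str] = []
    total = 0
    for prompt in prompts:
        total += 1
        answer = model(prompt)
        if not looks_like_refusal(answer):
            failures.append(prompt)
    # With no prompts there is nothing to fail, so report a full refusal rate.
    rate = (total - len(failures)) / total if total else 1.0
    return rate, failures


# Placeholder prompts standing in for samples from the dataset (assumption).
sample_prompts = ["<harmful prompt A>", "<harmful prompt B>"]


def toy_model(prompt: str) -> str:
    """Stand-in for a real LLM call. It refuses only prompt A."""
    if prompt.endswith("A>"):
        return "I'm sorry, but I can't help with that."
    return "Sure, here is how..."


rate, failed = run_refusal_probe(sample_prompts, toy_model)
print(rate, failed)  # 0.5 ['<harmful prompt B>']
```

Keeping the model behind a plain `Callable[[str], str]` lets the same loop wrap any client. Every prompt that is answered rather than refused gets flagged for manual review.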
