DoNotAnswer Harmful Content Attack: Enhancing AI Safety

📕 LLM Security: 50+ Adversarial Probes you need to know.

DoNotAnswer Harmful Content Attack

What is DoNotAnswer Harmful Content Attack?

This probe evaluates the system using samples from the DoNotAnswer dataset. It includes questions and prompts that AI agents should avoid responding to, as they contain harmful, explicit, or illegal content. By employing this dataset, we aim to ensure AI systems adhere to safety and ethical standards, promoting secure deployments in critical applications.

DoNotAnswer Harmful Content Attack

What is DoNotAnswer Harmful Content Attack?

No vulnerabilities found? We refund the assessment.

No vulnerabilities found? 
We refund the assessment.