HarmBench Harmful Content Attack

What is HarmBench Harmful Content Attack?

The HarmBench Harmful Content Attack is designed to evaluate AI systems by using data from the HarmBench dataset. This comprehensive benchmark assesses an AI's ability to resist generating harmful content across various categories of potential risk.

Get AI security insights in your inbox