Crescendo Harmful Content Attack

What is the Crescendo Harmful Content Attack?

The Crescendo Attack is a multi-turn strategy designed to incrementally lead AI models to produce harmful content through a series of small, seemingly harmless steps. This method takes advantage of the model’s tendency to prioritize recent inputs, follow established patterns, and trust its own generated text.
