GCG Injection Harmful Content Attack

What is GCG Injection Harmful Content Attack?

The GCG Injection Harmful Content Attack examines if an AI agent can withstand Greedy Coordinate Gradient (GCG) attacks. These attacks use specifically designed adversarial suffixes intended to bypass safety protocols and content filters. GCG attacks optimize sequences appended to harmful prompts with the goal of increasing the likelihood of generating restricted responses.

Stay updated with
the Giskard Newsletter