Giskard: LLM vulnerability scanner to secure AI agents

We're releasing an upgraded version of our LLM vulnerability scanner in Giskard Hub, specifically designed to secure conversational AI agents in production environments. While our open-source scanner provided basic heuristic testing with nine static detectors, this enterprise version deploys autonomous red teaming agents that conduct dynamic, multi-turn attacks across dozens of vulnerability categories covering more than 40 probes*. The new system adapts attack strategies in real-time to cover complex conversational vulnerabilities that emerge over multiple interactions.

*Probe: a structured adversarial test designed to expose weaknesses in an AI agent, such as harmful content generation, data leakage, or unauthorized tool execution.

What's new: enhanced AI Red Teaming capabilities

The upgraded LLM vulnerability scanner in Giskard Hub introduces new capabilities that go beyond basic AI security checks:

Comprehensive LLM vulnerabilities coverage

‍

The scanner covers LLM vulnerabilities across established OWASP categories and business failures:

Prompt Injection (OWASP LLM 01) - Attacks that manipulate AI agents through carefully crafted prompts to override original instructions
Training Data Extraction (OWASP LLM 02) - Attempts to extract or infer information from the AI model's training data
Data Privacy Exfiltration (OWASP LLM 05) - Attacks aimed at extracting sensitive information, personal data, or confidential content
Excessive Agency (OWASP LLM 06) - Tests whether AI agents can be manipulated to perform actions beyond their intended scope
Hallucination & Misinformation (OWASP LLM 08) - Tests for AI systems providing false, inconsistent, or fabricated information
Denial of Service (OWASP LLM 10) - Attacks that attempt to cause resource exhaustion or performance degradation
Internal Information Exposure (OWASP LLM 01-07) - Attempts to extract system prompts, configuration details, or other sensitive internal information
Harmful Content Generation - Probes that bypass safety measures to generate dangerous, illegal, or harmful content
Brand Damage & Reputation - Tests for reputational risks and brand damage scenarios
Legal & Financial Risk - Attacks that would make the agent generate statements exposing the agent deployer to legal and financial liabilities
Unauthorized Professional Advice - Tests whether AI agents provide professional advice outside their intended scope

Full list of probes we cover can be found in the here.

Business alignment

Our scanner evaluates both security vulnerabilities and business failures, automatically validating business logic by generating expected outputs from your knowledge bases to ensure agents provide accurate, contextually appropriate responses.

Domain-specific attacks

Previous tools treated LLMs as static models. AI agents now operate in dynamic environments with tool access, memory, and complex interaction patterns. To ensure realistic evaluation, we adapt our testing methodologies to agent-specific contexts, using bot descriptions, tools specification, and knowledge bases. Dynamic interaction with the agent allows us to craft targeted, more context-aware attacks.

Multi-turn attack simulation

Real-world attacks rarely succeed in a single prompt. The new LLM vulnerability scanner implements dynamic multi-turn testing that simulates realistic conversation flows, detecting context-dependent vulnerabilities that emerge through conversation history (risks that single-turn testing misses).

Adaptive AI Red Teaming

The scanner includes adaptive red teaming that adjusts attack strategies based on agent resistance. When encountering defenses, our testing agent escalates tactics or pivots approaches, mimicking attackers to ensure comprehensive coverage.

Root-cause analysis

Every detected vulnerability includes detailed explanations of the attack methodology and severity scoring. Security teams can quickly identify which vulnerabilities pose the highest risk and understand exactly how each attack succeeded, enabling them to prioritize their security efforts and validate fixes against the same attack patterns.

Continuous Red Teaming

Detected vulnerabilities automatically convert into reusable tests for continuous validation and integration into golden datasets.

Getting started

To start using Giskard’s LLM vulnerability scanner you can follow these steps:

Configure vulnerability scope: Select specific vulnerability categories relevant to your use case, covering comprehensive LLM security areas from prompt injection to business logic failures.
Execute the scan: The system runs hundreds of probes (structured adversarial tests designed to expose weaknesses through harmful content generation attempts, data leakage exploration, and unauthorized tool execution testing).
Analyze results by severity: Results are organized by criticality, so it’s easier for review and fixing first what’s critical for your use case.
Review individual probes: Each probe provides detailed attack descriptions, success/failure analysis, and explanations for why specific vulnerabilities occurred, enabling targeted fixes.
Turn into continuous tests (optional): Successful probes can convert into tests for continuous validation, ensuring remediation efforts remain effective over time.

‍

To discover all features and capabilities, visit our documentation for detailed implementation guides and vulnerability coverage.

Conclusion

This release brings new capabilities to Giskard’s LLM vulnerability scanner, you'll now be able to detect sophisticated attacks that evolve across multiple conversation turns. The scanner will automatically generate attacks, analyze your system's responses, then modify their approach and help you correct the agents with re-executable tests.

Ready to secure your AI agents? We're offering free access to a limited number of companies this month. Request your trial to experience advanced LLM security testing that adapts to your specific environment and threat landscape.

[Release notes]: New LLM vulnerability scanner for dynamic & multi-turn Red Teaming