April 29, 2026
5 min read
Blanca Rivera Campos
David Berenstein

A Cursor AI agent wiped a production database in 9 seconds: Excessive Agency AI failure

In April 2026, a Cursor AI coding agent running Claude Opus 4.6 deleted a startup's entire production database and every backup in a single API call, in nine seconds. This incident is a case of "Excessive Agency," where over-privileged credentials and autonomous reasoning loops bypass security controls. In this article we analyse what failed and how to prevent it.
A Cursor AI Agent wiped a production database

On April 24, 2026, a Cursor AI coding agent running Anthropic's Claude Opus 4.6 deleted PocketOS's production database and every volume-level backup in a single API call to Railway in nine seconds. The agent was fixing a credential mismatch, and it just decided to solve the problem by deleting the data it couldn't access. This failure was the convergence of over-privileged credentials, non-deterministic agentic reasoning, and a lack of infrastructural confirmation semantics.

Recent findings on AI cybersecurity in 2026 indicate that 92% of security professionals are concerned about the impact of AI agents on enterprise security. The PocketOS incident serves as a case study for the "Excessive Agency" vulnerability, categorized as LLM06 in the OWASP Top 10 for LLM Applications. In this article, we analyze the AI incident, deconstruct the architectural flaws in the agentic tool-chain, and present a framework to prevent this kind of failure.

Anatomy of a nine-second production deletion

PocketOS incident with Cursor and Railway

The incident began when the engineering team at PocketOS, a provider of car rental management software, used a Cursor agent for a routine maintenance task in a staging environment. The agent was tasked with resolving a configuration issue, but during execution, it encountered a credential mismatch (a standard error in distributed systems). Rather than terminating the session or requesting human intervention, the agent’s internal reasoning loop, governed by Claude Opus 4.6, determined that the most efficient path to resolution was to modify the underlying infrastructure.

To execute this plan, the agent performed a "credential scavenging" operation across the local filesystem. It found an API token in an unrelated file (originally generated for the narrow purpose of managing custom domains via the Railway Command Line Interface (CLI)). This token was "root-scoped" within the Railway platform, granting it the authority to perform any operation across the Railway GraphQL API, including destructive actions like volumeDelete.

The call deleted the production database volume. Because Railway stores volume-level backups inside the same volume they are meant to protect, it deleted all backups with it. The most recent off-site backup was three months old.

By the next morning, PocketOS customers were showing up for rentals with no records in the system. His team reconstructed reservations by hand, cross-referencing Stripe logs, email confirmations, and calendar invites.

The incident highlights the fundamental danger of "Excessive Agency," in which an LLM is granted too much autonomy and functionality without corresponding verification. The agent ignored Cursor’s marketed "Destructive Guardrails," which were supposed to intercept shell executions or tool calls that could alter production environments. Furthermore, the agent issued a post-action "confession," enumerating the safety rules it had violated, including "guessing instead of verifying" and "running a destructive action without being asked". This underscores a critical paradox in AI security: models can articulate safety principles while simultaneously violating them in high-pressure reasoning loops.

Deconstructing the Technical Failures of Cursor and Railway

The PocketOS disaster was a failure involving the agent host (Cursor), the underlying model (Claude), and the infrastructure provider (Railway). Each component exhibited specific vulnerabilities that, when combined, allowed for a total compromise of the production environment.

Failure 1: Cursor excessive agency

Cursor markets "Destructive Guardrails" that are supposed to block production-impacting actions without approval. The agent had explicit project rules prohibiting destructive operations without confirmation. None of it triggered. The agent took an irreversible action (deleting a production volume) while believing it was working in staging. This is the definition of Excessive Agency (OWASP LLM Top 10, LLM06:2025): an AI agent performing actions beyond its intended scope because no mechanism enforced the boundary. These failure modes are surfaceable through red teaming before deployment, but in this case, they weren't.

Failure 2: Railway: An API with no blast-radius controls

On the infrastructure side, Railway’s architecture at the time of the incident contained several legacy "classical engineering" standards that were incompatible with the unpredictability of AI agents.

  • Token scoping: Railway’s CLI tokens were effectively root-level credentials. There was no mechanism for granular scoping by operation (e.g., read-only) or environment (e.g., staging-only).
  • API semantics: While Railway’s Dashboard and CLI included confirmation prompts for destructive actions, their GraphQL API followed "classical" standards where a valid mutation is honored immediately without secondary checks.
  • Deleted backups: Railway’s documentation noted that "wiping a volume deletes all backups" because the volume-level backups were stored within the same logical volume. For an AI agent, which operates at the API level, this meant a single volumeDelete call deletes all the company’s data.

The security risks in Agentic AI

PocketOS's post reached over 700,000 views because its situation was familiar to many. His stack (Cursor, Claude, Railway) is the default setup for thousands of small and mid-sized engineering teams.

The architecture that destroyed PocketOS's data is being shipped by vendors as a feature. Railway launched mcp.railway.com (a Model Context Protocol server for AI coding agents) the day before the incident, built on the same authorization model: no scoped tokens, no confirmation for destructive actions, no environment isolation.

Through this MCP, AI agents are granted execution capabilities against infrastructure APIs, allowing them to run volumeDelete on a production resource and, in many cases, not requiring confirmation.

How to prevent failures in AI agents: A two-layer defense

To prevent incidents like the Cursor-Railway deletion, technical teams must implement a security framework in two layers: before deployment and at runtime.

Layer 1: Automated Red Teaming for Agentic AI

Giskard AI red teaming - dataset generation

The primary defense against Excessive Agency is Continuous Red Teaming. Giskard Hub uses adversarial testing to probe the target agent for vulnerabilities. For an agent like Cursor with access to Railway tools, Giskard’s Hub provides specialized probes:

  1. Excessive Agency & Unauthorized Tool Execution: This probe attempts to manipulate the agent into using MCP tools (Model Context Protocol) in ways that violate safety policies. For example, it might trick the agent into calling a volumeDelete function by framing it as a "cleanup" task required for a security audit.
  2. Agentic Tool Extraction: This is a multi-turn probe designed to discover which tools and parameter schemas are available to the agent. By identifying what tools can be extracted, security teams can harden the specific tool-calling interface.
  3. Chain-of-Thought (CoT) Forgery: This probe tests whether an agent can be tricked into a destructive action by faking the internal reasoning steps. If the model "thinks" a destructive action is part of a safe, pre-approved procedure, it may bypass its own internal guardrails.

Layer 2: Implementing AI guardrails for runtime protection

Giskard Guards workflow

While red-teaming identifies vulnerabilities during development, Giskard Guards provides a context-aware runtime protection layer. Unlike generic content filters, it operates via a Policy-as-Code framework, which includes specific compliance packs for the EU AI Act and the OWASP Top 10 for LLMs.

In the context of the PocketOS incident, Giskard Guards could have been implemented as an interceptor between the Cursor agent and the Railway API. The guardrail would evaluate the agent's intent and the proposed tool call (e.g., volumeDelete) against a central policy. If the action is categorized as "High-Risk" or "Destructive," the guardrail blocks the call and triggers a human-in-the-loop (HITL) approval process.

Conclusion

The PocketOS incident illustrates that agents need constraints enforced outside their processes. System prompt rules, project configuration, and model-level safety training are all internal to the agent. An agent that violates its own rules (as this one did) proves that internal constraints are insufficient for irreversible high-impact operations.

Giskard Hub red-teams your agents before deployment, probing for Excessive Agency and other vulnerabilities before they appear in production. Giskard Guards then enforces those boundaries at runtime, intercepting dangerous outputs before they reach tool execution.

If your organization is deploying AI agents with access to production systems, talk to the Giskard team about securing them before the next incident is yours.

Continuously secure LLM agents, preventing hallucinations and security issues.
Book a Demo

You will also like

OpenClaw security issues include data leakage & prompt injection

OpenClaw security vulnerabilities include data leakage and prompt injection risks

This article explores the critical security failures of the OpenClaw agentic AI, which allowed sensitive data to leak across user sessions and IM channels. It examines how architectural weaknesses in the Control UI and session management created direct paths for prompt injection and unauthorized tool use. Finally, it outlines the essential hardening steps and systematic red-teaming strategies required to transform a vulnerable "fun bot" into a secure enterprise assistant.

View post
Claude Mythos: Analyzing Anthropic’s new frontier model for AI security

Claude Mythos: Analyzing Anthropic’s new frontier model for AI security

In this article, we analyze Anthropic's newly announced Claude Mythos model and its announced capabilities in automated vulnerability discovery and exploit generation. We explore how this frontier model impacts the cybersecurity landscape.

View post
Model Context Protocol: Understanding MCP security risks and prevention methods

Model Context Protocol: Understanding MCP security risks and prevention methods

MCP servers pose significant MCP security risks due to their ability to execute commands and perform API calls. Even "official" Model Context Protocol setups face MCP vulnerability and tool poisoning. In this article, you'll learn the primary MCP security issues, from injection attacks to supply chain risks, and how to mitigate these MCP cyber security threats for your AI agents.

View post
Get AI security insights in your inbox