Model Context Protocol: Understanding MCP security risks and prevention methods

What is MCP?

The Model Context Protocol (MCP) is an open standard introduced by Anthropic in late 2024. The protocol defines how AI applications, especially large language models (MCP LLMs), connect to external tools, data sources, and services in a consistent way.

Instead of custom integrations for each model and system, MCP provides a universal interface for reading files, executing functions, and sharing contextual information with LLMs. In the pre-MCP era, a fixed list of tools for LLMs was provided in the system prompt.

In the MCP protocol ecosystem, three roles collaborate:

Hosts: Applications like Claude Desktop or IDEs that embed LLMs.
Clients: Connectors inside those hosts.
MCP servers: Services exposing tools and data to the model.

The protocol supports bidirectional communication so that models can invoke tools, retrieve real-time data, and act on enterprise systems through standardized requests and responses from the MCP servers.

Why MCP Security is Critical for AI Agents

Why is it important to consider MCP security? MCP extends an LLM from "just a chat interface" into an active agent plugged into your software systems. This brings powerful capabilities but also introduces a new, rapidly evolving attack surface.

Every MCP server is effectively a remote extension that can read data, call APIs, and in some cases execute code with the same privileges as the host environment. In addition, because MCP is designed for ease of integration and rapid experimentation, teams often connect third-party or community MCP servers as they would browser extensions, without systematic security review.

In regulated or data-sensitive environments, a single misconfigured or malicious server can undermine otherwise robust MCP cyber security controls and governance.

MCP architecture and attack flows

To understand the MCP security risks, we must look at the architecture. The core components of an MCP-enabled system function as follows:

The MCP host (e.g., desktop app, internal AI assistant, IDE plugin) embeds an LLM and an MCP client.
The MCP client discovers and connects to one or more MCP servers configured by the user or the platform.
Each MCP server exposes tools (functions), resources (documents, databases), or prompts that the LLM can call through standardized MCP messages.

Model Content Protocol (MCP) execution flow

The execution flow creates distinct opportunities for exploitation:

The LLM receives a user query and decides to call one or more tools by sending back a tool call message.
The host parses this message to get the tool name and arguments.
The host sends the tool call names and arguments over the MCP to the corresponding servers.
Responses from the MCP servers are sent back into the host.
Finally, responses are sent to the LLM to generate final outputs or trigger follow-up tool calls.

Common MCP security risks and vulnerabilities

From an attacker's perspective, there are several typical MCP security issues:

Malicious or compromised MCP servers: Community servers can ship arbitrary code that runs with user or service account privileges, enabling credential theft, data exfiltration, or lateral movement.
Plaintext secrets and configuration leakage: Many MCP implementations load API keys and credentials from local configuration files in plaintext, which malicious servers can read and exfiltrate in a single call.
Prompt injection leading to unsafe tool use: Content from documents or websites can include crafted instructions that trick the model into invoking powerful MCP tools in ways not anticipated by designers.
Over-privileged tools and missing guardrails: MCP servers often expose broad filesystem, database, or admin APIs without granular scoping, increasing the radius of an LLM-driven mistake or compromise.

MCP cyber security attack scenarios examples

Scenario 1: Malicious server on install

An attacker publishes or distributes a useful MCP server (e.g., "productivity" or "CRM" integration). A user or admin installs it on the MCP host. When the host loads the server, its code executes with full access to local environment variables, configuration files, and the network, allowing immediate credential theft or data exfiltration without visible user interaction.

Scenario 2: Prompt injection to abuse a benign server

The agent uses an MCP web or document connector to fetch external content. A compromised page or file contains hidden instructions designed to manipulate the LLM into calling another sensitive MCP tool (filesystem, ticketing system, admin API, etc.). The LLM follows these instructions and performs unauthorized operations, such as downloading sensitive files or triggering high-risk changes.

Scenario 3: Exploiting a vulnerability in a trusted MCP server

Organizations often implicitly trust "official" MCP servers, such as those used for connecting to internal databases. However, if such a server harbors an undisclosed vulnerability or injection flaw, that trust can be weaponized.

In this scenario, an attacker uses crafted inputs or model-generated payloads to target the specific vulnerability. Successful exploitation allows the attacker to gain arbitrary code execution within the MCP host environment, turning a trusted system component into a foothold for further compromise

Case study: Attacking an MCP LLM assistant

Imagine a sales assistant deployed across your organization, accessible via chat. It uses MCP servers to connect to your CRM (read/write customer records), cloud file storage (read access to proposals), and an internal "helper" plugin.

The attack:

An attacker convinces a salesperson to install a community "email summarizer" MCP server promising better prospecting insights.
The server's code reads the local configuration file used by the assistant, which contains CRM API keys and OAuth refresh tokens in plaintext.
It sends those secrets to a remote server controlled by the attacker.
The attacker impersonates the assistant's backend service, queries the CRM at scale, and pulls high-value customer lists.
They also use the stolen tokens to request access to file storage via the same MCP integration, exfiltrating commercial proposals and confidential pricing models.

The consequences:

Data confidentiality breaches: Unauthorized exposure of customer PII, contracts, and pricing strategies.
Business and competitive damage: Competitors can use stolen data to undercut deals or engineer spear-phishing campaigns.
Integrity and process risk: Compromised MCP servers can tamper with records or inject incorrect data into CRM systems.
Reputational and legal fallout: Public disclosure of an AI-mediated breach can damage trust and trigger regulatory investigations.

How to detect MCP security issues and prevent attacks

To secure the MCP protocol ecosystem, organizations must focus on configuration and monitoring.

Configuration best practices:

Principle of least privilege: Scope MCP tools narrowly (e.g., read-only, restricted paths) and separate sensitive capabilities into distinct, well-audited servers.
Secure secret management: Avoid plaintext configuration files; use OS-level secret stores or vaults, and ensure servers only receive the minimum credentials required.
Code and supply-chain review: Treat community MCP servers like third-party software; require provenance, scanning, and approval before deployment.

Monitoring MCP traffic:

Centralized logging: Capture which MCP tools are called, with what parameters, and link them to conversations, users, and environments.
Anomaly detection: Flag unusual patterns, such as large sequential file reads, multiple failed tool invocations, or access to rarely used APIs.

Validating MCP server security with automatic Red Teaming

In Giskard Hub, you do not need to write manual test cases for every possible attack. Instead, you launch an automated Vulnerability Scan.

Connect your agent: You wrap your MCP-enabled agent (the "Host") so Giskard can interact with it. You must provide a name and description, which our scanner uses to generate domain-specific attacks relevant to your business (e.g., if you describe a "CRM Agent," we generate CRM-specific attacks).
Launch adversarial probes: The scanner launches a suite of "adversarial agents." These are specialized LLMs tasked with breaking your system. They engage in dynamic and multi-turn conversations, adapting their tactics based on your agent's responses to find a successful exploit.
Review & remediate: The Hub provides a detailed report categorizing vulnerabilities by risk (e.g., OWASP Top 10), complete with the exact conversation logs that triggered the failure, allowing you to patch the specific MCP server or system prompt.

Specific probes for MCP and agents

For MCP security, we utilize specific probes designed to target the unique risks of tool-use and server connections:

Agentic Tool Extraction: This is a multi-turn reconnaissance probe. The attacker gradually converses with your agent to discover which MCP tools are available, their parameter schemas, and their capabilities.
Excessive Agency & Unauthorized Tool Execution: These probes attempt to manipulate the agent into using MCP tools in ways that violate safety policies.
- Example: Tricking a customer support agent into using an internal get_user_billing_info tool for a user who isn't authenticated.
Prompt Injection (ChatInject & CoT Forgery): Standard injection attacks often fail on complex agents. We use advanced techniques like ChatInject, which wraps attack payloads in forged chat template tokens (mimicking system or user roles), and Chain-of-Thought (CoT) Forgery, which fakes reasoning steps to trick the model into following a harmful path.
Internal Information Exposure: This probe specifically targets the leakage of configuration details—such as the plaintext API keys or environment variables often found in MCP implementations.
Cross-Session Leakage: This tests if an MCP server improperly retains state. The probe provides sensitive data in one conversation turn and attempts to retrieve it in a completely separate session, simulating a privacy breach between different users.

By integrating these scans into your CI/CD pipeline, you establish Continuous Red Teaming, ensuring that every update to your MCP tools or agent prompts is validated against the latest attack vectors before deployment.

Conclusion: Securing the MCP protocol ecosystem

MCP enables LLMs to act on your data, tools, and systems at scale. While this unlocks significant business value, it also creates a new class of supply-chain and runtime risks that traditional security controls were not designed to handle.

Organizations that rely on MCP-enabled agents need both infrastructure-level hardening and specialized AI security testing that understands how models, prompts, and tools interact in real conversations. By combining MCP-aware red teaming, fine-grained observability, and continuous validation, you can confidently deploy AI agents that are not only powerful but also trustworthy and compliant with your security posture.

Ready to test your MCP Security? Contact the Giskard team.

Model Context Protocol: Understanding MCP security risks and prevention methods

What is MCP?

Why MCP Security is Critical for AI Agents

MCP architecture and attack flows

Common MCP security risks and vulnerabilities

MCP cyber security attack scenarios examples

Case study: Attacking an MCP LLM assistant

How to detect MCP security issues and prevent attacks

Validating MCP server security with automatic Red Teaming

Specific probes for MCP and agents

Conclusion: Securing the MCP protocol ecosystem

You will also like

Agentic tool extraction: Multi-turn attack that exposes the agent's internal functions

Cross Session Leak: when your AI assistant becomes a data breach

Understanding single-turn, multi-turn, and dynamic agentic attacks in AI red teaming

Get AI security insights in your inbox