In September 2025, someone published an MCP server called postmark-mcp on npm. It looked legitimate. It had 15 clean versions. On version 1.0.16, a one-line backdoor was added: every outgoing email got a hidden BCC to an attacker's address. An estimated 300 organizations were affected. Thousands of emails containing secrets, tokens, and customer data were silently stolen every day.
This was the first confirmed malicious MCP server found in the wild. It will not be the last.
Key Takeaways
- 86% of organizations experienced at least one AI-related security incident in the past year
- Prompt injection remains OWASP's #1 vulnerability for LLM applications, with sophisticated attackers bypassing defenses ~50% of the time
- The OpenClaw crisis saw 1 in 5 marketplace skills confirmed as malware within weeks of going viral
- The "Lethal Trifecta" --- private data access + untrusted content + exfiltration vector --- is the pattern behind most AI agent exploits
- Curated, verified skill registries are emerging as the supply chain defense layer the ecosystem needs
Short Answer
What's happening? AI agents are being deployed faster than they're being secured. Prompt injection, supply chain attacks on agent skills, and data exfiltration through tool access are all real, documented threats. Only 4% of organizations have achieved "Mature" cybersecurity readiness, according to Cisco's 2025 Cybersecurity Readiness Index. The gap between AI adoption speed and security readiness is the crisis.
The Numbers Are Worse Than You Think
Let's start with what we know:
| Stat | Source |
|---|---|
| 86% of organizations had at least one AI security incident in the past year | Cisco 2025 Cybersecurity Readiness Index |
| Only 4% of organizations have "Mature" cybersecurity readiness | Cisco |
| Only 31% can adequately secure agent AI systems | Cisco |
| 95 AI-related CVEs filed in 2025 alone (near zero before 2025) | DataBahn |
| 1 in 8 enterprise security incidents now involves an agentic system | Digital Applied |
| 48% of cybersecurity pros identify agentic AI as the most dangerous attack vector | Dark Reading / Bessemer |
| $4.63M average cost per shadow AI breach ($670K more than standard breaches) | IBM 2025 Cost of a Data Breach Report |
These are not projections. These are things that already happened.
40% of enterprise applications will feature task-specific AI agents by 2026, up from less than 5% in 2025 (Gartner). The attack surface is expanding faster than defenses can keep up.
Prompt Injection: The SQL Injection of AI
Prompt injection has been OWASP's #1 vulnerability for LLM applications for two consecutive years. It is the most fundamental security flaw in how we build AI systems today.
The attack is simple in concept: an attacker embeds instructions in content that the AI processes --- an email, a document, a web page, a tool description --- and the AI follows those instructions instead of the user's intent.
How Bad Is It?
The International AI Safety Report 2026 found that sophisticated attackers bypass the best-defended models approximately 50% of the time with just 10 attempts. Research shows 89% success rates against GPT-4o and 78% against Claude 3.5 Sonnet with sufficient attempts.
Anthropic has made real progress --- Claude Opus 4.5 reduced successful prompt injection to 1% in browser-based operations. But Anthropic also dropped its direct injection metric in February 2026, arguing that indirect injection (via external content) is the more relevant enterprise threat.
Real-World Incidents
These are not theoretical:
EchoLeak (CVE-2025-32711, CVSS 9.3) --- The first real-world zero-click prompt injection exploit. An attacker sends a crafted email. Microsoft 365 Copilot ingests it automatically. Hidden instructions cause Copilot to extract data from OneDrive, SharePoint, and Teams and send it to the attacker via trusted Microsoft domains. No user interaction required. Source
Johann Rehberger's "Month of AI Bugs" --- In August 2025, security researcher Johann Rehberger published one critical AI vulnerability disclosure per day. Targets included ChatGPT, Cursor, Claude Code, GitHub Copilot, Google Jules, and Devin. He spent $500 of his own money testing Devin AI and found it "completely defenseless" --- manipulable into exposing ports, leaking tokens, and installing backdoors. Source
The Lethal Trifecta
Simon Willison, creator of Datasette and Django co-creator, coined this term on June 16, 2025. It describes the three conditions that, when combined, make any AI agent exploitable:
- Access to private data --- emails, documents, databases, code repos
- Exposure to untrusted content --- external sources like emails, shared docs, web pages
- Exfiltration vector --- ability to make external requests (render images, call APIs, generate links)
"Any time you combine access to private data with exposure to malicious tokens and an exfiltration vector you're going to see the same exact security issue." --- Simon Willison, The Lethal Trifecta
When an AI agent has all three, an attacker can craft content that tricks the agent into reading private data and sending it somewhere the attacker controls.
The only safe approach: ensure your agent never has all three simultaneously. If you cannot avoid it, remove at least one leg. Limit data access, sandbox untrusted content, or block external requests.
Real examples of the trifecta in action:
- GitHub MCP server: Attackers submitted issues with prompt injection that exfiltrated private repository data via pull requests
- Writer.com: External URLs injected prompts that exfiltrated private document info via invisible image URLs
- GitLab Duo Chatbot: Public projects with rogue instructions sent private repo data to fake security domains
Supply Chain Attacks: The npm Left-Pad Moment for AI
The AI agent ecosystem is learning the same painful lessons that the npm ecosystem learned a decade ago --- but with higher stakes, because agent skills can influence AI behavior, not just execute code.
The OpenClaw Crisis
OpenClaw, an open-source AI agent with 135,000+ GitHub stars, went from viral sensation to security crisis in three weeks:
- Late January 2026: Gained 25,000 stars in a single day
- Mid-February 2026: Researchers discovered a critical one-click RCE vulnerability, a poisoned marketplace, and 135,000+ exposed instances with no authentication
- 1,184 confirmed malicious skills across 10,700+ total packages. Roughly 1 in 5 were malware. Snyk named the campaign "ClawHavoc"
- Malicious skills were designed to exfiltrate credentials, install cryptominers, create SSH backdoors, and persist across reinstalls
- 135,000+ publicly exposed instances across 82 countries. 50,000+ exploitable via remote code execution
Sources: Admin By Request, Reco.ai
The Postmark MCP Attack
The attack described at the top of this article was methodical:
- Publish 15 clean versions to build trust and download history
- On version 1.0.16, inject a one-line backdoor
- Every outgoing email gets a hidden BCC to the attacker's address
- Estimated 3,000-15,000 emails stolen per day containing secrets, tokens, and PII
This is the "left-pad" moment --- but instead of breaking builds, the attack silently exfiltrated sensitive data. Source
MCP Server Exposure at Scale
Among 7,000+ MCP servers analyzed, 36.7% were vulnerable to SSRF (Server-Side Request Forgery). Over 8,000 MCP servers are visible on the public internet with exposed admin panels, debug endpoints, and unauthenticated API routes. In the first 60 days of 2026, 30+ CVEs were filed against MCP infrastructure, including a CVSS 9.6 RCE flaw in a package with nearly 500,000 downloads. Source
Vibe Coding's Hidden Cost
AI-assisted coding has exploded. But speed comes with a price that most developers are not accounting for.
Veracode's 2025 GenAI Code Security Report tested 80 coding tasks across 4 languages and 4 vulnerability types. The findings:
- 45% of AI-generated code contains security flaws
- Java had a 72% failure rate. Python, C#, and JavaScript: 38-45%
- 86% of samples failed to defend against XSS. 88% were vulnerable to log injection
- AI-generated code produces 1.57x more security findings than human-written code
- Security pass rates remained flat at 45-55% regardless of model generation --- even as syntax correctness climbed from 50% to 95%
That last point is critical. Models got dramatically better at writing code that works. They did not get better at writing code that's secure. Source
By March 2026, 35 CVEs were directly attributed to AI-generated code in a single month, up from 6 in January. Georgia Tech's "Vibe Security Radar" project tracks this trend. AI coding agent CVEs have been filed against Claude Code, Cursor, Anthropic MCP servers, and others.
What's Being Done About It
The situation is not hopeless. Here's what the industry is building:
Anthropic has developed multi-layered prompt injection defenses using targeted reinforcement learning. Claude Opus 4.5 achieved 1% injection success in browser operations. They also run expert human red-teaming and external challenge programs.
Microsoft published a comprehensive agent identity framework (March 2026) where each agent gets its own identity, Conditional Access policies, and human sponsors governing its lifecycle.
The RSAC 2026 consensus: Agentic AI governance is urgent and unsolved. Agent-focused security announcements came from Cisco, CrowdStrike, Palo Alto, BeyondTrust, Wiz, and dozens of other companies.
The emerging defense stack looks like this:
| Layer | What It Does | Examples |
|---|---|---|
| Identity | Each agent gets a verified identity with scoped permissions | Microsoft Agent Identity, OAuth for agents |
| Guardrails | Runtime rules that limit what agents can do | Input/output filtering, action-level constraints |
| Sandboxing | Isolated execution environments for untrusted operations | Container-based execution, network isolation |
| Verification | Trust signals for skills, plugins, and MCP servers | Publisher verification, content hashing, safety scanning |
| Monitoring | Observability for agent behavior in production | Action logging, anomaly detection, audit trails |
The Skills Marketplace Defense Layer
The OpenClaw crisis proved what happens without curation: 1 in 5 packages become malware. The Postmark MCP attack proved that even established registries like npm are vulnerable when there's no verification layer for AI-specific packages.
The parallel to npm's evolution is direct. npm went from "anyone can publish anything" to verified publishers, provenance attestations, and security advisories. The AI skills ecosystem is going through the same growing pains --- but at 10x speed because the attack surface is fundamentally larger.
What a proper skills registry needs:
- Content hashing (SHA-256) for integrity verification --- detect if a skill has been tampered with
- Safety scanning --- automated detection of prompt injection patterns and data exfiltration attempts
- Publisher trust tiers --- earned reputation that distinguishes new publishers from established ones
- File type restrictions --- text and images only, executables and archives blocked
- Similarity detection --- catch copycat skills with hidden modifications
This is not theoretical. These are the exact defenses that registries like OpenBooklet are building right now - content hashing, automated safety scans, and publisher trust tiers applied to every package before it lands in search results. The question is whether adoption will outpace the attackers.
What You Should Do Today
If you're building with AI agents, here's a practical checklist:
- Audit your agent's Lethal Trifecta exposure --- Does it have private data access + untrusted content + exfiltration capability? If yes, remove at least one leg.
- Review every MCP server and skill you've installed --- Check the source, the publisher, and when it was last updated. If you can't verify it, remove it.
- Never auto-approve tool calls in production --- Human-in-the-loop for destructive or external actions is not optional.
- Run security scans on AI-generated code --- The code compiles and passes tests. That does not mean it's secure. Add SAST/DAST to your CI pipeline.
- Prefer verified skill sources --- Use registries with publisher verification, safety scanning, and trust signals over raw npm packages or unknown GitHub repos.
FAQ
Is prompt injection actually exploitable in production?
Yes. EchoLeak (CVE-2025-32711) was a zero-click exploit against Microsoft 365 Copilot in production. Johann Rehberger demonstrated exploitable vulnerabilities in ChatGPT, Cursor, Claude Code, GitHub Copilot, and Devin. These are not proof-of-concept demos --- they are real attacks against real products used by millions.
Are MCP servers safe to use?
It depends on the server. MCP itself is a protocol, not a security guarantee. Official reference servers from the MCP organization are generally well-maintained. Community servers vary widely. Always check the source code, publisher reputation, and permissions before installing.
How do I know if an AI agent skill is safe?
Look for registries that provide content hashing, safety scanning, publisher verification, and trust tiers. Avoid installing skills from unverified sources. If the skill is closed-source and you cannot inspect it, treat it with the same caution you would treat any untrusted executable.
Will prompt injection ever be fully solved?
Probably not in the way SQL injection was solved (parameterized queries). The fundamental challenge --- that AI models process instructions and data in the same channel --- does not have a clean architectural fix. Defense will remain layered: better models, runtime guardrails, sandboxing, and monitoring working together.
Key Takeaways
- The crisis is real and documented --- 86% incident rate, 95 CVEs in 2025, 1 in 8 enterprise incidents involving agents
- Prompt injection is the SQL injection of AI --- OWASP #1 for two years, ~50% bypass rate against best defenses
- Supply chain attacks have arrived --- The Postmark MCP attack and OpenClaw crisis are not hypotheticals, they are case studies
- The Lethal Trifecta is your threat model --- private data + untrusted content + exfiltration vector = exploitable agent
- Curated registries are the defense layer --- verified publishers, safety scanning, and trust tiers are how the ecosystem matures
Further reading: MCP Explained: The USB-C of AI Agents | What Are AI Agent Skills?