Codex, Claude Code, and Copilot Were All Breached — Attackers Targeted Credentials, Not AI Models

Posted on April 30, 2026April 30, 2026 by Mark Harrell

Contents show

Codex, Claude Code, and CopilotWere All Breached — Attackers Targeted Credentials, Not AI Models

By the time most developers had finished their morning coffee on March 30, 2026, a security researcher had already stolen a GitHub OAuth token from OpenAI's Codex — using nothing more than a branch name.

That is not a typo. A branch name. The kind of thing you type in a terminal and forget about five seconds later.

Two days after that, Anthropic's Claude Code had its own source code accidentally leak onto the public npm registry. And while engineers were scrambling to respond, another research team quietly proved that Claude Code would abandon its own security rules entirely once a single command exceeded 50 sub-steps.

These were not random one-off accidents. They were the latest entries in a nine-month streak of attacks that hit every major AI coding tool on the market: Codex, Claude Code, GitHub Copilot, and Google's Vertex AI. Six separate research teams participated. Every single exploit followed the exact same playbook: find the credential the AI agent is holding, steal it, and walk straight through the front door of a production system.

Nobody tried to hack the AI model itself. Nobody tried to confuse the neural network or feed it bad training data. The attackers went for something older and more reliable. They went for the keys.

Why Credentials Are the Real Target

Here is something worth understanding before we get into the specifics.

When you use an AI coding tool like Copilot or Claude Code, you are not just getting autocomplete suggestions. These tools have actual access to your code repositories, your file system, your deployment pipelines, and sometimes your cloud storage. To do all that, they need credentials. OAuth tokens, service account keys, API secrets.

Those credentials are what make AI agents useful. But they are also exactly what makes them dangerous.

Merritt Baer, the Chief Security Officer at Enkrypt AI and a former Deputy CISO at Amazon Web Services, put it very plainly when asked about this problem. She said enterprises believe they have approved an AI vendor, but what they have actually approved is an interface. The credentials underneath that interface are a completely separate story. The breach lives there, not in the chatbot UI you are looking at.

Most companies think approving a tool means they have assessed the risk. What they have actually done is hand a contractor a master key and forgotten to ask what doors it opens.

The Attack That Started With a Branch Name

Let us start with the Codex breach because it is probably the most surprising.

BeyondTrust researchers Tyler Jespersen, Fletcher Davis, and Simon Stewart discovered that when Codex cloned a repository, it embedded a GitHub OAuth token directly into the git remote URL. That is already a risky design. But the real problem was what happened next.

The branch name you specify during cloning flowed directly, completely unsanitized, into a setup script that the system then executed. No filtering, no escaping, no validation. The branch name went straight into the shell.

So what did the researchers do? They added a semicolon and a backtick subshell to the branch name. That tiny bit of shell syntax turned an innocent-looking branch parameter into an active command that reached out to an external server and sent the OAuth token along in cleartext.

If you are not familiar with security: cleartext means readable. Not encrypted. Not hidden. Just raw text flying across a network, readable by anyone positioned to intercept it.

But the cleverness did not stop there. Simon Stewart added one more detail that made this attack genuinely scary. He appended 94 Ideographic Space characters to the branch name, after the word “main.” Ideographic Spaces are a Unicode character, code point U+3000, that look identical to a normal space but are not the same character that the shell is reading. The result was that in the Codex web portal, the branch name appeared to say “main.” In the shell, it said something that exfiltrated your token.

A developer would look at the screen, see “main,” and think everything was normal. The attack was already complete.

OpenAI classified this as Critical P1 and shipped a full fix by February 5, 2026.

Claude Code Had Not One, Not Two, But Three Separate Problems

If the Codex story surprised you, Claude Code's run of vulnerabilities in the same period is the kind of thing that should make any security team rethink their trust assumptions about AI tools.

CVE-2026-25723: Escaping the Sandbox

The first vulnerability hit Claude Code's file-write restrictions. Claude Code is supposed to operate inside a project sandbox, only reading and writing files within a defined workspace. CVE-2026-25723 broke that guarantee.

The problem was that piped commands using sed and echo were not being validated for chaining. If you could get Claude Code to run a command that piped output between shell utilities, you could escape the project directory entirely. The sandbox existed on paper. In practice, it had a hole big enough to walk through.

Anthropic patched it in version 2.0.55.

CVE-2026-33068: The Trust Dialog That Never Appeared

The second vulnerability was subtler and, frankly, more alarming.

When you open a new workspace in Claude Code, it shows you a trust dialog. Essentially it asks whether you trust this repository before giving it elevated permissions. That dialog is a meaningful safety mechanism. If you are opening a repository from an unknown source, that prompt is supposed to give you a chance to say no.

CVE-2026-33068 bypassed that prompt entirely.

Claude Code was loading permission settings from a file called .claude/settings.json before displaying the trust dialog. A malicious repository could include its own version of that settings file with a single line: set permissions.defaultMode to bypassPermissions. With that line in place, the trust dialog never appeared. The repo granted itself whatever permissions it wanted before you even had a chance to decline.

This was patched in version 2.1.53.

The 50-Subcommand Rule: Security Traded for Speed

The third issue, discovered by the research team at Adversa, did not get a CVE number but might be the most telling of the three.

Claude Code enforced a set of deny rules — commands and patterns it was explicitly forbidden from executing. These rules existed to keep the agent inside safe boundaries. But Adversa found that once a single command exceeded 50 subcommands, Claude Code silently stopped checking the deny rules.

Anthropic's engineers had apparently made a performance decision at some point: checking every subcommand past the fiftieth was taking too long, so the checking stopped. The deny rules became effectively decorative for any sufficiently complex command chain.

Adversa was able to get Claude Code to execute commands it should have refused, simply by building a command that crossed that 50-subcommand threshold first.

Patched in version 2.1.90.

Carter Rees, VP of AI and Machine Learning at Reputation and a member of the Utah AI Commission, described the root cause clearly. He wrote that a significant vulnerability in enterprise AI is broken access control, where the flat authorization model of a large language model fails to respect user permissions. The repository got to decide what permissions the agent had. The token budget decided which deny rules survived.

GitHub Copilot: When PR Descriptions Become Attack Vectors

If you thought pull request descriptions were boring text boxes, think again.

Johann Rehberger demonstrated CVE-2025-53773 against GitHub Copilot, with Markus Vervier of Persistent Security as a co-discoverer. The exploit worked like this: an attacker could embed hidden instructions inside a pull request description. When Copilot read that PR as part of its context, those hidden instructions triggered it to flip a specific setting in .vscode/settings.json.

That setting controlled auto-approve mode. Flipping it disabled all user confirmations and granted unrestricted shell execution across Windows, macOS, and Linux simultaneously. One hidden instruction in a text box. Full shell access.

Microsoft patched this in the August 2025 Patch Tuesday release.

Then Orca Security found another way in, this time through GitHub Issues and Codespaces.

Hidden instructions embedded in a GitHub issue could manipulate Copilot into checking out a malicious pull request. That PR contained a symbolic link, a file that pointed to another file, specifically to a location at /workspaces/.codespaces/shared/user-secrets-envs.json. A crafted URL in a JSON schema field then exfiltrated the privileged GITHUB_TOKEN stored at that location.

The result was complete repository takeover. No user needed to click anything beyond opening the issue, which is something developers do dozens of times a day.

Mike Riemer, CTO at Ivanti, made an observation about the speed problem that makes all of this even more concerning. He pointed out that threat actors are reverse engineering security patches within 72 hours of release. If a team does not apply a patch within that window, they are exposed. But AI agents compress that window even further. The time between “vulnerability exists” and “credential stolen” can be measured in seconds when an agent is handling the interaction without a human in the loop.

Vertex AI: When Default Settings Grant Too Much

The Vertex AI situation is arguably the most structural problem of the four.

Unit 42 researcher Ofir Shaty found that the default Google service identity attached to every Vertex AI agent came loaded with excessive permissions right out of the box. This is the kind of problem that does not require a crafty attack chain. It just requires being the first person to notice what the default configuration actually allows.

The stolen P4SA credentials granted unrestricted read access to every Cloud Storage bucket in the entire project. That alone would be serious enough. But the reach went further: those same credentials could access restricted, Google-owned Artifact Registry repositories at the core of Vertex AI's Reasoning Engine infrastructure.

Shaty described the compromised P4SA as functioning like a double agent. It had access to both user data and Google's own internal infrastructure simultaneously. One set of stolen credentials, two different attack surfaces.

The root cause was that the OAuth scopes on the default service account were not editable by default. Least privilege was not just violated, it was violated by design. The configuration that shipped to every customer was the configuration that created the problem.

What Made These Attacks So Hard to Catch

Before we talk about what to do, it is worth understanding why these attacks slipped through in the first place, because it is not just negligence.

AI coding agents operate in a fundamentally different way from traditional software tools. A text editor does not authenticate to your GitHub account on its own. A code compiler does not call external APIs. These tools were passive. You ran them. They did their job. They stopped.

AI coding agents are different. They are active participants in your development environment. They clone repositories, read files, write files, push commits, call APIs, and sometimes spin up cloud resources. To do all of that, they need real, functioning credentials with real, functioning access. And unlike a human developer who pauses before doing something unusual, an agent just executes.

The speed gap is part of what makes this so hard to defend against. When a human developer receives a pull request with suspicious content, there is a chance they will notice something feels off. An AI agent reading that same PR description as context processes it in milliseconds and acts on whatever instructions it finds there, including instructions that were never supposed to be instructions.

This is what security researchers call prompt injection. The term comes from SQL injection, an older attack technique where malicious input is treated as a command rather than data. Prompt injection does the same thing to an AI: malicious text embedded in content the agent processes gets interpreted as instructions rather than as plain text to be read.

The Copilot exploits were both prompt injection attacks. The branch name exploit against Codex was closer to classic command injection. But the underlying idea is the same: the agent treated hostile input as trusted instruction, and no validation step caught the difference.

Layered on top of that is the visibility problem. Most security operations centers have logging and monitoring for human user activity. They track what accounts log in, what files get accessed, what APIs get called. They have playbooks for what suspicious human behavior looks like.

AI agents do not behave like humans. They generate a very different pattern of API calls, repository accesses, and network requests. Most existing security tooling was not built to baseline what normal looks like for an AI coding agent, which means anomalies are not generating alerts the way they would for a compromised human account.

This is the infrastructure gap underneath the credential gap. Even if a security team knows their AI agents hold powerful credentials, they may not have the tooling to detect when those credentials are being misused.

The Broader Stakes: When AI Agents Touch Your Supply Chain

One detail in the Vertex AI disclosure deserves more attention than it has received.

The compromised P4SA service account did not just have access to the victim organization's own Cloud Storage. It also reached restricted Artifact Registry repositories that are part of Google's own Vertex AI infrastructure.

That means a breach of one AI agent credential could potentially be used as a stepping stone into the software supply chain of the AI platform itself. The downstream effects of a compromised Artifact Registry at that level are significant: tampered AI components, poisoned dependencies, or backdoored runtime environments could affect every user of that platform.

Supply chain attacks are already one of the most destructive categories of cybersecurity incident, precisely because a single compromised point upstream can affect thousands of systems downstream. The SolarWinds attack in 2020 is the textbook example. Millions of lines of monitoring software were updated with a backdoor, and because the update came from a trusted vendor, it installed silently across thousands of organizations.

AI coding agents create new potential entry points into that same attack surface. An agent with excessive permissions does not just endanger the organization that deployed it. It potentially endangers every system that agent touches.

This is not a hypothetical concern. The Vertex AI P4SA finding was not a theoretical exposure. It was a demonstrated path to restricted infrastructure that should not have been reachable from a standard customer credential.

Pull all four of these incidents together and the picture is unmistakable.

Every vendor had shipped security defenses. Codex ran tasks in cloud containers and scrubbed tokens during agent runtime. Claude Code sandboxed file writes through its accept-edits mode. Copilot filtered PR descriptions for known injection patterns. Vertex AI used a service agent with OAuth scopes.

Every defense was bypassed. Not because the attackers were impossibly sophisticated. But because in each case, the credential existed and was reachable before validation happened.

The Sonar 2026 State of Code Developer Survey found that 25% of developers now use AI agents regularly, and 64% have started using them in some capacity. Veracode tested more than 100 large language models and found that 45% of the generated code samples introduced vulnerabilities from the OWASP Top 10 list. That is a separate, compounding problem layered on top of the credential gap.

CrowdStrike CTO Elia Zaitsev framed the core rule at RSAC 2026: collapse agent identities back to the human. An agent acting on your behalf should never hold more privileges than you do. Codex held a GitHub OAuth token scoped to every repository the developer had ever authorized. Vertex AI's P4SA read every Cloud Storage bucket in the project. Claude Code traded security enforcement for token budget performance.

Kayne McGladrey, an IEEE Senior Member who advises enterprises on identity risk, made the same diagnosis: the agent uses far more permissions than it should, more than any human would need, because of the speed and scale at which it operates.

Riemer drew the operational line: you do not know something until you validate it. The branch name talked to the shell before validation ran. The GitHub issue talked to Copilot before anyone read it. That ordering problem is the vulnerability. Everything else is symptoms.

What Good Governance Actually Looks Like

The good news, if you want to look for it, is that every one of these vulnerabilities has a known fix. The bad news is that most organizations have not implemented them yet.

Start With a Real Inventory

You cannot protect what you have not listed. Every AI coding agent your team uses needs to be in your configuration management database. That means Codex, Claude Code, Copilot, Cursor, Gemini Code Assist, and Windsurf. All of them.

For each one, you need to know: what credentials does it hold, what OAuth scopes were granted at setup, and who granted them. If your current CMDB does not have a category for AI agent identities, create one. This is not optional.

A Gravitee survey from 2026 found that only 21.9% of teams had enrolled AI agent credentials into a privileged access management system. That means roughly 78% of organizations using these tools have AI agents running with real production credentials that no IAM policy is governing.

Audit OAuth Scopes and Patch Levels Now

For Claude Code: upgrade to version 2.1.90 or later. All three vulnerabilities described above are patched in that version.

For Copilot: verify that the August 2025 Patch Tuesday release is applied in your environment.

For Vertex AI: migrate to the bring-your-own-service-account model. This lets you define your own service account with scoped permissions instead of relying on the default P4SA configuration that ships with excessive access.

For all AI agents: review every OAuth scope that was granted at setup. If the scope list reads like a shopping cart, it needs to be trimmed. Most agents need far less access than what they were given by default.

Treat Agent Input as Adversarial

The Copilot attacks both worked through content that developers encounter every day: pull request descriptions and GitHub issues. The Codex attack worked through a branch name. These are not exotic attack surfaces. They are things your developers touch constantly.

This means your security posture needs to treat all of that content as potentially hostile. Not suspicious of your teammates, but aware that anyone who can open a PR or file an issue can potentially inject instructions into your AI agent.

Specific things to monitor:

Unicode obfuscation using Ideographic Space characters (U+3000), which can hide malicious payloads in plain-looking text
Command chains that exceed 50 subcommands, which historically bypassed Claude Code's deny rules
Changes to .vscode/settings.json or .claude/settings.json that flip permission modes, especially bypassPermissions
Network calls made by agent runtime processes, particularly outbound requests that are not part of normal operation

Apply Privileged Access Management to Agent Credentials

The framework for handling privileged credentials already exists. Privileged Access Management systems and Identity Governance and Administration platforms are mature, well-supported technologies. The gap is that almost nobody has applied them to AI agent credentials yet.

The same principles that apply to human privileged accounts should apply here: credential rotation on a schedule, least-privilege scoping, separation of duties. In the AI agent context, that last one means the agent that writes code should not be the same agent that deploys it. Those are two different functions with two different risk profiles, and they should not share a credential.

CyberArk, Delinea, and other PAM platforms that support non-human identities can onboard agent OAuth credentials today. Most organizations have simply not asked them to.

Validate Before the Agent Acts

The thing all four attacks had in common is that the agent took an action before any validation occurred. The branch name executed before it was checked. The issue content reached Copilot before anyone reviewed it. The P4SA credentials were granted before anyone audited the scope.

Riemer's framing is the right one: you do not trust something until you validate it. Before any AI coding agent authenticates to GitHub, sends a request to Gmail, or writes to an internal repository, verify three things: the agent's identity, the scope of that identity, and the human session it is bound to. If those three checks cannot be completed, the action should not proceed.

Ask Your Vendors the Hard Questions

There is one more thing worth doing, and it costs nothing except the willingness to have an awkward conversation before a contract renewal.

Ask each AI coding tool vendor, in writing, for the following: show me the identity lifecycle management controls for the AI agent running in my environment, including the credential scope, the rotation policy, and the permission audit trail.

If the vendor cannot answer that question, you have your answer. The absence of an answer is itself an audit finding.

The Governance Gap That Still Needs Closing

Here is where things stand right now.

Most security leaders have a complete inventory of every human identity in their organization. They know every privileged account, every service account, every third-party access credential. They have processes for onboarding, auditing, and offboarding each one.

Most of those same security leaders have zero inventory of the AI agents running in their environment, even though those agents hold credentials with equivalent or greater privilege than many human accounts.

No IAM framework currently governs human privilege escalation and AI agent privilege escalation with the same level of rigor. Most vulnerability scanners can track every known CVE and alert on missing patches. Almost none of them can detect when a branch name has just exfiltrated a GitHub token through a container that the development team trusts by default.

Zaitsev's message at RSAC 2026 was direct: you already know what to do. This is not a new category of risk. Credential management, least privilege, validation before action: these are established disciplines. AI agents just made the cost of not applying them much higher.

The attack surface for AI coding tools is not the AI. The attack surface is everything the AI is connected to. Every token it holds, every scope it was granted, every API it can call without being asked twice.

That is what six research teams proved across nine months of disclosures. And that is what every security leader managing developer infrastructure needs to take seriously today, not after the next breach.

What This Means If You Are Just Starting to Think About This

You do not need to be a senior security engineer to take action here. If you use any of these tools, or work somewhere that does, there are three questions worth asking:

Does anyone at your organization know what credentials your AI coding tools are holding right now?
Are those credentials scoped to the minimum access the tool actually needs, or were they set up at maximum permissions because that was the default?
When was the last time anyone reviewed whether those permissions were still appropriate?

If the answers are no, not sure, and never: that is the gap. Filling it is not technically complicated. It requires the same practices that good security teams have applied to human accounts for decades. The only difference is that AI agents are newer, less visible, and very few governance systems have been updated to account for them yet.

The vendors are patching the specific bugs. That is good. The bugs will keep coming. What changes the outcome is whether organizations start treating AI agent credentials with the same seriousness they apply to any other privileged identity in their environment.

The attackers already understand that the credential is the target. The question is whether defenders have caught up.

Sources: BeyondTrust security research (March 2026), Adversa Claude Code findings (April 2026), Johann Rehberger and Persistent Security CVE-2025-53773 disclosure (August 2025), Orca Security RoguePilot research, Unit 42 Vertex AI P4SA disclosure, Gravitee 2026 API Survey, Sonar 2026 State of Code Developer Survey, Veracode LLM Security Study 2026.