‘Lies-in-the-Loop’ Attack Defeats AI Coding Agents

Summary

Researchers at Checkmarx Zero demonstrated a new prompt-injection vector called “lies-in-the-loop” (LITL) that tricks AI coding agents — demonstrated against Anthropic’s Claude Code — into executing dangerous commands by lying to the agent and to the human reviewer. The team used seemingly benign commands (for example, launching the Windows calculator) to prove remote code execution, then escalated the technique to hide malicious input in long GitHub issues and comments so a user approving a prompt could be coerced into running harmful commands or accepting code changes.

The attack exploits the human-in-the-loop workflow: the agent constructs the approval prompt from tainted external content (such as a GitHub issue), and the user is only shown what the agent surfaces. By overflowing terminal output and burying the injection off-screen, LITL makes it easy for even attentive developers to miss the malicious part and press Enter, enabling supply-chain style compromises like introducing malicious npm packages.

Key Points

“Lies-in-the-loop” (LITL) is a prompt-injection technique that persuades an AI agent and its human approver that a dangerous action is safe.
Checkmarx Zero proved LITL on Anthropic’s Claude Code, achieving command execution via hidden prompts and long GitHub issue/comments.
The attack hides malicious commands by producing overly long agent responses that push injected content out of immediate view, relying on user trust and haste.
Researchers demonstrated how LITL could be used to submit malicious npm packages to GitHub — showing a practical software supply chain risk.
Mitigations advised include sceptical human review, limiting automation, user education, tighter controls on agent privileges and defence-in-depth to reduce the blast radius.

Context and Relevance

AI-assisted coding agents are rapidly becoming part of developers’ workflows (Checkmarx Zero cites ~79% adoption in some capacity). That prevalence combined with agentic tooling and automation increases the attack surface for supply-chain and remote-code-execution threats. LITL highlights a structural weakness: when agents rely on external content and human approval, attackers can weaponise those trust relationships. This isn’t just theoretical — the researchers showed concrete paths to taint repositories and inject malicious packages.

Organisations adopting agentic or automated developer tools should reassess how much authority those agents get, enforce least-privilege for any actions they request, and train reviewers to treat agent prompts as potentially adversarial content. The finding also feeds into wider trends: more agentic automation means more potential for systemic failure if human oversight is assumed but not hard-nosed.

Why should I read this?

If you use or manage AI code assistants, read this — it’s a short wake-up call. The researchers show how easy it is to hide a nasty command in plain sight and get a developer to approve it. We’ve saved you the time: key tactics, real demo, and practical mitigation steps all pulled together so you know what to lock down first.

Author style

Punchy — the piece flags a red alarm for dev and security teams: this isn’t just an academic trick, it’s a clear, actionable vector that could lead to real supply-chain compromise. If you run CI/CD, package registries or grant agents any privileges, dive into the details.

Source

Source: https://www.darkreading.com/application-security/-lies-in-the-loop-attack-ai-coding-agents