The False Reality Behind Agentic AI Job-Killers
Summary
The author reviews a new study by the Center for AI Safety (CAIS) that tested leading agentic AI systems on 240 real freelance projects across 23 categories (e.g. game development, product design, video content, writing, data analysis). Models tested included Manus, Grok 4, Sonnet 4.5, GPT-5 (ChatGPT Agent), and Gemini 2.5 Pro. The study found a 97.5% failure rate for producing “client-acceptable” deliverables, which the researchers describe as widespread “AI slop.” The piece argues that this undermines the narrative that agentic AI will imminently replace large numbers of real-world jobs, and it advises caution for executives planning large, immediate AI-driven workforce reductions or high-capital deployments.
The article concludes agentic AI today is better framed as a productivity companion requiring human oversight rather than a drop-in autonomous worker. CEOs and business owners should therefore align expectations, invest in training and governance, and review the CAIS analysis before committing big budgets to full automation plans.
Key Points
- CAIS tested agentic AIs on 240 real freelance tasks across 23 categories to compare AI outputs with paid human deliverables.
- Leading agentic models (Manus, Grok 4, Sonnet 4.5, GPT-5/ChatGPT Agent, Gemini 2.5 Pro) were evaluated and largely failed to meet client-acceptable standards.
- The reported failure rate was 97.5%: most AI outputs were low quality, misread the brief, or compounded earlier errors (“AI slop”).
- Current agentic AI struggles with multi-step task management: understanding briefs, finding resources, creating workarounds, checking and correcting its own output.
- Implication for business: for now, AI is a productivity enhancer and assistant, not a direct substitute for many real-world jobs; human oversight, training and realistic budgeting remain essential.
Context and Relevance
This matters because many organisations are budgeting heavily for automation and headcount reduction based on expectations that agentic AI will perform complex professional work end-to-end. The CAIS study is a timely reality check for C-suite decision-makers, procurement teams and HR leaders: the hype outpaces current capabilities. For stakeholders planning AI transformation, the study suggests prioritising pilot programmes, governance, reskilling and measured investments rather than wholesale replacement strategies.
Why should I read this?
Look, if you’re about to sign off a six‑ or seven‑figure AI rollout hoping to cut staff tomorrow, read this first. It could save you the embarrassment (and the bill). The piece summarises hard evidence that agentic AI still trips over real-world, multi-step tasks and produces a lot of unusable output. Handy if you want to avoid wasting money and upsetting teams.
Author (Punchy)
Rick Andrade: short, blunt and practical. If you run budgets or hire remote talent, this is more than interesting commentary; it’s an operational cautionary note. Read the CAIS paper and use the findings to temper investment and policy decisions now.
Source
https://ceoworld.biz/2025/12/07/the-false-reality-behind-agentic-ai-job-killers/