AI Agents Have a Security Problem. We’re Fixing It.

May 7, 2026

The way the industry tests AI agent security is broken. Existing benchmarks rely on static, hardcoded attacks that bear no resemblance to how real adversaries operate. Real attackers iterate. They adapt. They use increasingly powerful LLMs to probe for weaknesses in real time.

Building security standards on static metrics is like testing a lock with the same key forever and calling it secure.

Today, NEAR AI and FailSafe are launching AttackBench: an open-source benchmark platform built to raise that standard.

Adaptive attacks. Real results.

AttackBench deploys LLM-powered adversaries that adapt across attempts, the same way real threats do. Powered by FailSafe’s SWARM methodology, it tests AI agents against machine-speed, iterative attacks rather than compliance checklists.

In the inaugural evaluation, AttackBench ran 52 adversarial scenarios across four leading models inside three major agent frameworks. The findings exposed a critical shared vulnerability: field-content trust. Agents inherently trust the data they ingest from external tools. That means an attacker can disguise malicious instructions as routine metadata, and most frameworks will execute them without question.

IronClaw didn’t.

IronClaw as the secure baseline

Across every framework tested, IronClaw — NEAR AI’s secure, open-source agent harness— recorded the fewest violations. Strict workspace-scoped permissions and explicit tool-call guardrails gave it the highest adversarial resilience of any framework in the evaluation, with the biggest gap showing up on write-instruction attacks.

Why this matters now

AI agents are no longer sandboxed experiments. They hold credentials, move real money, and operate with deep access to the systems they run inside. The attack surface is real and it is growing fast. Regulatory frameworks built on static checklists are not equipped to keep up.

AttackBench gives security teams a continuous diagnostic tool: empirical, up-to-date evidence of where deployed agents are vulnerable, tested against attack methods that evolve alongside the threats themselves.

What’s next

In the coming weeks, NEAR AI and FailSafe will continue to expand the benchmarks, harnesses tested, and partners. We welcome feedback and new partners. Please feel free to reach out to the NEAR AI or FailSafe teams with questions, comments, or if you’d like to join our coalition!

Hector Martinez

In today’s interconnected world, health challenges are global—and so are the solutions. The Global Health Connect Podcast explores the intersection of global healthcare and innovative partnerships. Join us as we uncover the stories.

AI Agents Have a Security Problem. We’re Fixing It.

AI Agents Have a Security Problem. We’re Fixing It.

Adaptive attacks. Real results.

IronClaw as the secure baseline

Why this matters now

What’s next

Comments

Robin3151

Edna2674

Richard1979

Paige1738

Janice3055

Virginia2768

Darius4015

Mary2163

Tiffany974

Irene4172

Stella4826

Conrad3150

Claire3890

Priscilla4547

Charlie2399

Gerard2890

Isabelle4019

Caitlin4922

Barret2973