Introducing EVMbench: A Joint Initiative by OpenAI and Paradigm

As AI agents rapidly evolve, their ability to navigate complex codebases is becoming a cornerstone of future cybersecurity. We are excited to introduce EVMbench, a new benchmark developed in collaboration with OpenAI to measure how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities.

With more than $100B in assets routinely sitting in open-source crypto contracts, the stakes couldn't be higher. EVMbench provides the visibility needed to understand and manage the security risks, and the opportunities, that emerging AI capabilities bring to the Ethereum Virtual Machine (EVM) ecosystem.

Measuring Core AI Capabilities

EVMbench doesn't just ask if an AI can "read" code; it tests whether an agent can perform the end-to-end duties of a security researcher:

  • Detect: Identifying vulnerabilities in real-world contract code derived from actual audits.

  • Exploit: Demonstrating the impact by executing realistic attack scenarios in sandboxed environments.

  • Patch: Creating safe, robust fixes that pass rigorous verification testing.
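
To make the three modes concrete, here is a minimal sketch of how per-mode success rates could be tallied from a set of graded task results. It is purely illustrative: the `TaskResult` record, the task IDs, and the `success_rates` helper are hypothetical names invented for this example, and nothing here reflects EVMbench's actual tooling, data format, or scoring rules.

```python
from collections import defaultdict
from dataclasses import dataclass

# The three capabilities named above; the lowercase labels are an assumption.
MODES = ("detect", "exploit", "patch")


@dataclass(frozen=True)
class TaskResult:
    """One graded attempt (hypothetical record, not EVMbench's real schema)."""
    task_id: str   # made-up identifier for the vulnerability under test
    mode: str      # one of MODES
    passed: bool   # whether the grader accepted the agent's attempt


def success_rates(results):
    """Return the fraction of passed tasks for each mode that has results."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in results:
        if r.mode not in MODES:
            raise ValueError(f"unknown mode: {r.mode!r}")
        totals[r.mode] += 1
        passes[r.mode] += int(r.passed)
    return {m: passes[m] / totals[m] for m in MODES if totals[m]}


# Made-up sample data, for illustration only.
sample = [
    TaskResult("vuln-001", "detect", True),
    TaskResult("vuln-002", "detect", False),
    TaskResult("vuln-001", "exploit", True),
    TaskResult("vuln-001", "patch", False),
]
rates = success_rates(sample)
```

Reporting each mode separately, rather than one blended score, keeps it visible when an agent is strong at finding bugs but weak at fixing them.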

The Rapid Evolution of AI Agents

The progress in this space is staggering. When this project began, top models could exploit fewer than 20% of critical bugs. Today, OpenAI's GPT-5.3-Codex exploits over 70%. It is now clear that AI agents will perform a significant portion of future audits, making tools like EVMbench essential both for measuring that progress and for accelerating it.

A Call to Action for Researchers

We are releasing EVMbench’s tasks, tooling, and evaluation framework to the public. Our goal is to support continued research into AI cyber capabilities and encourage developers to integrate AI-assisted auditing into their defensive workflows today.

"EVMbench serves as both a preview and an accelerant toward a future where security is agent-led."

Explore the benchmark and join the research: https://lnkd.in/gySryDDb
