EVMBench public audit comparison

Detection Rate Comparison

Vulnerability detection rate (%) on the EVMBench dataset. Higher is better.

Lead score: 83.0%
Gap to next: +7.8 percentage points
Systems compared: 16
System                                  Detection rate
--------------------------------------  --------------
Vulnaut AuditAgent Agent                83.0%
Azimuth Agent TestMachine               75.2%
AuditAgent Agent Nethermind Security    67.0%
Guardix Agent Guardix                   59.8%
Claude Opus 4.6 Anthropic               45.6%
GPT-5.3-Codex xhigh reasoning           39.2%
GPT-5.2 xhigh reasoning                 39.2%
Claude Opus 4.5 Anthropic               36.1%
GPT-5.3-Codex high reasoning            34.2%
OC-GPT-5.2 OpenAI                       30.0%
GPT-5.2 high reasoning                  29.7%
GPT-5.2 medium reasoning                29.7%
GPT-5.3-Codex medium reasoning          26.9%
GPT-5 OpenAI                            23.3%
GPT-5.2 low reasoning                   22.2%
Gemini 3 Pro Google                     20.8%
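The three summary figures at the top of the page can be reproduced from the per-system detection rates. The sketch below is a minimal illustration, assuming "gap to next" means the difference in percentage points between the first- and second-ranked systems; the names RESULTS and summarize are hypothetical and not part of EVMBench.

```python
# Detection rates (%) transcribed from the leaderboard, in the order listed.
RESULTS = [
    ("Vulnaut AuditAgent Agent", 83.0),
    ("Azimuth Agent TestMachine", 75.2),
    ("AuditAgent Agent Nethermind Security", 67.0),
    ("Guardix Agent Guardix", 59.8),
    ("Claude Opus 4.6 Anthropic", 45.6),
    ("GPT-5.3-Codex xhigh reasoning", 39.2),
    ("GPT-5.2 xhigh reasoning", 39.2),
    ("Claude Opus 4.5 Anthropic", 36.1),
    ("GPT-5.3-Codex high reasoning", 34.2),
    ("OC-GPT-5.2 OpenAI", 30.0),
    ("GPT-5.2 high reasoning", 29.7),
    ("GPT-5.2 medium reasoning", 29.7),
    ("GPT-5.3-Codex medium reasoning", 26.9),
    ("GPT-5 OpenAI", 23.3),
    ("GPT-5.2 low reasoning", 22.2),
    ("Gemini 3 Pro Google", 20.8),
]


def summarize(results):
    """Hypothetical helper: return (lead score, gap to runner-up, system count).

    Assumes the gap is measured in percentage points between the top two
    entries after sorting by detection rate, highest first.
    """
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    lead = ranked[0][1]
    # Round to one decimal to match the page and hide float noise (83.0 - 75.2).
    gap = round(lead - ranked[1][1], 1)
    return lead, gap, len(ranked)


if __name__ == "__main__":
    lead, gap, n = summarize(RESULTS)
    print(f"Lead score: {lead:.1f}%")
    print(f"Gap to next: +{gap:.1f} percentage points")
    print(f"Systems compared: {n}")
```

Sorting before taking the top two keeps the calculation correct even if the rows are listed in a different order; ties (such as the two 39.2% entries) do not affect the lead or the gap here.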