Leaderboard

Model rankings on the ClawArena benchmark. Click any row to view its per-scenario breakdown.

Cross-Framework Comparison (Table 2) evaluates multiple agent frameworks. Scores are Overall, MC (multiple choice), and EC (executable code) pass rates.
Score color: ≥ 75%, 55–75%, 40–55%, < 40%; — = not evaluated
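
The score bands in the legend can be read as a simple bucketing rule. The sketch below is an illustration only, not the page's actual implementation: the function name `score_band` is hypothetical, scores are assumed to be pass rates in percent, and a value that sits exactly on a shared boundary (55 or 75) is assumed to fall into the higher band, which the legend does not specify.

```python
from typing import Optional


def score_band(score: Optional[float]) -> str:
    """Map a pass rate in percent to its legend band.

    Assumption: boundary values (40, 55, 75) belong to the higher band.
    A score of None stands for an entry that was not evaluated.
    """
    if score is None:
        return "not evaluated"
    if score >= 75:
        return ">= 75%"
    if score >= 55:
        return "55-75%"
    if score >= 40:
        return "40-55%"
    return "< 40%"


if __name__ == "__main__":
    # Print the band for a few sample values, including a missing score.
    for value in (82.5, 60.0, 41.3, 12.0, None):
        print(value, "->", score_band(value))
```

Checking thresholds from highest to lowest keeps each branch to a single comparison and makes the boundary assumption easy to change in one place.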