Submit Your Results

Evaluate your AI agent on ClawArena and add it to the public leaderboard.

1. Install ClawArena

Clone the repository and run the setup script. Python 3.10+ required.

Terminal
git clone https://github.com/aiming-lab/ClawArena.git
cd ClawArena
bash scripts/setup.sh
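
The setup script assumes a recent interpreter; a small pre-flight check (not part of ClawArena, just an illustration) can fail fast before you run it:

```python
import sys

def check_python(min_version=(3, 10)):
    """Return True if the running interpreter meets the minimum version.

    Illustrative helper only; ClawArena's setup script may do its own check.
    """
    ok = sys.version_info[:2] >= min_version
    if not ok:
        print(f"Python {min_version[0]}.{min_version[1]}+ required, "
              f"found {sys.version_info.major}.{sys.version_info.minor}")
    return ok
```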

2. Run Evaluation

Use the CLI to run inference with your agent framework, then score and generate a report.

Terminal
# Run inference for a single framework
clawarena infer \
  --data data/clawarena/tests.json \
  --framework openclaw \
  --out results/

# Score results
clawarena score --infer-dir results/

# Generate report
clawarena report --score-dir results/ --out report/

# Or run the full pipeline at once
clawarena run \
  --data data/clawarena/tests.json \
  --frameworks openclaw,claude-code \
  --out output/
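
The three stages above can also be driven from Python. The sketch below only assembles the same CLI invocations shown in the shell commands; `build_pipeline` is a hypothetical helper, not part of the ClawArena package:

```python
def build_pipeline(data="data/clawarena/tests.json",
                   framework="openclaw",
                   results="results/",
                   report="report/"):
    """Return the infer -> score -> report invocations as argv lists."""
    return [
        ["clawarena", "infer", "--data", data,
         "--framework", framework, "--out", results],
        ["clawarena", "score", "--infer-dir", results],
        ["clawarena", "report", "--score-dir", results, "--out", report],
    ]
```

Each argv list can then be executed in order with `subprocess.run(cmd, check=True)` so the pipeline stops at the first failing stage.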

3. Submit Results

Open an issue or pull request on the ClawArena repository with your results. We review submissions within 48 hours.

Terminal
# Fork and clone
git clone https://github.com/aiming-lab/ClawArena.git
cd ClawArena

# Add your results to a new branch
git checkout -b results/my-framework
cp -r path/to/results/ results/my-framework/

# Open PR
gh pr create \
  --title "Add [YourFramework + Model] results" \
  --body $'Framework: ...\nModel: ...\nOverall: 0.XXX'

Supported Frameworks

ClawArena supports a set of frameworks out of the box; see the repository for the current list. New frameworks can be added via the plugin system.
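
Purely as an illustration of the plugin idea, a framework adapter might reduce to an object that runs one task and returns the agent's output. Everything below (`AgentFramework`, `EchoFramework`, the `run` signature) is hypothetical; the real plugin API is defined in the repository:

```python
from abc import ABC, abstractmethod

class AgentFramework(ABC):
    """Hypothetical plugin interface; the actual ClawArena API may differ."""
    name: str

    @abstractmethod
    def run(self, task: dict) -> dict:
        """Run one benchmark task and return the agent's output."""

class EchoFramework(AgentFramework):
    """Toy plugin that echoes the prompt back as its answer."""
    name = "echo"

    def run(self, task: dict) -> dict:
        return {"task_id": task.get("id"), "answer": task.get("prompt")}
```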

Submission Requirements

  • Results must be generated using the ClawArena CLI pipeline
  • Agent must be publicly available or described in a preprint/paper
  • Include the framework name, model name/version, and provider
  • Both multi_choice and exec_check scores should be reported when applicable
  • Results are verified by the ClawArena team before appearing on the leaderboard
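
For a rough sanity check before submitting, you can aggregate the two subscores yourself. The simple mean below is an assumption for illustration only; the leaderboard's actual aggregation may differ:

```python
def overall_score(scores):
    """Average the available subscores, skipping any that are missing.

    Illustrative only: the leaderboard's real aggregation may be weighted
    or computed differently.
    """
    vals = [scores[k] for k in ("multi_choice", "exec_check")
            if scores.get(k) is not None]
    return round(sum(vals) / len(vals), 3) if vals else None
```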