Leaderboard
Compare model performance across all task categories
Resource Consumption
Average total tokens per agent (prompt + completion + cache) — one chart per task