Open Skill Eval

Dynamically Auditing the Open Skill Ecosystem for LLM Agents

Model Performance Across Tasks

Radar view of each model's score by task category

How Much Quality per Token?

Average score plotted against tokens consumed per task. The upper left (higher score, fewer tokens) is where you want to be.

Legend: best in its cost range (no model is both cheaper and higher-scoring); beaten by a cheaper model; line connecting the best-value models.
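The "best in its cost range" rule from the legend can be sketched in a few lines of Python. This is only an illustration of the stated rule (a model is beaten when some other model is both cheaper and higher-scoring), not the page's actual implementation. The `ModelResult` type, the model names, and all numbers below are hypothetical placeholders.

```python
from typing import NamedTuple


class ModelResult(NamedTuple):
    name: str      # hypothetical model label
    tokens: float  # average tokens consumed per task
    score: float   # average score across tasks


def is_beaten(model: ModelResult, results: list[ModelResult]) -> bool:
    """True if some other model is both cheaper and higher-scoring."""
    return any(
        other.tokens < model.tokens and other.score > model.score
        for other in results
    )


def best_value(results: list[ModelResult]) -> list[ModelResult]:
    """Models that are not beaten, ordered by cost so they can be joined by a line."""
    kept = [m for m in results if not is_beaten(m, results)]
    return sorted(kept, key=lambda m: m.tokens)


# Made-up data for illustration only; not benchmark results.
results = [
    ModelResult("model-a", 1000, 0.50),
    ModelResult("model-b", 2000, 0.70),
    ModelResult("model-c", 2500, 0.60),  # beaten by model-b
    ModelResult("model-d", 4000, 0.80),
    ModelResult("model-e", 5000, 0.75),  # beaten by model-d
]

frontier = best_value(results)
print([m.name for m in frontier])
```

With this made-up data, model-c and model-e are marked as beaten, and the line would connect model-a, model-b, and model-d in order of increasing token cost. Ties in cost or score are not treated as beaten here, which follows the legend's wording literally.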

30 Skills Across 5 Tasks

Specialized prompt-and-tool kits that agents can invoke to do real work, sourced from the open ecosystem and evaluated on the same benchmark.