Open Skill Eval

Dynamically Auditing the Open Skill Ecosystem for LLM Agents

Overall Results

Model Performance Across Tasks

Radar view of each model's score by task category

Cost–Performance Frontier

Average score plotted against tokens consumed per task. The upper left (high score, few tokens) is where you want to be.

Legend: Best in its cost range (no model is both cheaper and higher‑scoring); Beaten by a cheaper model; Connects the best‑value models.
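The "best in its cost range" rule can be illustrated with a small Pareto-frontier sketch. This is not the site's actual implementation: the `ModelResult` type, the `pareto_frontier` function, and the sample numbers are all hypothetical, and the sketch assumes a model that only ties a cheaper one on score counts as beaten.

```python
from typing import List, NamedTuple


class ModelResult(NamedTuple):
    """Hypothetical record: one model's cost and score on the benchmark."""
    name: str
    tokens_per_task: float  # cost axis (lower is better)
    avg_score: float        # performance axis (higher is better)


def pareto_frontier(results: List[ModelResult]) -> List[ModelResult]:
    """Return the models that no cheaper model matches or beats on score.

    Assumption: a tie on score with a cheaper (or equally cheap) model
    counts as "beaten", so only strict improvements stay on the frontier.
    """
    # Walk from cheapest to most expensive; at equal cost, higher score first.
    ordered = sorted(results, key=lambda r: (r.tokens_per_task, -r.avg_score))
    frontier: List[ModelResult] = []
    best_score = float("-inf")
    for r in ordered:
        # A model joins the frontier only if it outscores everything cheaper.
        if r.avg_score > best_score:
            frontier.append(r)
            best_score = r.avg_score
    return frontier


# Made-up sample data, for illustration only.
sample = [
    ModelResult("model-a", 1000, 0.50),
    ModelResult("model-b", 2000, 0.40),  # beaten by the cheaper model-a
    ModelResult("model-c", 3000, 0.70),
    ModelResult("model-d", 4000, 0.60),  # beaten by the cheaper model-c
]
frontier_names = [r.name for r in pareto_frontier(sample)]
print(frontier_names)  # ['model-a', 'model-c']
```

Connecting the returned points in order of cost gives the line that joins the best-value models; every other point sits below or to the right of it.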

Open Skills

Specialized prompt + tool kits that agents can invoke to do real work — sourced from the open ecosystem and evaluated on the same benchmark.