Open Skill Eval

Dynamically Auditing the Open Skill Ecosystem for LLM Agents

Model Performance Across Tasks

Radar view of each model's score by task category

30 Skills Across 5 Tasks

Specialized prompt and tool kits that agents can invoke to do real work, sourced from the open ecosystem and evaluated on the same benchmark.