SkillBench leaderboard
Skills leaderboard across different LLMs and agent harnesses.
Leaderboard Snapshot
Compare each model and agent harness combination across three conditions: no skill, the original skill, and the SkVM-optimized skill. Lift and token delta are both measured relative to the original skill. Clicking a row opens a task-level drilldown.
| # | Combination | No skill | Original skill | SkVM | Lift | Token delta | Shape |
|---|---|---|---|---|---|---|---|
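The Lift and Token delta columns can be illustrated with a minimal sketch. The page only states that both are measured relative to the original skill, so the exact formulas here are assumptions: lift is taken as the absolute score difference, and token delta as the fractional change in token usage. The `Row` class, its field names, and the numbers are hypothetical and are not leaderboard data.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """One model + harness combination (hypothetical field names)."""
    combination: str
    no_skill: float        # score without any skill
    original: float        # score with the original skill
    skvm: float            # score with the SkVM-optimized skill
    original_tokens: int   # tokens used with the original skill
    skvm_tokens: int       # tokens used with the SkVM-optimized skill


def lift(row: Row) -> float:
    # Assumption: lift is the absolute score difference vs. the original skill.
    return row.skvm - row.original


def token_delta(row: Row) -> float:
    # Assumption: token delta is the fractional change in tokens
    # vs. the original skill (negative means fewer tokens).
    return (row.skvm_tokens - row.original_tokens) / row.original_tokens


# Made-up numbers, for illustration only.
example = Row("model-a + harness-x", 40.0, 55.0, 61.0, 12000, 9000)
print(
    f"{example.combination}: "
    f"lift={lift(example):+.1f}, token delta={token_delta(example):+.1%}"
)
```

Under these assumptions, a positive lift means the SkVM-optimized skill scored higher than the original, and a negative token delta means it used fewer tokens.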
Legend: No skill, Original skill, SkVM optimized
Grouped Score Comparison
Each combination that passes the current filters is drawn as three horizontal bars, so the gaps between no skill, original skill, and SkVM can be compared at a glance.
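The grouped layout can be approximated in plain text. This is only a sketch, not the page's actual chart code: the `bar` and `grouped_bars` helpers are hypothetical, and a 0 to 100 score scale is assumed.

```python
def bar(label: str, score: float, max_score: float = 100.0, width: int = 40) -> str:
    """Render one horizontal bar, clamping the score to [0, max_score]."""
    clamped = max(0.0, min(score, max_score))
    fill = "#" * round(width * clamped / max_score)
    return f"{label:<16}|{fill:<{width}}| {score:5.1f}"


def grouped_bars(combination: str, no_skill: float, original: float, skvm: float) -> str:
    """Render the three bars for one combination, one bar per line."""
    lines = [
        combination,
        bar("No skill", no_skill),
        bar("Original skill", original),
        bar("SkVM optimized", skvm),
    ]
    return "\n".join(lines)


# Made-up scores, for illustration only.
print(grouped_bars("model-a + harness-x", 40.0, 55.0, 61.0))
```

Sharing one scale and one fixed width across all three bars is what makes the gaps directly comparable within a group.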