SkillBench leaderboard

Skill leaderboard across different LLMs and agent harnesses.


Leaderboard Snapshot

Compare each model and agent harness combination across three conditions: no skill, original skill, and SkVM-optimized skill. Lift and token delta are both measured relative to the original skill. Click a row to open its task-level drilldown.
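As a rough illustration of how the two relative columns could be derived, the sketch below computes lift and token delta for one combination. The page does not state the exact formulas, so this assumes lift is an absolute score difference and token delta is a fractional change in token usage, both against the original skill. The `Variant` class, the function names, and all numbers are hypothetical and are not taken from the leaderboard.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """One condition for a model + harness combination (hypothetical shape)."""
    score: float  # benchmark score; a 0-100 scale is assumed
    tokens: int   # total tokens consumed; assumed to be a positive count


def lift(optimized: Variant, original: Variant) -> float:
    """Score difference versus the original skill, in points (assumed definition)."""
    return optimized.score - original.score


def token_delta(optimized: Variant, original: Variant) -> float:
    """Fractional change in tokens versus the original skill (assumed definition)."""
    return (optimized.tokens - original.tokens) / original.tokens


# Made-up values for illustration only; not real leaderboard data.
original_skill = Variant(score=60.0, tokens=1000)
skvm_skill = Variant(score=75.0, tokens=800)

example_lift = lift(skvm_skill, original_skill)
example_token_delta = token_delta(skvm_skill, original_skill)
print(f"lift: {example_lift:+.1f} points, token delta: {example_token_delta:+.0%}")
```

Under these assumptions, a positive lift means the optimized skill scored higher than the original, and a negative token delta means it used fewer tokens.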

# | Combination | No skill | Original skill | SkVM | Lift | Token delta | Shape
Legend: No skill, Original skill, SkVM optimized

Grouped Score Comparison

Every filtered combination is visualized as three horizontal bars so the gap between no-skill, original skill, and SkVM is immediately comparable.
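The three-bar layout described above can be approximated in plain text. This is a minimal sketch, not the page's actual rendering code: `render_bars` is a hypothetical helper, scores are assumed to lie on a 0-100 scale, and the sample values are invented for illustration.

```python
def render_bars(label: str, scores: dict[str, float],
                width: int = 40, max_score: float = 100.0) -> str:
    """Render one combination as a label line plus one horizontal bar per condition."""
    lines = [label]
    for name, score in scores.items():
        # Scale the score to the bar width (assumes 0 <= score <= max_score).
        filled = round(width * score / max_score)
        bar = "#" * filled + " " * (width - filled)
        lines.append(f"  {name:<15}|{bar}| {score:.1f}")
    return "\n".join(lines)


# Invented scores for a hypothetical combination; not real leaderboard data.
chart = render_bars(
    "hypothetical-model + hypothetical-harness",
    {"No skill": 25.0, "Original skill": 50.0, "SkVM optimized": 75.0},
    width=20,
)
print(chart)
```

Drawing all three bars against the same fixed scale is what makes the gaps between conditions readable at a glance.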