Documentation
SkVM is a compilation and runtime system that makes LLM agent skills portable across heterogeneous models and harnesses. It implements the SkVM paper: profile model capabilities, compile skills to match, and optimize execution at runtime.
New to SkVM? Start with the Quick Start guide to profile a model, compile a skill, and measure the results in under 5 minutes.
Installation
Most users should install the standalone skvm CLI and use it directly. Source checkout is still supported for contributors, but the installed binary is the primary workflow described in the current README.
# One-line installer (macOS / Linux)
curl -fsSL https://skillvm.ai/install.sh | sh

# Or install from npm (Node 18+)
npm i -g @skillvm/skvm

# Set your API key and verify the install
export OPENROUTER_API_KEY=sk-or-...
skvm --help
The installer places the standalone binary under ~/.local/share/skvm/bin/skvm, symlinks it into ~/.local/bin/skvm, and bundles a private opencode copy used by skvm jit-optimize. If you are developing from source, clone with --recurse-submodules so skvm-data/ is available.
Agent-facing helper skills ship inside the install. Copy skvm-jit and skvm-general from ~/.local/share/skvm/skills/ into your agent harness skill directory when you want the harness to drive profiling, AOT compilation, proposal review, or post-task log submission.
Quick Start
The current CLI quick start has four common workflows: profile the target, AOT-compile the skill, autotune with synthetic tasks, or optimize from an existing conversation log.
For every SkVM command below, model fields use the fully qualified form <provider>/<model-id>. For example, OpenRouter targets are written like openrouter/qwen/qwen3.5-35b-a3b.
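As a minimal sketch of how such an identifier can be read, the snippet below splits the value at the first slash only, so that OpenRouter model IDs that themselves contain a slash stay intact. The names `ModelRef` and `parse_model_ref` are hypothetical and used for illustration; this is not SkVM's own parser.

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    """Illustrative container for a '<provider>/<model-id>' value."""
    provider: str
    model_id: str


def parse_model_ref(value: str) -> ModelRef:
    """Split at the first slash only; the rest is the model ID."""
    provider, sep, model_id = value.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected <provider>/<model-id>, got {value!r}")
    return ModelRef(provider, model_id)


ref = parse_model_ref("openrouter/qwen/qwen3.5-35b-a3b")
print(ref.provider)  # openrouter
print(ref.model_id)  # qwen/qwen3.5-35b-a3b
```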
1. Profile the target model
Generate a Target Capability Profile (TCP) for the target model and harness pair by running the 26-primitive profiling suite:
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b --adapter=bare-agent
The resulting profile is cached under .skvm/profiles/ and reused by later compilation runs for the same model and adapter combination.
2. AOT-compile the skill
Run the AOT compiler on the skill directory for the profiled target. The example below uses Pass 1 only and explicitly selects the compiler model:
skvm aot-compile --skill=path/to/skill-dir --model=openrouter/qwen/qwen3.5-35b-a3b --adapter=bare-agent --pass=1 --compiler-model=openrouter/anthropic/claude-sonnet-4.6
AOT outputs are written into the proposals tree under .skvm/proposals/aot-compile/, where you can inspect the compiled variant before adopting it.
3. Autotune with synthetic tasks
Let SkVM generate synthetic tasks from the skill itself, then iterate optimize → rerun → score against the specified target model and adapter:
skvm jit-optimize --skill=path/to/skill-dir --task-source=synthetic --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 --target-model=openrouter/qwen/qwen3.5-35b-a3b --target-adapter=bare-agent
4. Or optimize from an existing conversation log
For post-mortems and feedback loops, feed prior run logs directly into the optimizer without rerunning tasks:
skvm jit-optimize --skill=path/to/skill-dir --task-source=log --logs=path/to/session.jsonl --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 --target-model=openrouter/qwen/qwen3.5-35b-a3b
CLI: profile
Profile a model's capabilities against the 26-primitive catalog. The command writes a cached TCP per (model, adapter), so later aot-compile and pipeline runs can reuse it instead of re-running microbenchmarks.
skvm profile --model=<provider>/<model-id> [options]
Detailed parameter guide. profile is the entry point that decides which model-adapter pairs need work, whether cached TCPs may be reused, and how profiling slots are distributed across the resulting job matrix.
| Flag | Description | Default |
|---|---|---|
--model | Required unless --batch is used. Accepts one or more provider-prefixed model IDs in the form <provider>/<model-id>. Each model is combined with each selected adapter to form a separate profiling job. | required |
--adapter | Selects which harness implementation to profile: bare-agent, opencode, openclaw, hermes, or jiuwenclaw. Comma-separated values profile multiple harnesses against the same model set. | bare-agent |
--primitives | Restricts profiling to a comma-separated subset of primitive IDs. Use this when iterating on a small slice of the primitive catalog instead of paying for the full 26-capability sweep. | all registered primitives |
--skip | Explicitly excludes primitive IDs from the run after primitive selection is resolved. Useful for temporarily suppressing unstable or already-known primitives. | none |
--instances | Controls how many randomized instances are generated per difficulty level. Higher values reduce noise but increase cost and elapsed time. | 3 |
--force | Ignores cached TCPs and forces a fresh profile. This is the flag to use when the model behavior, adapter behavior, or primitive implementation changed and the old TCP is no longer trustworthy. | false |
--list | Skips execution entirely and prints cached profiles already available on disk. It is the cheapest way to check whether a required TCP already exists. | off |
--batch | Builds the model set from the benchmark configuration instead of requiring --model. In batch mode, the adapter default broadens to all registered adapters rather than only bare-agent. | false |
--concurrency | Sets the total profiling slot budget across all model-adapter combinations. The scheduler distributes slots hierarchically per adapter and then per model, so this controls overall throughput rather than per-primitive parallelism in isolation. | 1 |
--verbose | Enables debug logging so you can inspect primitive scheduling, adapter setup, and error details while profiling is running. | false |
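The hierarchical slot distribution described for --concurrency can be pictured with the sketch below. It assumes an even split with the remainder going to the earliest entries, first across adapters and then across models; the actual scheduler policy is not documented here, and `split_evenly` and `distribute_slots` are hypothetical names.

```python
def split_evenly(total: int, buckets: list[str]) -> dict[str, int]:
    """Split `total` slots across buckets; earlier buckets absorb the remainder."""
    if not buckets:
        raise ValueError("at least one bucket is required")
    base, extra = divmod(total, len(buckets))
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(buckets)}


def distribute_slots(total: int, adapters: list[str], models: list[str]) -> dict[str, dict[str, int]]:
    """First split the budget per adapter, then split each share per model."""
    per_adapter = split_evenly(total, adapters)
    return {adapter: split_evenly(slots, models) for adapter, slots in per_adapter.items()}


plan = distribute_slots(5, ["bare-agent", "opencode"], ["model-a", "model-b"])
for adapter, per_model in plan.items():
    print(adapter, per_model)
```

The point of the sketch is that the slot total is preserved across the whole matrix, so raising --concurrency increases overall throughput rather than the parallelism of any single job.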
Examples
# Default OpenRouter route
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b

# Profile multiple models in parallel
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b,openrouter/deepseek/deepseek-chat-v3-0324 --concurrency=4

# Native Anthropic route
skvm profile --model=anthropic/claude-sonnet-4.6

# Profile multiple adapters for one model
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b --adapter=bare-agent,opencode

# List all cached profiles
skvm profile --list
CLI: aot-compile
AOT-compile one or more skills for one or more target model-adapter pairs. The compiler consumes an existing TCP, runs the selected passes, validates the result with guard checks, and writes compiled variants under ~/.skvm/proposals/aot-compile/.
skvm aot-compile --skill=<path> --model=<provider>/<model-id> [options]
Detailed parameter guide. aot-compile first resolves skill paths, then loads TCPs for every requested model × adapter pair, and finally runs a shared compiler provider across the resulting job matrix.
| Flag | Description | Default |
|---|---|---|
--skill | Required. Accepts one or more skill directories or SKILL.md paths. Every resolved skill is compiled against every requested model and adapter. | required |
--model | Required. Accepts one or more target model IDs in the form <provider>/<model-id>. Each model must already have a cached or explicitly supplied TCP for the selected adapter. | required |
--adapter | Selects one or more harnesses whose TCPs should be used during compilation. This matters because capability profiles are stored per (model, adapter), not just per model. | bare-agent |
--profile | Overrides cache lookup with a specific TCP JSON file. This is supported only for a single model plus single adapter job and is mainly useful for testing or reproducing a particular profile snapshot. | load from cache |
--pass | Selects which compiler passes to run. Any subset such as 1, 1,2, or 1,3 is legal, and the chosen pass set is reflected in the output directory pass tag. | 1,2,3 |
--concurrency | Controls how many compile jobs run in parallel across the skill-model-adapter job matrix. It affects total throughput, not the internal parallelism extracted by Pass 3. | 1 |
--dry-run | Runs the compiler pipeline and prints the result summary without writing a compiled variant to disk. Use this to inspect gaps, transforms, and guard status before publishing an artifact. | false |
--compiler-model | Overrides the LLM backend used for the LLM-backed parts of compilation, such as SCR extraction, agentic rewriting, dependency extraction, and workflow decomposition. | openrouter/anthropic/claude-sonnet-4.6 |
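The skill × model × adapter job matrix can be sketched as a plain Cartesian product. The function name `build_jobs` and the error message are hypothetical; the only rule taken from the table above is that --profile applies to a single model plus single adapter job.

```python
from itertools import product
from typing import Optional


def build_jobs(
    skills: list[str],
    models: list[str],
    adapters: list[str],
    profile_path: Optional[str] = None,
) -> list[tuple[str, str, str]]:
    """Return one (skill, model, adapter) job per combination."""
    if profile_path is not None and (len(models) != 1 or len(adapters) != 1):
        # An explicit TCP file describes exactly one (model, adapter) pair.
        raise ValueError("--profile requires exactly one model and one adapter")
    return list(product(skills, models, adapters))


jobs = build_jobs(
    ["skills/a", "skills/b"],
    ["openrouter/qwen/qwen3.5-35b-a3b"],
    ["bare-agent", "opencode"],
)
print(len(jobs))  # 4
```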
Compilation Passes
The three-pass compiler is sequential on purpose: each pass narrows uncertainty and emits artifacts that the next pass can trust. Guard validation runs after compilation so unsafe or internally inconsistent rewrites are caught before the variant is accepted.
- Pass 1: Capability Gap Analysis. Extracts the SCR from SKILL.md, compares the required primitive levels against the TCP, and splits deficits into hard absence vs weak proficiency. Hard gaps trigger substitution onto alternative primitive paths; weak gaps trigger compensation such as extra scaffolding, examples, decomposition, or stronger execution constraints. SCR extraction and rewriting are LLM-backed, while the actual gap analysis is deterministic computation.
- Pass 2: Environment Binding. Inspects the skill bundle for tools, binaries, packages, and API dependencies, then checks whether the current environment already satisfies them. The output is not just a list of dependencies: SkVM also emits an idempotent environment setup script so the compiled artifact carries enough operational context to bootstrap itself on a clean machine.
- Pass 3: Concurrency Extraction. Decomposes the workflow into a DAG, identifies independent stages, and records parallelism opportunities as DLP, ILP, and TLP hints. In practice this means the compiler is trying to separate steps that can be run independently from steps that are only ordered because the original prose was linear.
Pass 1 changes what the skill asks the model to do, Pass 2 changes what the environment must provide, and Pass 3 changes how the remaining work can be scheduled. Reordering them would make later analysis operate on stale assumptions.
CLI: jit-optimize
Proposal-based skill optimization supports three explicit evidence sources: synthetic, real, and log. Regardless of source, the optimizer writes a proposal tree, records per-round evidence and root-cause analysis, and picks a best round for later review or deployment.
# Synthetic autotune
skvm jit-optimize --skill=path/to/skill-dir --task-source=synthetic --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 --target-model=openrouter/qwen/qwen3.5-35b-a3b --rounds=3

# Real bench tasks
skvm jit-optimize --skill=path/to/skill-dir --task-source=real --tasks=task-a,task-b --test-tasks=task-c --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 --target-model=openrouter/qwen/qwen3.5-35b-a3b

# Existing execution logs
skvm jit-optimize --skill=path/to/skill-dir --task-source=log --logs=path/to/log1.jsonl,path/to/log2.jsonl --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 --target-model=openrouter/qwen/qwen3.5-35b-a3b
Detailed parameter guide. jit-optimize always builds a proposal keyed by (harness, target-model, skill-name). What changes across task sources is how evidence is collected and whether tasks are re-executed or only analyzed retrospectively.
Shared Required Flags
| Flag | Description | Default |
|---|---|---|
--skill | Path to the skill directory being optimized. In batch mode this is replaced by --skill-list. | required |
--task-source | Explicitly chooses the evidence source: synthetic, real, or log. The CLI does not infer this from the other flags. | required |
--optimizer-model | The model that edits the skill based on accumulated evidence. This is separate from the target model being optimized for. | required |
--target-model | Required for every source. For synthetic and real it is the model that reruns tasks; for log it is still required because it determines proposal storage location. | required |
--target-adapter | Harness paired with the target model. In log mode this is informational, but in rerun modes it determines which adapter actually executes the evaluation loop. | bare-agent |
Task-Source-Specific Flags
| Source | Flags | Meaning |
|---|---|---|
synthetic | --synthetic-count, --synthetic-test-count | Controls how many train and held-out test tasks the optimizer should synthesize directly from the skill description before the loop begins. |
real | --tasks, --test-tasks | Uses explicit benchmark tasks as evidence. If --test-tasks is omitted, the training set is reused as the evaluation set, which weakens holdout protection. |
log | --logs, --failures | Consumes existing conversation logs and optional structured failure JSON files. No tasks are rerun in this mode. |
Loop, Delivery, and Batch Flags
| Flag | Description | Default |
|---|---|---|
--rounds | Maximum number of optimization rounds after the baseline round. Round 0 is always the starting skill snapshot; later rounds iterate edit → rerun → score. | 3 for synthetic/real, 1 for log |
--runs-per-task | Number of executions per task per round in rerun modes. Raised above 1 by default to make best-round selection less sensitive to single-run noise. | 2 |
--task-concurrency | Maximum in-flight task runs across train and test sets in a round. | 1 |
--convergence | Early-exit threshold on the primary score. When a round meets or exceeds this score, the loop can stop before consuming all remaining rounds. | 0.95 |
--baseline | Also evaluates no-skill and original-skill baselines for comparison. This is forbidden in log mode because log mode does not rerun tasks. | false |
--no-keep-all-rounds | Prunes proposal storage so only the chosen best round is retained instead of keeping every intermediate round directory. | false |
--auto-apply | Deploys the best round back onto the original skill directory immediately after selection. | false |
--skill-list | Runs batch optimization over one skill path per line. | off |
--concurrency | Batch-job parallelism when multiple skills are optimized in the same invocation. | 1 |
SkVM validates task-source-specific flags strictly. For example, --tasks is only valid for real, --logs is only valid for log, and loop-control flags like --baseline are rejected for log because there is no rerun phase.
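The strict validation described above can be sketched as a lookup of which flags belong to which task source. The flag names come from the tables in this section; the function `validate_flags` and the error wording are hypothetical and do not reproduce the CLI's real messages.

```python
SOURCE_FLAGS = {
    "synthetic": {"--synthetic-count", "--synthetic-test-count"},
    "real": {"--tasks", "--test-tasks"},
    "log": {"--logs", "--failures"},
}
# Flags that only make sense when tasks are rerun.
RERUN_ONLY_FLAGS = {"--baseline"}


def validate_flags(task_source: str, flags: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the combination is accepted."""
    if task_source not in SOURCE_FLAGS:
        return [f"unknown --task-source: {task_source}"]
    errors = []
    for source, owned in SOURCE_FLAGS.items():
        if source == task_source:
            continue
        for flag in sorted(flags & owned):
            errors.append(f"{flag} is only valid for --task-source={source}")
    if task_source == "log":
        for flag in sorted(flags & RERUN_ONLY_FLAGS):
            errors.append(f"{flag} is not allowed for --task-source=log")
    return errors


print(validate_flags("log", {"--logs", "--tasks", "--baseline"}))
```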
CLI: pipeline
Profile the target if no cached TCP exists, then run aot-compile with that TCP. This is the shortest path when you want a compiled variant but do not want to manually split the work into profile and compile steps.
skvm pipeline --skill=<path> --model=<provider>/<model-id> [options]
| Flag | Description | Default |
|---|---|---|
--skill | Required. Skill directory or SKILL.md path that should be compiled. | required |
--model | Required. Target model ID in the form <provider>/<model-id>, used both for cache lookup and eventual compilation. | required |
--adapter | Harness whose TCP should be used. The same adapter is used for auto-profiling when no cached profile exists. | bare-agent |
--force-profile | Forces a fresh profiling run instead of reusing a cached TCP. Use it when you suspect the cache is stale but still want the convenience of the one-command pipeline. | false |
--profile | Supplies a specific TCP file and skips auto-profiling. This is the escape hatch for deterministic reproduction or external TCP inspection. | auto-load or auto-profile |
--pass | Selects which AOT passes to run after the TCP is available. | 1,2,3 |
--compiler-model | Overrides the compiler LLM used during the compilation stage. | openrouter/anthropic/claude-sonnet-4.6 |
--dry-run | Prints the resulting compile summary without writing the output variant. | false |
Operationally, pipeline has three branches: load an explicit profile, reuse a cached profile, or run profile inline. Only after the TCP is resolved does it move on to the compile stage.
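The three branches can be written as a small resolution function. This is a sketch under assumptions: `resolve_tcp` is a hypothetical name, TCPs are represented as plain dictionaries, and the precedence of an explicit profile over --force-profile is assumed rather than documented.

```python
from typing import Callable, Optional


def resolve_tcp(
    explicit_profile: Optional[dict],
    cached_profile: Optional[dict],
    force_profile: bool,
    run_profile: Callable[[], dict],
) -> tuple[str, dict]:
    """Pick the TCP source: explicit file, cache, or a fresh profiling run."""
    if explicit_profile is not None:
        # --profile skips auto-profiling entirely.
        return "explicit", explicit_profile
    if cached_profile is not None and not force_profile:
        return "cached", cached_profile
    # No usable cache, or --force-profile was given.
    return "profiled", run_profile()


source, tcp = resolve_tcp(None, {"reason.plan": 1}, False, lambda: {"reason.plan": 2})
print(source)  # cached
```

Only after this step returns would the compile stage start, which mirrors the ordering described above.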
CLI: proposals
Inspect, diff, group, serve, accept, or reject the proposal trees created by jit-optimize. This command is the review surface between optimization and deployment.
skvm proposals list | show | diff | report | serve | accept | reject [options]
| Subcommand / Flag | Description |
|---|---|
list | Lists proposals and supports filtering by --harness, --target-model, --skill, and --status, plus sorting and grouping for review workflows. |
show <id> | Prints proposal metadata, per-round summary, and optionally full analysis content with --full. |
diff <id> [--round=N] | Prints a unified diff between the original skill and a chosen round. If --round is omitted, the best round is used. |
report | Generates an HTML report for the filtered proposal set. --out overrides the output path. |
serve | Starts the local proposal review server. --port, --host, and --no-open control how the server is exposed. |
accept <id> | Deploys the best round or the round chosen by --round. --target overrides the deployment directory. |
reject <id> | Marks the proposal as rejected without deploying anything. |
--sort, --min-delta, --group-by, --no-color | Formatting and review controls for list/report-style outputs. |
Use proposals as the human review gate. jit-optimize is allowed to propose edits, but accept is the point where those edits become the live skill bundle.
CLI: run
Execute one task against one model+adapter, with or without a skill. This is execute-only and primarily for testing; use bench when you need scored evaluation.
skvm run --task=<path> --model=<provider>/<model-id> --adapter=<name> [--skill=<path>]
| Flag | Description | Default |
|---|---|---|
--task | Required. Path to a task JSON file using the bench task schema. | required |
--model | Required. Provider-prefixed model identifier in the form <provider>/<model-id>, passed through to the chosen adapter. | required |
--skill | Optional skill to inject for the run. Omit it to execute the task without any skill assistance. | none |
--adapter | Harness used to execute the task. | bare-agent |
--workdir | Reuses a specific working directory instead of creating a temporary one. Useful for reproducing runs and inspecting artifacts across iterations. | temporary directory |
--timeoutMs | Overrides the task timeout defined in the task file. | task-defined timeout |
--maxSteps | Overrides the task or adapter step budget for the run. | task-defined max steps |
--verbose | Enables more detailed execution logging. | false |
SkVM copies any files under the task's fixtures/ directory into the work directory before execution, then reports workdir, timing, token usage, and non-OK run status at the end.
CLI: bench
Run benchmark conditions over tasks, skills, and models. bench is the widest CLI surface in SkVM because it covers standard benchmarking, deferred judging, session resumption, task import, and condition-to-condition comparison.
skvm bench --model=<provider>/<model-id> [options]
Detailed parameter guide. The standard execution path builds a benchmark plan from model × adapter × task × condition, while submodes like judge, --compare, --import, and --custom bypass parts of that plan builder.
| Flag | Description | Default |
|---|---|---|
--model | Required for normal benchmarking unless you are resuming a session that already records the model. Use provider-prefixed IDs in the form <provider>/<model-id>; comma-separated values enable multi-model mode. | required |
--adapter | Selects one or more harnesses. Multi-adapter mode is supported, but cannot be combined with multi-model mode in the same invocation. | bare-agent |
--tasks | Restricts benchmarking to a comma-separated task subset instead of the full task pool. | all tasks |
--source | Filters tasks by origin source, such as pinchbench, skillsbench, or other importer-specific labels stored in the task metadata. | all sources |
--conditions | Selects which skill conditions to evaluate, including no-skill, original, aot-compiled, pass-specific AOT variants like aot-compiled-p12, jit-boost, and jit-optimized. | all standard conditions |
--custom | Runs a YAML-defined custom benchmark plan with explicit nested task-skill-model-adapter mappings. This bypasses the standard condition system entirely. | off |
--skill-mode | Controls whether skills are directly injected or discovered by the harness. This matters when the harness has its own skill loading semantics. | inject |
--jit-runs | Warm-up repetitions used by the jit-boost condition before measuring the solidified path. Higher values give the boost mechanism more chances to promote repeated patterns. | 3 |
--timeout-mult | Multiplies per-task timeout budgets. Use it when a model or adapter is known to be slower than the default task envelopes assume. | 1.0 |
--max-steps | Overrides the maximum number of agent steps each run may take before the harness is stopped. | 30 |
--judge-model | Selects the LLM used by llm-judge criteria. This only affects judging, not the model being benchmarked. | openrouter/anthropic/claude-sonnet-4.6 |
--compiler-model | Overrides the compiler model used when a requested benchmark condition needs to materialize an AOT variant during the run. | openrouter/anthropic/claude-sonnet-4.6 |
--profile | Supplies a TCP path for AOT conditions. Without it, AOT conditions are skipped when the needed TCP cannot be resolved from cache. | auto-load if available |
--resume | Resumes an interrupted session by id or with latest. This preserves progress instead of re-running already completed work. | off |
--list-sessions | Prints known benchmark sessions and their statuses without running anything. | off |
--concurrency | Controls total parallel task execution. In multi-model mode the slots are distributed across models rather than spent inside a single model run. | 1 |
--runs-per-task | Repeats each task-condition pair multiple times and averages the result, reducing noise from stochastic decoding or unstable harness behavior. | 1 |
--keep-workdirs | Keeps task working directories after completion so failures can be inspected manually. | false |
--verbose | Enables debug logging during orchestration and execution. | false |
Async Judge and Compare Modes
| Flag / Subcommand | What It Does |
|---|---|
--async-judge | Defers llm-judge criteria into a post-run batch. This is useful when you want the expensive LLM judging stage decoupled from the main benchmark execution. |
bench judge --manifest=<dir> | Runs the deferred judge pass later from the generated manifest directory. |
--merge-judge=<results-dir> | Merges post-run judging results back into an existing report. |
--compare | Switches bench into artifact comparison mode instead of execution mode. |
--skill-path, --lhs, --rhs, --output-dir | Required inputs for compare mode: the skill to inspect, the two conditions to compare, and where to write the generated diff/report outputs. |
--analyze-model | Optional summarization model used to generate a higher-level explanation of the skill difference during compare mode. |
Import Mode
| Flag | What It Does |
|---|---|
--import | Runs task importers instead of benchmarking. Current sources are pinchbench and skillsbench. |
--path | Overrides the source directory used by the importer. |
--dry-run | Shows what would be imported without writing files. |
# Single model benchmark
skvm bench --model=openrouter/qwen/qwen3.5-35b-a3b --adapter=bare-agent

# Specific conditions and tasks
skvm bench --model=openrouter/qwen/qwen3.5-35b-a3b --conditions=no-skill,original,aot-compiled,jit-boost

# Defer LLM-judge and process it later
skvm bench --model=openrouter/qwen/qwen3.5-35b-a3b --async-judge
skvm bench judge --manifest=path/to/manifest-dir --judge-model=openrouter/anthropic/claude-sonnet-4.6
CLI: clean-jit
Clear persisted JIT artifacts for a model+adapter pair when you want to reset solidification or proposal state.
skvm clean-jit --model=<provider>/<model-id> --adapter=<name>
| Flag | Description | Default |
|---|---|---|
--model | Required. Provider-prefixed model whose runtime JIT state should be cleared. | required |
--adapter | Required. Adapter whose runtime artifacts should be cleaned alongside the model key. | required |
--dry-run | Prints the deletion plan, including runtime directories and matching solidification-state.json files, without removing anything. | false |
--yes | Confirms destructive cleanup. Required unless --dry-run is active. | false |
--include-bench-logs | Also deletes matching benchmark session directories in addition to runtime JIT state. | false |
The command intentionally keeps compiled SKILL.md artifacts, candidate metadata, and cached profiles intact. It is for resetting JIT effects, not wiping every derived artifact.
CLI: logs
List recent runs across profiling, compilation, bench, and runtime subsystems from the shared cache tree.
skvm logs
| Flag | Description | Default |
|---|---|---|
--type | Filters sessions by subsystem, such as profile, aot-compile, bench, run, or pipeline. | all session types |
--limit | Limits how many recent entries are shown. | 20 |
--all | Disables the limit and prints the full session index. | false |
Each entry includes status, type, model or model count, harness, skill, summary text, and the log directory path, so logs acts as a lightweight session index over the shared cache.
Primitive Catalog
SkVM defines 26 primitive capabilities that describe what an LLM agent can do. Each primitive is testable at three difficulty levels (L1–L3). The catalog is organized into four domains:
| Domain | Prefix | Examples |
|---|---|---|
| Code Generation | gen.code.* | write, edit, debug, test, refactor |
| Tool Use | tool.* | file.read, file.write, exec, web_fetch |
| Reasoning | reason.* | plan, decompose, diagnose, analyze |
| Instruction Following | follow.* | format, constraint, multi-step, edge-case |
Each primitive has a dedicated microbenchmark generator that produces randomized test instances. Generators use two evaluation patterns:
- Tool-use primitives — agent runs tools, evaluator checks files in the working directory
- Text-only primitives — profiler writes LLM response to file, evaluator reads it
TCP — Target Capability Profile
A TCP is the output of profiling: a JSON file that maps each of the 26 primitives to a proficiency level.
| Level | Meaning |
|---|---|
L3 | Full proficiency — handles complex instances |
L2 | Moderate proficiency — handles standard instances |
L1 | Basic proficiency — handles simple instances |
L0 | No proficiency — fails even simple instances |
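The profiler uses progressive testing: it tests L3 first, and if the model passes, L2 and L1 are skipped (assumed passed). If the model fails, the profiler steps down to L2 and then to L1; a model that fails every level is recorded as L0. This minimizes API costs while maintaining accuracy.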
{
  "gen.code.write": { "level": 2, "scores": { "L3": 0.33, "L2": 0.83 } },
  "tool.file.read": { "level": 3, "scores": { "L3": 1.0 } },
  "reason.plan": { "level": 1, "scores": { "L3": 0.0, "L2": 0.17, "L1": 0.67 } }
}
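The progressive testing order can be sketched as a top-down loop that stops at the first level the model passes. The pass cutoff is not stated in this document, so `PASS_THRESHOLD = 0.5` is an assumption chosen only because it is consistent with the example scores above; `profile_primitive` and `score_at` are hypothetical names.

```python
from typing import Callable

PASS_THRESHOLD = 0.5  # assumption: the real cutoff is not documented here


def profile_primitive(score_at: Callable[[int], float]) -> dict:
    """Test L3, then L2, then L1; stop at the first level that passes."""
    scores = {}
    for level in (3, 2, 1):
        score = score_at(level)
        scores[f"L{level}"] = score
        if score >= PASS_THRESHOLD:
            # Lower levels are assumed passed and never executed.
            return {"level": level, "scores": scores}
    return {"level": 0, "scores": scores}


fake_scores = {3: 0.33, 2: 0.83, 1: 1.0}
print(profile_primitive(fake_scores.__getitem__))
```

With these stand-in scores the result has the same shape as the gen.code.write entry above: L3 fails, L2 passes, and L1 is never run.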
SCR — Skill Capability Requirement
An SCR describes what primitives a skill needs and at what proficiency level. It is extracted automatically by the compiler (Pass 1) from the skill's SKILL.md file.
The SCR may include alternative implementation paths — different ways to accomplish the same goal using different primitives. This allows the compiler to find substitutions when a model lacks a required capability.
The gap between a model's TCP and a skill's SCR determines what compilation transforms are needed. No gap means no transformation — the skill runs as-is.
3-Pass Compilation
The AOT compiler transforms skills in three sequential passes. Each pass emits artifacts that narrow the next stage's search space, so the compiler is not just editing prose: it is moving from capability analysis to environment binding to executable scheduling hints.
Pass 1: Capability Gap Analysis
Pass 1 reads the skill as a requirement document. It extracts the SCR, maps each purpose to the primitives and minimum levels it needs, compares those against the target TCP, and decides whether the skill can run unchanged, needs compensation, or needs structural substitution.
- L0 gaps — capability absent → substitution (replace with alternative primitives)
- Weak gaps — capability present but below required level → compensation (add scaffolding, examples, decomposition)
Because the SCR can contain alternative implementation paths, Pass 1 is not limited to saying “the model is weaker.” It can often redirect the skill onto a different primitive path that the target model is better at executing.
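The deterministic part of Pass 1 can be illustrated with a comparison of required and profiled levels. The sketch assumes a simplified SCR shaped as a flat mapping from primitive ID to minimum level and treats a primitive missing from the TCP as L0; the real SCR schema, including alternative paths, is richer than this, and `classify_gaps` is a hypothetical name.

```python
def classify_gaps(scr: dict[str, int], tcp: dict[str, int]) -> dict[str, str]:
    """Map each required primitive to 'none', 'compensation', or 'substitution'."""
    plan = {}
    for primitive, required in scr.items():
        actual = tcp.get(primitive, 0)  # assumption: unprofiled means L0
        if actual >= required:
            plan[primitive] = "none"
        elif actual == 0:
            plan[primitive] = "substitution"  # L0 gap: capability absent
        else:
            plan[primitive] = "compensation"  # weak gap: present but too low
    return plan


scr = {"gen.code.write": 3, "tool.file.read": 2, "reason.plan": 1}
tcp = {"gen.code.write": 2, "tool.file.read": 3, "reason.plan": 0}
print(classify_gaps(scr, tcp))
```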
Pass 2: Environment Binding
Pass 2 binds the rewritten skill to the actual machine it will run on. It extracts dependency manifests, checks whether binaries, Python packages, APIs, or shell tools already exist, and emits an idempotent setup script that can recreate the required environment.
This means AOT output is not only “better instructions for the model”; it also becomes a more operational skill bundle with explicit setup knowledge instead of hidden environmental assumptions.
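To illustrate what "idempotent" means for such a setup script, the sketch below renders a POSIX shell script in which every install step is guarded by a `command -v` check, so rerunning it on a machine that already has the tool does nothing. The function `render_setup_script`, the requirement tuples, and the install command are all illustrative assumptions, not the format SkVM emits.

```python
import shlex


def render_setup_script(requirements: list[tuple[str, str]]) -> str:
    """Render a shell script that installs each binary only when it is missing.

    Each requirement is (binary_name, install_command); the install command
    is supplied by the caller and is passed through verbatim.
    """
    lines = ["#!/bin/sh", "set -eu"]
    for binary, install_cmd in requirements:
        guard = f"command -v {shlex.quote(binary)} >/dev/null 2>&1"
        lines.append(f"{guard} || {install_cmd}")
    return "\n".join(lines) + "\n"


print(render_setup_script([("jq", "apt-get install -y jq")]))
```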
Pass 3: Concurrency Extraction
Pass 3 analyzes the workflow structure itself. It decomposes the skill into a DAG, identifies true data and control dependencies, and looks for places where the original sequential prose can be turned into parallel work without changing semantics.
- DLP — Data-Level Parallelism (process independent data chunks)
- ILP — Instruction-Level Parallelism (pipeline independent steps)
- TLP — Task-Level Parallelism (run independent subtasks concurrently)
The resulting variant can therefore preserve the same end behavior while exposing runtime scheduling opportunities that a plain natural-language skill would usually leave implicit.
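One way to see how a DAG exposes parallel work is to group its nodes into stages whose members have no unmet dependencies. The sketch below uses Python's standard `graphlib.TopologicalSorter`; the workflow and step names are invented for illustration, and this shows the general technique rather than SkVM's Pass 3 implementation.

```python
from graphlib import TopologicalSorter


def parallel_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into stages; every step in a stage can run concurrently.

    `deps` maps each step to the set of steps it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


workflow = {
    "fetch_a": set(),
    "fetch_b": set(),
    "merge": {"fetch_a", "fetch_b"},
    "report": {"merge"},
}
print(parallel_stages(workflow))  # [['fetch_a', 'fetch_b'], ['merge'], ['report']]
```

Here the two fetch steps were written in sequence in the hypothetical prose, but nothing orders them relative to each other, so they land in the same stage.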
JIT Optimization
SkVM provides two independent JIT systems that operate after the original skill already exists. They solve different problems: JIT-boost reduces repeated runtime cost by bypassing predictable LLM calls, while JIT-optimize edits the skill itself based on evidence collected from runs.
JIT-Boost (Code Solidification)
JIT-boost is a runtime specialization layer. It first uses a headless agent to scan the whole skill bundle and emit boost-candidates.json entries containing code signatures, keywords, parameter templates, and execution templates. During execution, runtime hooks watch LLM calls for repeated matches against those signatures.
After enough consecutive matches, a candidate is promoted. From that point on, SkVM extracts parameters directly from the prompt, executes the stored template, and bypasses the LLM entirely for that repeated pattern.
- Zero LLM calls at runtime for promoted patterns
- Automatic demotion on failure (falls back to LLM)
- Model and harness agnostic — stored per skill
So JIT-boost is closer to code caching or solidification than to skill rewriting: it accelerates a stable repeated behavior path without changing the skill source itself.
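The promote and demote behavior can be sketched as a small state machine per candidate. The promotion count (`PROMOTE_AFTER = 3`) and the substring-based signature match are assumptions made for the example; the document only says "enough consecutive matches", and `BoostCandidate` is a hypothetical class, not SkVM's data model.

```python
from dataclasses import dataclass

PROMOTE_AFTER = 3  # assumption: the real promotion threshold is not documented here


@dataclass
class BoostCandidate:
    signature: str
    consecutive_matches: int = 0
    promoted: bool = False

    def observe(self, llm_output: str) -> None:
        """Called by a runtime hook for each LLM call while not promoted."""
        if self.signature in llm_output:
            self.consecutive_matches += 1
            if self.consecutive_matches >= PROMOTE_AFTER:
                self.promoted = True
        else:
            # A miss breaks the streak.
            self.consecutive_matches = 0

    def report_failure(self) -> None:
        """Demote after a failed template execution; fall back to the LLM."""
        self.promoted = False
        self.consecutive_matches = 0


candidate = BoostCandidate(signature="convert_csv(")
for output in ["convert_csv(a)", "convert_csv(b)", "convert_csv(c)"]:
    candidate.observe(output)
print(candidate.promoted)  # True
```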
JIT-Optimize (Skill Rewriting)
JIT-optimize is a proposal-driven editing loop. It normalizes evidence into a shared schema, copies the skill into a temporary workspace, writes the evidence and history into .optimize/, and launches a headless agent that can edit SKILL.md or any bundle files.
The optimizer must submit a structured record containing a required rootCause, reasoning, confidence, and changed files. SkVM then snapshots the workspace as a numbered round, computes the actual diff, reruns evaluation when appropriate, and chooses the best round according to score and monotonicity checks.
Because the result is stored as a proposal tree rather than applied immediately, JIT-optimize is a controlled editing workflow, not an in-place mutation mechanism.
Autotune
Autotune is the fully closed loop built on top of jit-optimize --task-source=synthetic: generate evaluation tasks, execute them, score the result, propose edits, and repeat until rounds are exhausted or convergence is reached. It is the nearest thing SkVM has to self-supervised online skill improvement.
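The loop structure can be sketched as follows, using the documented defaults for synthetic mode (3 rounds, convergence at 0.95). The `evaluate` and `propose_edit` callables stand in for task execution with scoring and for the optimizer agent; both are hypothetical, each round is assumed to edit the previous round's output, and best-round selection here uses the score alone without the monotonicity checks mentioned elsewhere.

```python
from typing import Callable


def optimize(
    skill: str,
    evaluate: Callable[[str], float],
    propose_edit: Callable[[str, float], str],
    rounds: int = 3,
    convergence: float = 0.95,
) -> dict:
    """Run baseline round 0, then up to `rounds` edit-rerun-score iterations."""
    history = [(skill, evaluate(skill))]  # round 0: the starting snapshot
    for _ in range(rounds):
        current_skill, current_score = history[-1]
        if current_score >= convergence:
            break  # early exit: the latest round already meets the threshold
        candidate = propose_edit(current_skill, current_score)
        history.append((candidate, evaluate(candidate)))
    scores = [score for _, score in history]
    best_round = max(range(len(scores)), key=scores.__getitem__)
    return {"best_round": best_round, "scores": scores}


# Toy stand-ins: each edit appends "+" and each "+" adds 0.25 to the score.
result = optimize(
    "base",
    evaluate=lambda s: min(1.0, 0.5 + 0.25 * s.count("+")),
    propose_edit=lambda s, _score: s + "+",
)
print(result)
```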
System Overview
SkVM is structured as a modular pipeline:
Profile Tool ──TCP──> AOT Compiler ──Variant──> Runtime + Agent
     │                     │                          │
26 primitives         3 passes                  JIT-boost + JIT-optimize
L3→L1 progressive     1: capability gaps        - code solidification
                      2: env binding            - skill optimization
                      3: concurrency DAG        - autotune loop
Key design principles:
- Decoupled stages: profiler, compiler, and runtime can run independently
- Pluggable adapters: any agent harness can be integrated via the AgentAdapter interface
- Pluggable providers: any LLM backend can be integrated via the LLMProvider interface
- Cached artifacts: profiles, logs, and proposal-tree outputs are persisted and reused
Agent Adapters
Adapters wrap agent harnesses and provide a uniform interface. All adapters support RuntimeHooks for JIT monitoring.
| Adapter | Harness | Description |
|---|---|---|
| bare-agent | Built-in | Minimal agent loop with 5 tools. Primary adapter for profiling and testing. |
| opencode | OpenCode CLI | Wraps OpenCode, parses NDJSON event stream. |
| openclaw | OpenClaw CLI | Wraps OpenClaw, manages temporary agent instances. |
| hermes | Hermes CLI | Wraps Hermes and preserves full token and cost usage metadata. |
| jiuwenclaw | Jiuwenclaw CLI | Wraps jiuwenclaw-cli over JSON-RPC; token and cost are not persisted upstream. |
LLM Providers
Two LLM provider implementations are included:
- Anthropic: native Anthropic SDK, supports tool_use
- OpenRouter: OpenAI-compatible API, routes to 100+ models
The extractStructured() function provides two-layer structured output: tool_use when available, prompt + parse fallback otherwise.
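The two-layer idea can be sketched as below. This is not the signature of extractStructured(): the `SketchBackend` interface, the `extractStructuredSketch` function, and the brace-scanning parse step are hypothetical, and the calls are synchronous only to keep the example small (real provider calls would be asynchronous).

```typescript
// Hypothetical sketch of two-layer structured output. The backend shape and
// function name are assumptions, not SkVM's actual API.
interface SketchBackend {
  // Present only when the model supports tool calling; returns tool arguments.
  callTool?: (prompt: string, schemaName: string) => unknown;
  // Plain text completion, assumed to be always available.
  complete: (prompt: string) => string;
}

function extractStructuredSketch(
  backend: SketchBackend,
  prompt: string,
  schemaName: string,
): unknown {
  // Layer 1: let the model fill in a tool call when that is supported.
  if (backend.callTool) {
    return backend.callTool(prompt, schemaName);
  }
  // Layer 2: ask for JSON in the prompt, then parse the outermost {...} span.
  const text = backend.complete(
    `${prompt}\nReply with a single JSON object matching "${schemaName}".`,
  );
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in model output");
  }
  return JSON.parse(text.slice(start, end + 1));
}
```

The fallback tolerates prose around the JSON object, which is a common reason for preferring tool calls whenever the backend offers them.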
Evaluation Framework
Four evaluation methods are available to both the profiler and bench subsystems:
| Method | Mechanism |
|---|---|
| script | Shell script exit code (0 = pass) |
| file-check | Check file contents: exact match, contains, regex, JSON schema |
| llm-judge | LLM evaluator with rubric, scores 0–1 |
| custom | Registered evaluator functions |
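To make the file-check row concrete, here is a minimal sketch of three of its modes applied to file contents that have already been read into a string. The `FileCheck` type and `runFileCheck` function are hypothetical names, the JSON schema mode is omitted, and the real evaluator's configuration format is not shown in this section.

```typescript
// Hypothetical sketch of a file-check style evaluator. It operates on file
// contents already loaded as a string; JSON-schema checks are omitted.
type FileCheck =
  | { mode: "exact"; expected: string }
  | { mode: "contains"; expected: string }
  | { mode: "regex"; pattern: string };

function runFileCheck(contents: string, check: FileCheck): boolean {
  switch (check.mode) {
    case "exact":
      // Pass only if the whole file equals the expected text.
      return contents === check.expected;
    case "contains":
      // Pass if the expected text appears anywhere in the file.
      return contents.includes(check.expected);
    case "regex":
      // Pass if the pattern matches anywhere in the file.
      return new RegExp(check.pattern).test(contents);
  }
}
```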
Environment Variables
Set these in a .env file at the project root. Bun auto-loads it.
| Variable | Purpose |
|---|---|
| OPENROUTER_API_KEY | API key for OpenRouter (agent/profiler) |
| ANTHROPIC_API_KEY | API key for Anthropic (compiler backend) |
Provider Routing
Start with skvm config init. The interactive wizard writes $SKVM_CACHE/skvm.config.json (default ~/.skvm/skvm.config.json) and lets you configure providers, API keys, and adapter checkouts without editing JSON by hand.
skvm config init
skvm config show
skvm config doctor
Every CLI model field uses <provider>/<model-id>. In the docs, the default route uses openrouter/, so a Qwen target is written as openrouter/qwen/qwen3.5-35b-a3b and Claude through OpenRouter is written as openrouter/anthropic/claude-sonnet-4.6. If you configure Anthropic directly as the provider, the same Claude model becomes anthropic/claude-sonnet-4.6. In other words, anthropic is the provider in the native route, but part of the model id when the provider is openrouter.
{
"providers": {
"routes": [
{ "match": "anthropic/*", "kind": "anthropic", "apiKeyEnv": "ANTHROPIC_API_KEY" },
{ "match": "openai/*", "kind": "openai-compatible", "apiKeyEnv": "OPENAI_API_KEY", "baseUrl": "https://api.openai.com/v1" },
{ "match": "openrouter/*", "kind": "openrouter", "apiKeyEnv": "OPENROUTER_API_KEY" }
]
}
}
providers.routes is matched top to bottom and the first glob wins. SkVM strips the first path segment before sending the model id to the backend SDK. For openai-compatible routes, baseUrl is required. Unprefixed model ids do not auto-route and will fail to match.
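The matching rules above can be sketched as a small resolver. This is an illustration, not SkVM's implementation: `Route` mirrors the JSON fields shown earlier, while `globMatches`, `resolveRoute`, and `ResolvedRoute` are hypothetical names, and the glob handling is simplified to a trailing `*` only.

```typescript
// Hypothetical sketch of first-match route resolution as described above.
interface Route {
  match: string; // glob such as "openrouter/*"
  kind: string;
  apiKeyEnv: string;
  baseUrl?: string;
}

interface ResolvedRoute {
  route: Route;
  backendModelId: string;
}

// Simplified glob: only a trailing "*" is handled in this sketch.
function globMatches(glob: string, modelId: string): boolean {
  return glob.endsWith("*")
    ? modelId.startsWith(glob.slice(0, -1))
    : modelId === glob;
}

function resolveRoute(routes: Route[], modelId: string): ResolvedRoute {
  // Routes are tried top to bottom; the first matching glob wins.
  const route = routes.find((r) => globMatches(r.match, modelId));
  if (!route) {
    throw new Error(`no provider route matches "${modelId}"`);
  }
  if (route.kind === "openai-compatible" && !route.baseUrl) {
    throw new Error(`route "${route.match}" requires baseUrl`);
  }
  // Strip the first path segment (the provider prefix) for the backend SDK.
  const backendModelId = modelId.slice(modelId.indexOf("/") + 1);
  return { route, backendModelId };
}
```

With the routes from the example config, `openrouter/anthropic/claude-sonnet-4.6` resolves to the openrouter route with backend id `anthropic/claude-sonnet-4.6`, while `anthropic/claude-sonnet-4.6` resolves to the anthropic route with backend id `claude-sonnet-4.6`. An unprefixed id such as `claude-sonnet-4.6` matches nothing and throws.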
Model IDs
SkVM uses provider-prefixed model identifiers in the form <provider>/<model-id>. The default examples below use OpenRouter, and the final line shows the native Anthropic form for comparison:
openrouter/qwen/qwen3.5-35b-a3b
openrouter/anthropic/claude-sonnet-4.6
openrouter/google/gemini-2.5-flash
openrouter/qwen/qwen3-30b-a3b-instruct-2507
openrouter/deepseek/deepseek-chat-v3-0324
anthropic/claude-sonnet-4.6
Data Directory
SkVM currently uses two roots: skvm-data/ for the input dataset, and .skvm/ for the local runtime cache and generated artifacts.
skvm-data/
├── skills/      # input skill definitions
└── tasks/       # benchmark task dataset
.skvm/
├── profiles/    # cached TCP JSON files
├── log/         # profile / compile / bench / runtime logs
└── proposals/   # aot-compile, jit-boost, jit-optimize outputs
The legacy flat data/ submodule has been retired. To update the dataset, commit and push inside skvm-data/, then run git add skvm-data && git commit in the main repo to update the submodule pointer.