Documentation

SkVM is a compilation and runtime system that makes LLM agent skills portable across heterogeneous models and harnesses. It implements the SkVM paper: profile model capabilities, compile skills to match, and optimize execution at runtime.

Tip

New to SkVM? Start with the Quick Start guide to profile a model, compile a skill, and measure the results in under 5 minutes.

Installation

Most users should install the standalone skvm CLI and use it directly. A source checkout remains supported for contributors, but the installed binary is the primary workflow described in the current README.

terminal

# One-line installer (macOS / Linux)
curl -fsSL https://skillvm.ai/install.sh | sh

# Or install from npm (Node 18+)
npm i -g @skillvm/skvm

# Set your API key and verify the install
export OPENROUTER_API_KEY=sk-or-...
skvm --help

Note

The installer places the standalone binary under ~/.local/share/skvm/bin/skvm, symlinks it into ~/.local/bin/skvm, and bundles a private opencode copy used by skvm jit-optimize. If you are developing from source, clone with --recurse-submodules so skvm-data/ is available.

Bundled Skills

Agent-facing helper skills ship inside the install. Copy skvm-jit and skvm-general from ~/.local/share/skvm/skills/ into your agent harness skill directory when you want the harness to drive profiling, AOT compilation, proposal review, or post-task log submission.

Quick Start

The current CLI quick start has four common workflows: profile the target, AOT-compile the skill, autotune with synthetic tasks, or optimize from an existing conversation log.

For every SkVM command below, model fields use the fully qualified form <provider>/<model-id>. For example, OpenRouter targets are written like openrouter/qwen/qwen3.5-35b-a3b.
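As a minimal sketch of how such an ID decomposes, the snippet below splits on the first slash only, so the remainder (which may itself contain slashes, as OpenRouter IDs do) stays intact. The helper name `split_model_id` is hypothetical and is not part of SkVM.

```python
def split_model_id(qualified: str) -> tuple[str, str]:
    """Split '<provider>/<model-id>' on the first slash only (illustrative helper)."""
    provider, sep, model_id = qualified.partition("/")
    if not sep or not provider or not model_id:
        raise ValueError(f"expected <provider>/<model-id>, got {qualified!r}")
    return provider, model_id


print(split_model_id("openrouter/qwen/qwen3.5-35b-a3b"))
print(split_model_id("anthropic/claude-sonnet-4.6"))
```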

1. Profile the target model

Generate a Target Capability Profile (TCP) for the target model and harness pair by running the 26-primitive profiling suite:

terminal

skvm profile \
  --model=openrouter/qwen/qwen3.5-35b-a3b \
  --adapter=bare-agent

The resulting profile is cached under .skvm/profiles/ and reused by later compilation runs for the same model and adapter combination.

2. AOT-compile the skill

Run the AOT compiler on the skill directory for the profiled target. The example below uses Pass 1 only and explicitly selects the compiler model:

terminal

skvm aot-compile \
  --skill=path/to/skill-dir \
  --model=openrouter/qwen/qwen3.5-35b-a3b \
  --adapter=bare-agent \
  --pass=1 \
  --compiler-model=openrouter/anthropic/claude-sonnet-4.6

AOT outputs are written into the proposals tree under .skvm/proposals/aot-compile/, where you can inspect the compiled variant before adopting it.

3. Autotune with synthetic tasks

Let SkVM generate synthetic tasks from the skill itself, then iterate optimize → rerun → score against the specified target model and adapter:

terminal

skvm jit-optimize --skill=path/to/skill-dir --task-source=synthetic \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 \
  --target-model=openrouter/qwen/qwen3.5-35b-a3b \
  --target-adapter=bare-agent

4. Or optimize from an existing conversation log

For post-mortems and feedback loops, feed prior run logs directly into the optimizer without rerunning tasks:

terminal

skvm jit-optimize --skill=path/to/skill-dir --task-source=log \
  --logs=path/to/session.jsonl \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 \
  --target-model=openrouter/qwen/qwen3.5-35b-a3b

CLI Reference

CLI: profile

Profile a model's capabilities against the 26-primitive catalog. The command writes a cached TCP per (model, adapter), so later aot-compile and pipeline runs can reuse it instead of re-running microbenchmarks.

usage

skvm profile --model=<provider>/<model-id> [options]

Detailed parameter guide. profile is the entry point that decides which model-adapter pairs need work, whether cached TCPs may be reused, and how profiling slots are distributed across the resulting job matrix.

Flag	Description	Default
`--model`	Required unless `--batch` is used. Accepts one or more provider-prefixed model IDs in the form `<provider>/<model-id>`. Each model is combined with each selected adapter to form a separate profiling job.	required
`--adapter`	Selects which harness implementation to profile: `bare-agent`, `opencode`, `openclaw`, `hermes`, or `jiuwenclaw`. Comma-separated values profile multiple harnesses against the same model set.	`bare-agent`
`--primitives`	Restricts profiling to a comma-separated subset of primitive IDs. Use this when iterating on a small slice of the primitive catalog instead of paying for the full 26-capability sweep.	all registered primitives
`--skip`	Explicitly excludes primitive IDs from the run after primitive selection is resolved. Useful for temporarily suppressing unstable or already-known primitives.	none
`--instances`	Controls how many randomized instances are generated per difficulty level. Higher values reduce noise but increase cost and elapsed time.	`3`
`--force`	Ignores cached TCPs and forces a fresh profile. This is the flag to use when the model behavior, adapter behavior, or primitive implementation changed and the old TCP is no longer trustworthy.	`false`
`--list`	Skips execution entirely and prints cached profiles already available on disk. It is the cheapest way to check whether a required TCP already exists.	off
`--batch`	Builds the model set from the benchmark configuration instead of requiring `--model`. In batch mode, the adapter default broadens to all registered adapters rather than only `bare-agent`.	`false`
`--concurrency`	Sets the total profiling slot budget across all model-adapter combinations. The scheduler distributes slots hierarchically per adapter and then per model, so this controls overall throughput rather than per-primitive parallelism in isolation.	`1`
`--verbose`	Enables debug logging so you can inspect primitive scheduling, adapter setup, and error details while profiling is running.	`false`
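
The job matrix described in the table can be pictured as a Cartesian product of the comma-separated `--model` and `--adapter` values. The sketch below only illustrates that expansion; `build_profile_jobs` is a hypothetical name, and how SkVM actually schedules slots across the jobs is not modelled here.

```python
from itertools import product


def build_profile_jobs(model_arg: str, adapter_arg: str = "bare-agent") -> list[tuple[str, str]]:
    """Expand comma-separated flag values into (model, adapter) jobs (illustrative)."""
    models = [m.strip() for m in model_arg.split(",") if m.strip()]
    adapters = [a.strip() for a in adapter_arg.split(",") if a.strip()]
    return list(product(models, adapters))


jobs = build_profile_jobs(
    "openrouter/qwen/qwen3.5-35b-a3b,anthropic/claude-sonnet-4.6",
    "bare-agent,opencode",
)
for model, adapter in jobs:
    print(adapter, model)
```

Two models and two adapters therefore yield four profiling jobs, each with its own cached TCP.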

Examples

terminal

# Default OpenRouter route
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b

# Profile multiple models in parallel
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b,openrouter/deepseek/deepseek-chat-v3-0324 --concurrency=4

# Native Anthropic route
skvm profile --model=anthropic/claude-sonnet-4.6

# Profile multiple adapters for one model
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b --adapter=bare-agent,opencode

# List all cached profiles
skvm profile --list

CLI: aot-compile

AOT-compile one or more skills for one or more target model-adapter pairs. The compiler consumes an existing TCP, runs the selected passes, validates the result with guard checks, and writes compiled variants under ~/.skvm/proposals/aot-compile/.

usage

skvm aot-compile --skill=<path> --model=<provider>/<model-id> [options]

Detailed parameter guide. aot-compile first resolves skill paths, then loads TCPs for every requested model × adapter pair, and finally runs a shared compiler provider across the resulting job matrix.

Flag	Description	Default
`--skill`	Required. Accepts one or more skill directories or `SKILL.md` paths. Every resolved skill is compiled against every requested model and adapter.	required
`--model`	Required. Accepts one or more target model IDs in the form `<provider>/<model-id>`. Each model must already have a cached or explicitly supplied TCP for the selected adapter.	required
`--adapter`	Selects one or more harnesses whose TCPs should be used during compilation. This matters because capability profiles are stored per `(model, adapter)`, not just per model.	`bare-agent`
`--profile`	Overrides cache lookup with a specific TCP JSON file. This is supported only for a single model plus single adapter job and is mainly useful for testing or reproducing a particular profile snapshot.	load from cache
`--pass`	Selects which compiler passes to run. Any subset such as `1`, `1,2`, or `1,3` is legal, and the chosen pass set is reflected in the output directory pass tag.	`1,2,3`
`--concurrency`	Controls how many compile jobs run in parallel across the skill-model-adapter job matrix. It affects total throughput, not the internal parallelism extracted by Pass 3.	`1`
`--dry-run`	Runs the compiler pipeline and prints the result summary without writing a compiled variant to disk. Use this to inspect gaps, transforms, and guard status before publishing an artifact.	`false`
`--compiler-model`	Overrides the LLM backend used for the LLM-backed parts of compilation, such as SCR extraction, agentic rewriting, dependency extraction, and workflow decomposition.	`openrouter/anthropic/claude-sonnet-4.6`

Compilation Passes

The three-pass compiler is sequential on purpose: each pass narrows uncertainty and emits artifacts that the next pass can trust. Guard validation runs after compilation so unsafe or internally inconsistent rewrites are caught before the variant is accepted.

Pass 1 — Capability Gap Analysis: extracts the SCR from SKILL.md, compares the required primitive levels against the TCP, and splits deficits into hard absence vs weak proficiency. Hard gaps trigger substitution onto alternative primitive paths; weak gaps trigger compensation such as extra scaffolding, examples, decomposition, or stronger execution constraints. SCR extraction and rewriting are LLM-backed, while the actual gap analysis is deterministic computation.
Pass 2 — Environment Binding: inspects the skill bundle for tools, binaries, packages, and API dependencies, then checks whether the current environment already satisfies them. The output is not just a list of dependencies: SkVM also emits an idempotent environment setup script so the compiled artifact carries enough operational context to bootstrap itself on a clean machine.
Pass 3 — Concurrency Extraction: decomposes the workflow into a DAG, identifies independent stages, and records parallelism opportunities as DLP, ILP, and TLP hints. In practice this means the compiler is trying to separate steps that can be run independently from steps that are only ordered because the original prose was linear.

Why Pass Order Matters

Pass 1 changes what the skill asks the model to do, Pass 2 changes what the environment must provide, and Pass 3 changes how the remaining work can be scheduled. Reordering them would make later analysis operate on stale assumptions.

CLI: jit-optimize

Proposal-based skill optimization supports three explicit evidence sources: synthetic, real, and log. Regardless of source, the optimizer writes a proposal tree, records per-round evidence and root-cause analysis, and picks a best round for later review or deployment.

usage

# Synthetic autotune
skvm jit-optimize --skill=path/to/skill-dir --task-source=synthetic \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 \
  --target-model=openrouter/qwen/qwen3.5-35b-a3b --rounds=3

# Real bench tasks
skvm jit-optimize --skill=path/to/skill-dir --task-source=real \
  --tasks=task-a,task-b --test-tasks=task-c \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 --target-model=openrouter/qwen/qwen3.5-35b-a3b

# Existing execution logs
skvm jit-optimize --skill=path/to/skill-dir --task-source=log \
  --logs=path/to/log1.jsonl,path/to/log2.jsonl \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 --target-model=openrouter/qwen/qwen3.5-35b-a3b

Detailed parameter guide. jit-optimize always builds a proposal keyed by (harness, target-model, skill-name). What changes across task sources is how evidence is collected and whether tasks are re-executed or only analyzed retrospectively.

Shared Required Flags

Flag	Description	Default
`--skill`	Path to the skill directory being optimized. In batch mode this is replaced by `--skill-list`.	required
`--task-source`	Explicitly chooses the evidence source: `synthetic`, `real`, or `log`. The CLI does not infer this from the other flags.	required
`--optimizer-model`	The model that edits the skill based on accumulated evidence. This is separate from the target model being optimized for.	required
`--target-model`	Required for every source. For `synthetic` and `real` it is the model that reruns tasks; for `log` it is still required because it determines proposal storage location.	required
`--target-adapter`	Harness paired with the target model. In log mode this is informational, but in rerun modes it determines which adapter actually executes the evaluation loop.	`bare-agent`

Task-Source-Specific Flags

Source	Flags	Meaning
`synthetic`	`--synthetic-count`, `--synthetic-test-count`	Controls how many train and held-out test tasks the optimizer should synthesize directly from the skill description before the loop begins.
`real`	`--tasks`, `--test-tasks`	Uses explicit benchmark tasks as evidence. If `--test-tasks` is omitted, the training set is reused as the evaluation set, which weakens holdout protection.
`log`	`--logs`, `--failures`	Consumes existing conversation logs and optional structured failure JSON files. No tasks are rerun in this mode.

Loop, Delivery, and Batch Flags

Flag	Description	Default
`--rounds`	Maximum number of optimization rounds after the baseline round. Round 0 is always the starting skill snapshot; later rounds iterate edit → rerun → score.	`3` for synthetic/real, `1` for log
`--runs-per-task`	Number of executions per task per round in rerun modes. The default is above 1 so that best-round selection is less sensitive to single-run noise.	`2`
`--task-concurrency`	Maximum in-flight task runs across train and test sets in a round.	`1`
`--convergence`	Early-exit threshold on the primary score. When a round meets or exceeds this score, the loop can stop before consuming all remaining rounds.	`0.95`
`--baseline`	Also evaluates no-skill and original-skill baselines for comparison. This is forbidden in log mode because log mode does not rerun tasks.	`false`
`--no-keep-all-rounds`	Prunes proposal storage so only the chosen best round is retained instead of keeping every intermediate round directory.	`false`
`--auto-apply`	Deploys the best round back onto the original skill directory immediately after selection.	`false`
`--skill-list`	Runs batch optimization over one skill path per line.	off
`--concurrency`	Batch-job parallelism when multiple skills are optimized in the same invocation.	`1`

Source Compatibility Rules

SkVM validates task-source-specific flags strictly. For example, --tasks is only valid for real, --logs is only valid for log, and loop-control flags like --baseline are rejected for log because there is no rerun phase.
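
The rules above amount to a small ownership check: each source owns a set of flags, and rerun-only flags are rejected for log. The following is a rough sketch of that idea using only the flags listed in the tables; the function name, data layout, and error wording are assumptions, not SkVM's actual validator.

```python
# Flags owned by each task source, taken from the table above.
SOURCE_FLAGS = {
    "synthetic": {"--synthetic-count", "--synthetic-test-count"},
    "real": {"--tasks", "--test-tasks"},
    "log": {"--logs", "--failures"},
}
# Flags that only make sense when tasks are rerun.
RERUN_ONLY = {"--baseline"}


def validate_flags(task_source: str, flags: set[str]) -> list[str]:
    """Return human-readable errors for flags that do not fit the source (illustrative)."""
    if task_source not in SOURCE_FLAGS:
        raise ValueError(f"unknown task source: {task_source}")
    errors = []
    for source, owned in SOURCE_FLAGS.items():
        if source != task_source:
            for flag in sorted(owned & flags):
                errors.append(f"{flag} is only valid for --task-source={source}")
    if task_source == "log":
        for flag in sorted(RERUN_ONLY & flags):
            errors.append(f"{flag} is rejected for --task-source=log (no rerun phase)")
    return errors


print(validate_flags("synthetic", {"--tasks"}))
print(validate_flags("log", {"--logs", "--baseline"}))
```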

CLI: pipeline

Profile the target if no cached TCP exists, then run aot-compile with that TCP. This is the shortest path when you want a compiled variant but do not want to manually split the work into profile and compile steps.

usage

skvm pipeline --skill=<path> --model=<provider>/<model-id> [options]

Flag	Description	Default
`--skill`	Required. Skill directory or `SKILL.md` path that should be compiled.	required
`--model`	Required. Target model ID in the form `<provider>/<model-id>`, used both for cache lookup and eventual compilation.	required
`--adapter`	Harness whose TCP should be used. The same adapter is used for auto-profiling when no cached profile exists.	`bare-agent`
`--force-profile`	Forces a fresh profiling run instead of reusing a cached TCP. Use it when you suspect the cache is stale but still want the convenience of the one-command pipeline.	`false`
`--profile`	Supplies a specific TCP file and skips auto-profiling. This is the escape hatch for deterministic reproduction or external TCP inspection.	auto-load or auto-profile
`--pass`	Selects which AOT passes to run after the TCP is available.	`1,2,3`
`--compiler-model`	Overrides the compiler LLM used during the compilation stage.	`openrouter/anthropic/claude-sonnet-4.6`
`--dry-run`	Prints the resulting compile summary without writing the output variant.	`false`

Operationally, pipeline has three branches: load an explicit profile, reuse a cached profile, or run profile inline. Only after the TCP is resolved does it move on to the compile stage.
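
The three branches can be sketched as a single resolution function. This is an illustration under stated assumptions: the in-memory `cache` dict, the `profile_fn` callback, and the precedence of an explicit profile over `--force-profile` are hypothetical choices, not confirmed SkVM behavior.

```python
def resolve_tcp(key, explicit_tcp, cache, profile_fn, force_profile=False):
    """Pick a TCP for (model, adapter): explicit file, cached entry, or fresh profile."""
    if explicit_tcp is not None:
        # Branch 1: --profile was supplied, so auto-profiling is skipped.
        return "explicit", explicit_tcp
    if not force_profile and key in cache:
        # Branch 2: reuse the cached TCP for this (model, adapter) pair.
        return "cached", cache[key]
    # Branch 3: profile inline, then remember the result for later runs.
    tcp = profile_fn(key)
    cache[key] = tcp
    return "profiled", tcp


key = ("openrouter/qwen/qwen3.5-35b-a3b", "bare-agent")
cache = {}
fake_profile = lambda k: {"tool.file.read": {"level": 3}}  # stand-in for a real profiling run

print(resolve_tcp(key, None, cache, fake_profile)[0])   # first call profiles
print(resolve_tcp(key, None, cache, fake_profile)[0])   # second call hits the cache
```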

CLI: proposals

Inspect, diff, group, serve, accept, or reject the proposal trees created by jit-optimize. This command is the review surface between optimization and deployment.

usage

skvm proposals list | show | diff | report | serve | accept | reject [options]

Subcommand / Flag	Description
`list`	Lists proposals and supports filtering by `--harness`, `--target-model`, `--skill`, and `--status`, plus sorting and grouping for review workflows.
`show <id>`	Prints proposal metadata, per-round summary, and optionally full analysis content with `--full`.
`diff <id> [--round=N]`	Prints a unified diff between the original skill and a chosen round. If `--round` is omitted, the best round is used.
`report`	Generates an HTML report for the filtered proposal set. `--out` overrides the output path.
`serve`	Starts the local proposal review server. `--port`, `--host`, and `--no-open` control how the server is exposed.
`accept <id>`	Deploys the best round or the round chosen by `--round`. `--target` overrides the deployment directory.
`reject <id>`	Marks the proposal as rejected without deploying anything.
`--sort`, `--min-delta`, `--group-by`, `--no-color`	Formatting and review controls for list/report-style outputs.

Use proposals as the human review gate. jit-optimize is allowed to propose edits, but accept is the point where those edits become the live skill bundle.

CLI: run

Execute one task against one model+adapter, with or without a skill. This is execute-only and primarily for testing; use bench when you need scored evaluation.

usage

skvm run --task=<path> --model=<provider>/<model-id> --adapter=<name> [--skill=<path>]

Flag	Description	Default
`--task`	Required. Path to a task JSON file using the bench task schema.	required
`--model`	Required. Provider-prefixed model identifier in the form `<provider>/<model-id>`, passed through to the chosen adapter.	required
`--skill`	Optional skill to inject for the run. Omit it to execute the task without any skill assistance.	none
`--adapter`	Harness used to execute the task.	`bare-agent`
`--workdir`	Reuses a specific working directory instead of creating a temporary one. Useful for reproducing runs and inspecting artifacts across iterations.	temporary directory
`--timeoutMs`	Overrides the task timeout defined in the task file.	task-defined timeout
`--maxSteps`	Overrides the task or adapter step budget for the run.	task-defined max steps
`--verbose`	Enables more detailed execution logging.	`false`

SkVM copies any files under the task's fixtures/ directory into the work directory before execution, then reports workdir, timing, token usage, and non-OK run status at the end.

CLI: bench

Run benchmark conditions over tasks, skills, and models. bench is the widest CLI surface in SkVM because it covers standard benchmarking, deferred judging, session resumption, task import, and condition-to-condition comparison.

usage

skvm bench --model=<provider>/<model-id> [options]

Detailed parameter guide. The standard execution path builds a benchmark plan from model × adapter × task × condition, while submodes like judge, --compare, --import, and --custom bypass parts of that plan builder.

Flag	Description	Default
`--model`	Required for normal benchmarking unless you are resuming a session that already records the model. Use provider-prefixed IDs in the form `<provider>/<model-id>`; comma-separated values enable multi-model mode.	required
`--adapter`	Selects one or more harnesses. Multi-adapter mode is supported, but cannot be combined with multi-model mode in the same invocation.	`bare-agent`
`--tasks`	Restricts benchmarking to a comma-separated task subset instead of the full task pool.	all tasks
`--source`	Filters tasks by origin source, such as `pinchbench`, `skillsbench`, or other importer-specific labels stored in the task metadata.	all sources
`--conditions`	Selects which skill conditions to evaluate, including `no-skill`, `original`, `aot-compiled`, pass-specific AOT variants like `aot-compiled-p12`, `jit-boost`, and `jit-optimized`.	all standard conditions
`--custom`	Runs a YAML-defined custom benchmark plan with explicit nested task-skill-model-adapter mappings. This bypasses the standard condition system entirely.	off
`--skill-mode`	Controls whether skills are directly injected or discovered by the harness. This matters when the harness has its own skill loading semantics.	`inject`
`--jit-runs`	Warm-up repetitions used by the `jit-boost` condition before measuring the solidified path. Higher values give the boost mechanism more chances to promote repeated patterns.	`3`
`--timeout-mult`	Multiplies per-task timeout budgets. Use it when a model or adapter is known to be slower than the default task envelopes assume.	`1.0`
`--max-steps`	Overrides the maximum number of agent steps each run may take before the harness is stopped.	`30`
`--judge-model`	Selects the LLM used by `llm-judge` criteria. This only affects judging, not the model being benchmarked.	`openrouter/anthropic/claude-sonnet-4.6`
`--compiler-model`	Overrides the compiler model used when a requested benchmark condition needs to materialize an AOT variant during the run.	`openrouter/anthropic/claude-sonnet-4.6`
`--profile`	Supplies a TCP path for AOT conditions. Without it, AOT conditions are skipped when the needed TCP cannot be resolved from cache.	auto-load if available
`--resume`	Resumes an interrupted session by id or with `latest`. This preserves progress instead of re-running already completed work.	off
`--list-sessions`	Prints known benchmark sessions and their statuses without running anything.	off
`--concurrency`	Controls total parallel task execution. In multi-model mode the slots are distributed across models rather than spent inside a single model run.	`1`
`--runs-per-task`	Repeats each task-condition pair multiple times and averages the result, reducing noise from stochastic decoding or unstable harness behavior.	`1`
`--keep-workdirs`	Keeps task working directories after completion so failures can be inspected manually.	`false`
`--verbose`	Enables debug logging during orchestration and execution.	`false`

Async Judge and Compare Modes

Flag / Subcommand	What It Does
`--async-judge`	Defers `llm-judge` criteria into a post-run batch. This is useful when you want the expensive LLM judging stage decoupled from the main benchmark execution.
`bench judge --manifest=<dir>`	Runs the deferred judge pass later from the generated manifest directory.
`--merge-judge=<results-dir>`	Merges post-run judging results back into an existing report.
`--compare`	Switches `bench` into artifact comparison mode instead of execution mode.
`--skill-path`, `--lhs`, `--rhs`, `--output-dir`	Required inputs for compare mode: the skill to inspect, the two conditions to compare, and where to write the generated diff/report outputs.
`--analyze-model`	Optional summarization model used to generate a higher-level explanation of the skill difference during compare mode.

Import Mode

Flag	What It Does
`--import`	Runs task importers instead of benchmarking. Current sources are `pinchbench` and `skillsbench`.
`--path`	Overrides the source directory used by the importer.
`--dry-run`	Shows what would be imported without writing files.

examples

# Single model benchmark
skvm bench --model=openrouter/qwen/qwen3.5-35b-a3b --adapter=bare-agent

# Specific conditions
skvm bench --model=openrouter/qwen/qwen3.5-35b-a3b --conditions=no-skill,original,aot-compiled,jit-boost

# Defer LLM-judge and process it later
skvm bench --model=openrouter/qwen/qwen3.5-35b-a3b --async-judge
skvm bench judge --manifest=path/to/manifest-dir --judge-model=openrouter/anthropic/claude-sonnet-4.6

CLI: clean-jit

Clear persisted JIT artifacts for a model+adapter pair when you want to reset solidification or proposal state.

usage

skvm clean-jit --model=<provider>/<model-id> --adapter=<name>

Flag	Description	Default
`--model`	Required. Provider-prefixed model whose runtime JIT state should be cleared.	required
`--adapter`	Required. Adapter whose runtime artifacts should be cleaned alongside the model key.	required
`--dry-run`	Prints the deletion plan, including runtime directories and matching `solidification-state.json` files, without removing anything.	`false`
`--yes`	Confirms destructive cleanup. Required unless `--dry-run` is active.	`false`
`--include-bench-logs`	Also deletes matching benchmark session directories in addition to runtime JIT state.	`false`

The command intentionally keeps compiled SKILL.md artifacts, candidate metadata, and cached profiles intact. It is for resetting JIT effects, not wiping every derived artifact.

CLI: logs

List recent runs across profiling, compilation, bench, and runtime subsystems from the shared cache tree.

usage

skvm logs

Flag	Description	Default
`--type`	Filters sessions by subsystem, such as `profile`, `aot-compile`, `bench`, `run`, or `pipeline`.	all session types
`--limit`	Limits how many recent entries are shown.	`20`
`--all`	Disables the limit and prints the full session index.	`false`

Each entry includes status, type, model or model count, harness, skill, summary text, and the log directory path, so logs acts as a lightweight session index over the shared cache.

Compilation Feature

Primitive Catalog

SkVM defines 26 primitive capabilities that describe what an LLM agent can do. Each primitive is testable at three difficulty levels (L1–L3). The catalog is organized into four domains:

Domain	Prefix	Examples
Code Generation	`gen.code.*`	write, edit, debug, test, refactor
Tool Use	`tool.*`	file.read, file.write, exec, web_fetch
Reasoning	`reason.*`	plan, decompose, diagnose, analyze
Instruction Following	`follow.*`	format, constraint, multi-step, edge-case

Each primitive has a dedicated microbenchmark generator that produces randomized test instances. Generators use two evaluation patterns:

Tool-use primitives — agent runs tools, evaluator checks files in the working directory
Text-only primitives — profiler writes LLM response to file, evaluator reads it

TCP — Target Capability Profile

A TCP is the output of profiling: a JSON file that maps each of the 26 primitives to a proficiency level.

Level	Meaning
`L3`	Full proficiency — handles complex instances
`L2`	Moderate proficiency — handles standard instances
`L1`	Basic proficiency — handles simple instances
`L0`	No proficiency — fails even simple instances

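The profiler uses progressive testing: it tests L3 first and steps down to the next level only on failure. Once the model passes a level, the lower levels are skipped and assumed passed. This keeps API cost down, relying on the assumption that a model that passes a harder level would also pass the easier ones. In the excerpt below, tool.file.read records only an L3 score because it passed at the first level tested.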

.skvm/profiles/bare-agent/qwen-qwen3-30b-a3b-instruct-2507.json (excerpt)

{
  "gen.code.write": { "level": 2, "scores": { "L3": 0.33, "L2": 0.83 } },
  "tool.file.read": { "level": 3, "scores": { "L3": 1.0 } },
  "reason.plan":    { "level": 1, "scores": { "L3": 0.0, "L2": 0.17, "L1": 0.67 } }
}
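
The progressive order can be sketched in a few lines. The pass threshold of 0.5 is an assumption chosen to be consistent with the excerpt above (0.33 fails, 0.67 passes); the real threshold is not stated here, and `profile_primitive` is a hypothetical name.

```python
PASS_THRESHOLD = 0.5  # assumption: not specified in this document


def profile_primitive(score_at):
    """Test L3, then L2, then L1; stop at the first level that passes.

    `score_at(level)` stands in for running the microbenchmark at that level.
    """
    scores = {}
    for level in (3, 2, 1):
        score = score_at(level)
        scores[f"L{level}"] = score
        if score >= PASS_THRESHOLD:
            return {"level": level, "scores": scores}
    return {"level": 0, "scores": scores}  # failed even the simplest instances


# Replay the per-level scores from the excerpt above.
print(profile_primitive({3: 0.33, 2: 0.83, 1: 1.0}.get))
print(profile_primitive({3: 0.0, 2: 0.17, 1: 0.67}.get))
```

Lower levels that are never reached produce no score entry, which is why a primitive that passes at L3 carries a single score.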

SCR — Skill Capability Requirement

An SCR describes what primitives a skill needs and at what proficiency level. It is extracted automatically by the compiler (Pass 1) from the skill's SKILL.md file.

The SCR may include alternative implementation paths — different ways to accomplish the same goal using different primitives. This allows the compiler to find substitutions when a model lacks a required capability.

Key Insight

The gap between a model's TCP and a skill's SCR determines what compilation transforms are needed. No gap means no transformation — the skill runs as-is.

3-Pass Compilation

The AOT compiler transforms skills in three sequential passes. Each pass emits artifacts that narrow the next stage's search space, so the compiler is not just editing prose: it is moving from capability analysis to environment binding to executable scheduling hints.

Pass 1: Capability Gap Analysis

Pass 1 reads the skill as a requirement document. It extracts the SCR, maps each purpose to the primitives and minimum levels it needs, compares those against the target TCP, and decides whether the skill can run unchanged, needs compensation, or needs structural substitution.

L0 gaps — capability absent → substitution (replace with alternative primitives)
Weak gaps — capability present but below required level → compensation (add scaffolding, examples, decomposition)

Because the SCR can contain alternative implementation paths, Pass 1 is not limited to saying “the model is weaker.” It can often redirect the skill onto a different primitive path that the target model is better at executing.
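
The deterministic part of Pass 1 can be illustrated as a level comparison per primitive. This is a simplified sketch: it ignores alternative implementation paths, treats a primitive missing from the TCP as L0 (an assumption), and uses a hypothetical SCR whose required levels are made up for the example.

```python
def classify_gaps(scr: dict[str, int], tcp: dict[str, int]) -> dict[str, str]:
    """Map each required primitive to 'none', 'compensation', or 'substitution'."""
    plan = {}
    for primitive, required in scr.items():
        have = tcp.get(primitive, 0)  # assumption: unprofiled primitives count as L0
        if have >= required:
            plan[primitive] = "none"          # no gap: skill runs as-is
        elif have == 0:
            plan[primitive] = "substitution"  # capability absent
        else:
            plan[primitive] = "compensation"  # present but below the required level
    return plan


tcp_levels = {"gen.code.write": 2, "tool.file.read": 3, "reason.plan": 1}
hypothetical_scr = {"gen.code.write": 3, "tool.file.read": 2, "tool.web_fetch": 1}
print(classify_gaps(hypothetical_scr, tcp_levels))
```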

Pass 2: Environment Binding

Pass 2 binds the rewritten skill to the actual machine it will run on. It extracts dependency manifests, checks whether binaries, Python packages, APIs, or shell tools already exist, and emits an idempotent setup script that can recreate the required environment.

This means AOT output is not only “better instructions for the model”; it also becomes a more operational skill bundle with explicit setup knowledge instead of hidden environmental assumptions.
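
The checking half of this pass can be approximated with standard-library probes. The sketch below only reports what is missing and does not generate a setup script; `missing_dependencies` is a hypothetical helper, and it covers only binaries on PATH and top-level Python packages.

```python
import importlib.util
import shutil


def missing_dependencies(binaries: list[str], python_packages: list[str]) -> dict[str, list[str]]:
    """Report which binaries and top-level Python packages are not available."""
    return {
        "binaries": [b for b in binaries if shutil.which(b) is None],
        "python_packages": [
            p for p in python_packages if importlib.util.find_spec(p) is None
        ],
    }


report = missing_dependencies(
    binaries=["skvm-example-missing-binary"],
    python_packages=["json", "skvm_example_missing_pkg"],
)
print(report)
```

A setup step that acts only on the reported gaps is naturally idempotent: once everything is installed, the report is empty and a rerun does nothing.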

Pass 3: Concurrency Extraction

Pass 3 analyzes the workflow structure itself. It decomposes the skill into a DAG, identifies true data and control dependencies, and looks for places where the original sequential prose can be turned into parallel work without changing semantics.

DLP — Data-Level Parallelism (process independent data chunks)
ILP — Instruction-Level Parallelism (pipeline independent steps)
TLP — Task-Level Parallelism (run independent subtasks concurrently)

The resulting variant can therefore preserve the same end behavior while exposing runtime scheduling opportunities that a plain natural-language skill would usually leave implicit.
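
One way to picture the DAG analysis is to group steps into stages whose members have no unmet dependencies and could therefore run concurrently. The sketch uses Python's `graphlib.TopologicalSorter`; the step names are hypothetical, and this is not a description of how SkVM represents its DAG or emits DLP, ILP, and TLP hints.

```python
from graphlib import TopologicalSorter


def parallel_stages(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into stages; steps within one stage are mutually independent.

    `deps` maps each step to the set of steps it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the workflow is not a DAG
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


workflow = {
    "fetch_a": set(),
    "fetch_b": set(),
    "merge": {"fetch_a", "fetch_b"},
    "report": {"merge"},
}
print(parallel_stages(workflow))
```

Here the two fetch steps were sequential only because prose is linear; the dependency graph shows they can share a stage.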

JIT Optimization

SkVM provides two independent JIT systems that operate after the original skill already exists. They solve different problems: JIT-boost reduces repeated runtime cost by bypassing predictable LLM calls, while JIT-optimize edits the skill itself based on evidence collected from runs.

JIT-Boost (Code Solidification)

JIT-boost is a runtime specialization layer. It first uses a headless agent to scan the whole skill bundle and emit boost-candidates.json entries containing code signatures, keywords, parameter templates, and execution templates. During execution, runtime hooks watch LLM calls for repeated matches against those signatures.

After enough consecutive matches, a candidate is promoted. From that point on, SkVM extracts parameters directly from the prompt, executes the stored template, and bypasses the LLM entirely for that repeated pattern.

Zero LLM calls at runtime for promoted patterns
Automatic demotion on failure (falls back to LLM)
Model and harness agnostic — stored per skill

So JIT-boost is closer to code caching or solidification than to skill rewriting: it accelerates a stable repeated behavior path without changing the skill source itself.
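
The promote and demote behavior can be sketched as a small state machine. The class name, method names, and the streak threshold are assumptions made for illustration; the document does not state how many consecutive matches are required.

```python
class BoostCandidate:
    """Tracks one candidate pattern: promote after a match streak, demote on failure."""

    def __init__(self, promote_after: int = 3):  # threshold is an assumption
        self.promote_after = promote_after
        self.streak = 0
        self.promoted = False

    def observe_llm_call(self, matched: bool) -> None:
        # A non-matching call breaks the streak of consecutive matches.
        self.streak = self.streak + 1 if matched else 0
        if self.streak >= self.promote_after:
            self.promoted = True

    def report_template_failure(self) -> None:
        # Demotion: fall back to the LLM and start counting again.
        self.promoted = False
        self.streak = 0


candidate = BoostCandidate(promote_after=2)
for matched in (True, False, True, True):
    candidate.observe_llm_call(matched)
print(candidate.promoted)
```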

JIT-Optimize (Skill Rewriting)

JIT-optimize is a proposal-driven editing loop. It normalizes evidence into a shared schema, copies the skill into a temporary workspace, writes the evidence and history into .optimize/, and launches a headless agent that can edit SKILL.md or any bundle files.

The optimizer must submit a structured record containing a required rootCause, reasoning, confidence, and changed files. SkVM then snapshots the workspace as a numbered round, computes the actual diff, reruns evaluation when appropriate, and chooses the best round according to score and monotonicity checks.

Because the result is stored as a proposal tree rather than applied immediately, JIT-optimize is a controlled editing workflow, not an in-place mutation mechanism.

Autotune

Autotune is the fully closed loop built on top of jit-optimize --task-source=synthetic: generate evaluation tasks, execute them, score the result, propose edits, and repeat until rounds are exhausted or convergence is reached. It is the nearest thing SkVM has to self-supervised online skill improvement.
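
The shape of that loop can be sketched as follows, using the documented defaults of 3 rounds and a 0.95 convergence threshold. Everything else is an assumption: `evaluate` and `propose_edit` are hypothetical callbacks, each edit builds on the previous round, and the best round is picked by score alone, without the monotonicity checks SkVM applies.

```python
def autotune(skill, evaluate, propose_edit, rounds=3, convergence=0.95):
    """Iterate edit -> rerun -> score; return the best (round, skill, score)."""
    history = [(0, skill, evaluate(skill))]  # round 0 is the starting snapshot
    for n in range(1, rounds + 1):
        if history[-1][2] >= convergence:
            break  # early exit once the score meets the threshold
        skill = propose_edit(skill, history)
        history.append((n, skill, evaluate(skill)))
    return max(history, key=lambda record: record[2])


# Toy stand-ins: each "skill" is a version label with a fixed score.
scores = {"v0": 0.50, "v1": 0.70, "v2": 0.96, "v3": 0.10}
best = autotune("v0", scores.get, lambda skill, history: f"v{len(history)}")
print(best)
```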

Architecture

System Overview

SkVM is structured as a modular pipeline:

data flow

Profile Tool ──TCP──> AOT Compiler ──Variant──> Runtime + Agent
     │                     │                         │
26 primitives         3 passes                  JIT-boost + JIT-optimize
L3→L1 progressive     1: capability gaps        - code solidification
                      2: env binding            - skill optimization
                      3: concurrency DAG        - autotune loop

Key design principles:

Decoupled stages — profiler, compiler, and runtime can run independently
Pluggable adapters — any agent harness can be integrated via the AgentAdapter interface
Pluggable providers — any LLM backend via the LLMProvider interface
Cached artifacts — profiles, logs, and proposal-tree outputs are persisted and reused

Agent Adapters

Adapters wrap agent harnesses and provide a uniform interface. All adapters support RuntimeHooks for JIT monitoring.

Adapter	Harness	Description
`bare-agent`	Built-in	Minimal agent loop with 5 tools. Primary adapter for profiling and testing.
`opencode`	OpenCode CLI	Wraps OpenCode, parses NDJSON event stream.
`openclaw`	OpenClaw CLI	Wraps OpenClaw, manages temporary agent instances.
`hermes`	Hermes CLI	Wraps Hermes and preserves full token and cost usage metadata.
`jiuwenclaw`	Jiuwenclaw CLI	Wraps `jiuwenclaw-cli` over JSON-RPC; token and cost are not persisted upstream.

LLM Providers

Two LLM provider implementations are included:

Anthropic — native Anthropic SDK, supports tool_use
OpenRouter — OpenAI-compatible API, routes to 100+ models

The extractStructured() function provides two-layer structured output: tool_use when available, prompt + parse fallback otherwise.
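The two-layer idea can be sketched like this. It is not the real `extractStructured()` signature: the `StructuredBackend` interface, the function name `extractStructuredSketch`, and the synchronous calls are assumptions chosen to keep the example short (a real provider call would be asynchronous).

```typescript
// Hypothetical backend abstraction; SkVM's LLMProvider interface may look different.
interface StructuredBackend {
  supportsToolUse: boolean;
  callWithTool(prompt: string): unknown; // returns the tool input as an object
  complete(prompt: string): string;      // returns free-form text
}

// Fallback parser: take the outermost {...} span in the text and parse it.
function parseFirstJsonObject(text: string): unknown {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in model response");
  }
  return JSON.parse(text.slice(start, end + 1));
}

function extractStructuredSketch<T>(
  backend: StructuredBackend,
  prompt: string,
  isValid: (value: unknown) => value is T,
): T {
  // Layer 1: native tool_use when the backend supports it.
  // Layer 2: ask for JSON in the prompt and parse the reply.
  const value = backend.supportsToolUse
    ? backend.callWithTool(prompt)
    : parseFirstJsonObject(
        backend.complete(`${prompt}\nRespond with a single JSON object only.`),
      );
  if (!isValid(value)) throw new Error("structured output failed validation");
  return value;
}
```

Validating the result in both layers means callers get the same guarantees regardless of which path produced the object.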

Evaluation Framework

SkVM supports four evaluation methods, shared by the profiler and bench subsystems:

Method	Mechanism
`script`	Shell script exit code (0 = pass)
`file-check`	Check file contents: exact match, contains, regex, JSON schema
`llm-judge`	LLM evaluator with rubric, scores 0–1
`custom`	Registered evaluator functions
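As an illustration of how the `script` and `file-check` methods reduce to a pass/fail decision, here is a minimal sketch. The `FileCheck` type and both function names are hypothetical, the check operates on already-loaded file content, and the JSON schema mode is left out to keep the example dependency-free.

```typescript
// Hypothetical shape for three of the file-check modes listed in the table.
type FileCheck =
  | { mode: "exact"; expected: string }
  | { mode: "contains"; expected: string }
  | { mode: "regex"; pattern: string };

// Evaluate one check against file content that the caller has already read.
function fileCheck(content: string, check: FileCheck): boolean {
  switch (check.mode) {
    case "exact":
      return content === check.expected;
    case "contains":
      return content.includes(check.expected);
    case "regex":
      return new RegExp(check.pattern).test(content);
  }
}

// The script method passes only when the shell script exits with code 0.
function scriptPassed(exitCode: number): boolean {
  return exitCode === 0;
}
```

Modeling the modes as a discriminated union lets the compiler verify that every mode is handled in the `switch`.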

Configuration

Environment Variables

Export these in your shell, or set them in a .env file at the project root. Bun loads .env automatically.

Variable	Purpose
`OPENROUTER_API_KEY`	API key for OpenRouter (agent/profiler)
`ANTHROPIC_API_KEY`	API key for Anthropic (compiler backend)

Provider Routing

Start with skvm config init. The interactive wizard writes $SKVM_CACHE/skvm.config.json (default ~/.skvm/skvm.config.json) and lets you configure providers, API keys, and adapter checkouts without editing JSON by hand.

terminal

skvm config init
skvm config show
skvm config doctor

Every CLI model field uses <provider>/<model-id>. These docs default to the openrouter/ route, so a Qwen target is written as openrouter/qwen/qwen3.5-35b-a3b and Claude through OpenRouter as openrouter/anthropic/claude-sonnet-4.6. If you configure Anthropic directly as the provider, the same Claude model becomes anthropic/claude-sonnet-4.6. In other words, anthropic is the provider in the native route but part of the model id when the provider is openrouter.

skvm.config.json

{
  "providers": {
    "routes": [
      { "match": "anthropic/*",  "kind": "anthropic",         "apiKeyEnv": "ANTHROPIC_API_KEY" },
      { "match": "openai/*",     "kind": "openai-compatible", "apiKeyEnv": "OPENAI_API_KEY",     "baseUrl": "https://api.openai.com/v1" },
      { "match": "openrouter/*", "kind": "openrouter",        "apiKeyEnv": "OPENROUTER_API_KEY" }
    ]
  }
}

Routing Rules

providers.routes is matched top to bottom, and the first matching glob wins. SkVM strips the first path segment (the provider prefix) before sending the model id to the backend SDK, so openrouter/qwen/qwen3.5-35b-a3b reaches OpenRouter as qwen/qwen3.5-35b-a3b. For openai-compatible routes, baseUrl is required. Unprefixed model ids do not auto-route and will fail to match.
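These rules can be expressed in a few lines. The sketch below is an approximation, not SkVM's resolver: `resolveModel` and `globToRegExp` are hypothetical names, and the glob handling assumes that `*` matches any characters including `/`, which is what the example routes require.

```typescript
interface Route {
  match: string;
  kind: string;
  apiKeyEnv: string;
  baseUrl?: string;
}

// Same routes as the skvm.config.json example above.
const exampleRoutes: Route[] = [
  { match: "anthropic/*", kind: "anthropic", apiKeyEnv: "ANTHROPIC_API_KEY" },
  { match: "openai/*", kind: "openai-compatible", apiKeyEnv: "OPENAI_API_KEY", baseUrl: "https://api.openai.com/v1" },
  { match: "openrouter/*", kind: "openrouter", apiKeyEnv: "OPENROUTER_API_KEY" },
];

// Assumed glob semantics: "*" matches any run of characters, including "/".
function globToRegExp(glob: string): RegExp {
  const escaped = glob
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`);
}

function resolveModel(
  modelField: string,
  routes: Route[],
): { route: Route; backendModelId: string } {
  // Top to bottom, first match wins.
  const route = routes.find((r) => globToRegExp(r.match).test(modelField));
  if (!route) {
    throw new Error(`no provider route matches "${modelField}"`);
  }
  if (route.kind === "openai-compatible" && !route.baseUrl) {
    throw new Error(`route "${route.match}" requires baseUrl`);
  }
  // Strip the first path segment (the provider prefix).
  const backendModelId = modelField.slice(modelField.indexOf("/") + 1);
  return { route, backendModelId };
}
```

Under these assumptions, `resolveModel("openrouter/anthropic/claude-sonnet-4.6", exampleRoutes)` selects the openrouter route and yields the backend id `anthropic/claude-sonnet-4.6`, while an unprefixed id such as `claude-sonnet-4.6` throws because no route matches.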

Model IDs

SkVM uses provider-prefixed model identifiers in the form <provider>/<model-id>. The default examples below use OpenRouter, and the final line shows the native Anthropic form for comparison:

examples

openrouter/qwen/qwen3.5-35b-a3b
openrouter/anthropic/claude-sonnet-4.6
openrouter/google/gemini-2.5-flash
openrouter/qwen/qwen3-30b-a3b-instruct-2507
openrouter/deepseek/deepseek-chat-v3-0324
anthropic/claude-sonnet-4.6

Data Directory

SkVM currently uses two roots: skvm-data/ for the input dataset and .skvm/ for the local runtime cache and generated artifacts.

layout

skvm-data/
├── skills/         # input skill definitions
└── tasks/          # benchmark task dataset

.skvm/
├── profiles/       # cached TCP JSON files
├── log/            # profile / compile / bench / runtime logs
└── proposals/      # aot-compile, jit-boost, jit-optimize outputs

Important

The legacy flat data/ submodule has been retired. To update the dataset, commit and push inside skvm-data/, then run git add skvm-data && git commit in the main repo to update the submodule pointer.