Documentation

SkVM is a compilation and runtime system that makes LLM agent skills portable across heterogeneous models and harnesses. It implements the SkVM paper: profile model capabilities, compile skills to match, and optimize execution at runtime.

Tip

New to SkVM? Start with the Quick Start guide to profile a model, compile a skill, and measure the results in under 5 minutes.

Getting Started

Installation

Most users should install the standalone skvm CLI and use it directly. Source checkout is still supported for contributors, but the installed binary is the primary workflow described in the current README.

terminal
# One-line installer (macOS / Linux)
curl -fsSL https://skillvm.ai/install.sh | sh

# Or install from npm (Node 18+)
npm i -g @skillvm/skvm

# Set your API key and verify the install
export OPENROUTER_API_KEY=sk-or-...
skvm --help

Note

The installer places the standalone binary under ~/.local/share/skvm/bin/skvm, symlinks it into ~/.local/bin/skvm, and bundles a private opencode copy used by skvm jit-optimize. If you are developing from source, clone with --recurse-submodules so skvm-data/ is available.

Bundled Skills

Agent-facing helper skills ship inside the install. Copy skvm-jit and skvm-general from ~/.local/share/skvm/skills/ into your agent harness skill directory when you want the harness to drive profiling, AOT compilation, proposal review, or post-task log submission.

Quick Start

The current CLI quick start has four common workflows: profile the target, AOT-compile the skill, autotune with synthetic tasks, or optimize from an existing conversation log.

For every SkVM command below, model fields use the fully qualified form <provider>/<model-id>. For example, OpenRouter targets are written like openrouter/qwen/qwen3.5-35b-a3b.

1. Profile the target model

Generate a Target Capability Profile (TCP) for the target model and harness pair by running the 26-primitive profiling suite:

terminal
skvm profile \
  --model=openrouter/qwen/qwen3.5-35b-a3b \
  --adapter=bare-agent

The resulting profile is cached under .skvm/profiles/ and reused by later compilation runs for the same model and adapter combination.

2. AOT-compile the skill

Run the AOT compiler on the skill directory for the profiled target. The example below uses Pass 1 only and explicitly selects the compiler model:

terminal
skvm aot-compile \
  --skill=path/to/skill-dir \
  --model=openrouter/qwen/qwen3.5-35b-a3b \
  --adapter=bare-agent \
  --pass=1 \
  --compiler-model=openrouter/anthropic/claude-sonnet-4.6

AOT outputs are written into the proposals tree under ~/.skvm/proposals/aot-compile/, where you can inspect the compiled variant before adopting it.

3. Autotune with synthetic tasks

Let SkVM generate synthetic tasks from the skill itself, then iterate optimize → rerun → score against the specified target model and adapter:

terminal
skvm jit-optimize --skill=path/to/skill-dir --task-source=synthetic \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 \
  --target-model=openrouter/qwen/qwen3.5-35b-a3b \
  --target-adapter=bare-agent

4. Or optimize from an existing conversation log

For post-mortems and feedback loops, feed prior run logs directly into the optimizer without rerunning tasks:

terminal
skvm jit-optimize --skill=path/to/skill-dir --task-source=log \
  --logs=path/to/session.jsonl \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 \
  --target-model=openrouter/qwen/qwen3.5-35b-a3b

CLI Reference

CLI: profile

Profile a model's capabilities against the 26-primitive catalog. The command writes a cached TCP per (model, adapter), so later aot-compile and pipeline runs can reuse it instead of re-running microbenchmarks.

usage
skvm profile --model=<provider>/<model-id> [options]

Detailed parameter guide. profile is the entry point that decides which model-adapter pairs need work, whether cached TCPs may be reused, and how profiling slots are distributed across the resulting job matrix.

| Flag | Description | Default |
| --- | --- | --- |
| --model | Required unless --batch is used. Accepts one or more provider-prefixed model IDs in the form <provider>/<model-id>. Each model is combined with each selected adapter to form a separate profiling job. | required |
| --adapter | Selects which harness implementation to profile: bare-agent, opencode, openclaw, hermes, or jiuwenclaw. Comma-separated values profile multiple harnesses against the same model set. | bare-agent |
| --primitives | Restricts profiling to a comma-separated subset of primitive IDs. Use this when iterating on a small slice of the primitive catalog instead of paying for the full 26-primitive sweep. | all registered primitives |
| --skip | Explicitly excludes primitive IDs from the run after primitive selection is resolved. Useful for temporarily suppressing unstable or already-known primitives. | none |
| --instances | Controls how many randomized instances are generated per difficulty level. Higher values reduce noise but increase cost and elapsed time. | 3 |
| --force | Ignores cached TCPs and forces a fresh profile. This is the flag to use when the model behavior, adapter behavior, or primitive implementation changed and the old TCP is no longer trustworthy. | false |
| --list | Skips execution entirely and prints cached profiles already available on disk. It is the cheapest way to check whether a required TCP already exists. | off |
| --batch | Builds the model set from the benchmark configuration instead of requiring --model. In batch mode, the adapter default broadens to all registered adapters rather than only bare-agent. | false |
| --concurrency | Sets the total profiling slot budget across all model-adapter combinations. The scheduler distributes slots hierarchically per adapter and then per model, so this controls overall throughput rather than per-primitive parallelism in isolation. | 1 |
| --verbose | Enables debug logging so you can inspect primitive scheduling, adapter setup, and error details while profiling is running. | false |

Examples

terminal
# Default OpenRouter route
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b

# Profile multiple models in parallel
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b,openrouter/deepseek/deepseek-chat-v3-0324 --concurrency=4

# Native Anthropic route
skvm profile --model=anthropic/claude-sonnet-4.6

# Profile multiple adapters for one model
skvm profile --model=openrouter/qwen/qwen3.5-35b-a3b --adapter=bare-agent,opencode

# List all cached profiles
skvm profile --list

CLI: aot-compile

AOT-compile one or more skills for one or more target model-adapter pairs. The compiler consumes an existing TCP, runs the selected passes, validates the result with guard checks, and writes compiled variants under ~/.skvm/proposals/aot-compile/.

usage
skvm aot-compile --skill=<path> --model=<provider>/<model-id> [options]

Detailed parameter guide. aot-compile first resolves skill paths, then loads TCPs for every requested model × adapter pair, and finally runs a shared compiler provider across the resulting job matrix.

| Flag | Description | Default |
| --- | --- | --- |
| --skill | Required. Accepts one or more skill directories or SKILL.md paths. Every resolved skill is compiled against every requested model and adapter. | required |
| --model | Required. Accepts one or more target model IDs in the form <provider>/<model-id>. Each model must already have a cached or explicitly supplied TCP for the selected adapter. | required |
| --adapter | Selects one or more harnesses whose TCPs should be used during compilation. This matters because capability profiles are stored per (model, adapter), not just per model. | bare-agent |
| --profile | Overrides cache lookup with a specific TCP JSON file. This is supported only for a single model plus single adapter job and is mainly useful for testing or reproducing a particular profile snapshot. | load from cache |
| --pass | Selects which compiler passes to run. Any subset such as 1, 1,2, or 1,3 is legal, and the chosen pass set is reflected in the output directory pass tag. | 1,2,3 |
| --concurrency | Controls how many compile jobs run in parallel across the skill-model-adapter job matrix. It affects total throughput, not the internal parallelism extracted by Pass 3. | 1 |
| --dry-run | Runs the compiler pipeline and prints the result summary without writing a compiled variant to disk. Use this to inspect gaps, transforms, and guard status before publishing an artifact. | false |
| --compiler-model | Overrides the LLM backend used for the LLM-backed parts of compilation, such as SCR extraction, agentic rewriting, dependency extraction, and workflow decomposition. | openrouter/anthropic/claude-sonnet-4.6 |

Compilation Passes

The three-pass compiler is sequential on purpose: each pass narrows uncertainty and emits artifacts that the next pass can trust. Guard validation runs after compilation so unsafe or internally inconsistent rewrites are caught before the variant is accepted.

  1. Pass 1 — Capability Gap Analysis: extracts the SCR from SKILL.md, compares the required primitive levels against the TCP, and splits deficits into hard absence vs weak proficiency. Hard gaps trigger substitution onto alternative primitive paths; weak gaps trigger compensation such as extra scaffolding, examples, decomposition, or stronger execution constraints. SCR extraction and rewriting are LLM-backed, while the actual gap analysis is deterministic computation.
  2. Pass 2 — Environment Binding: inspects the skill bundle for tools, binaries, packages, and API dependencies, then checks whether the current environment already satisfies them. The output is not just a list of dependencies: SkVM also emits an idempotent environment setup script so the compiled artifact carries enough operational context to bootstrap itself on a clean machine.
  3. Pass 3 — Concurrency Extraction: decomposes the workflow into a DAG, identifies independent stages, and records parallelism opportunities as DLP, ILP, and TLP hints. In practice this means the compiler is trying to separate steps that can be run independently from steps that are only ordered because the original prose was linear.

Why Pass Order Matters

Pass 1 changes what the skill asks the model to do, Pass 2 changes what the environment must provide, and Pass 3 changes how the remaining work can be scheduled. Reordering them would make later analysis operate on stale assumptions.

CLI: jit-optimize

Proposal-based skill optimization supports three explicit evidence sources: synthetic, real, and log. Regardless of source, the optimizer writes a proposal tree, records per-round evidence and root-cause analysis, and picks a best round for later review or deployment.

usage
# Synthetic autotune
skvm jit-optimize --skill=path/to/skill-dir --task-source=synthetic \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 \
  --target-model=openrouter/qwen/qwen3.5-35b-a3b --rounds=3

# Real bench tasks
skvm jit-optimize --skill=path/to/skill-dir --task-source=real \
  --tasks=task-a,task-b --test-tasks=task-c \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 --target-model=openrouter/qwen/qwen3.5-35b-a3b

# Existing execution logs
skvm jit-optimize --skill=path/to/skill-dir --task-source=log \
  --logs=path/to/log1.jsonl,path/to/log2.jsonl \
  --optimizer-model=openrouter/anthropic/claude-sonnet-4.6 --target-model=openrouter/qwen/qwen3.5-35b-a3b

Detailed parameter guide. jit-optimize always builds a proposal keyed by (harness, target-model, skill-name). What changes across task sources is how evidence is collected and whether tasks are re-executed or only analyzed retrospectively.

Shared Required Flags

| Flag | Description | Default |
| --- | --- | --- |
| --skill | Path to the skill directory being optimized. In batch mode this is replaced by --skill-list. | required |
| --task-source | Explicitly chooses the evidence source: synthetic, real, or log. The CLI does not infer this from the other flags. | required |
| --optimizer-model | The model that edits the skill based on accumulated evidence. This is separate from the target model being optimized for. | required |
| --target-model | Required for every source. For synthetic and real it is the model that reruns tasks; for log it is still required because it determines proposal storage location. | required |
| --target-adapter | Harness paired with the target model. In log mode this is informational, but in rerun modes it determines which adapter actually executes the evaluation loop. | bare-agent |

Task-Source-Specific Flags

| Source | Flags | Meaning |
| --- | --- | --- |
| synthetic | --synthetic-count, --synthetic-test-count | Controls how many train and held-out test tasks the optimizer should synthesize directly from the skill description before the loop begins. |
| real | --tasks, --test-tasks | Uses explicit benchmark tasks as evidence. If --test-tasks is omitted, the training set is reused as the evaluation set, which weakens holdout protection. |
| log | --logs, --failures | Consumes existing conversation logs and optional structured failure JSON files. No tasks are rerun in this mode. |

Loop, Delivery, and Batch Flags

| Flag | Description | Default |
| --- | --- | --- |
| --rounds | Maximum number of optimization rounds after the baseline round. Round 0 is always the starting skill snapshot; later rounds iterate edit → rerun → score. | 3 for synthetic/real, 1 for log |
| --runs-per-task | Number of executions per task per round in rerun modes. Raised above 1 by default to make best-round selection less sensitive to single-run noise. | 2 |
| --task-concurrency | Maximum in-flight task runs across train and test sets in a round. | 1 |
| --convergence | Early-exit threshold on the primary score. When a round meets or exceeds this score, the loop can stop before consuming all remaining rounds. | 0.95 |
| --baseline | Also evaluates no-skill and original-skill baselines for comparison. This is forbidden in log mode because log mode does not rerun tasks. | false |
| --no-keep-all-rounds | Prunes proposal storage so only the chosen best round is retained instead of keeping every intermediate round directory. | false |
| --auto-apply | Deploys the best round back onto the original skill directory immediately after selection. | false |
| --skill-list | Runs batch optimization over one skill path per line. | off |
| --concurrency | Batch-job parallelism when multiple skills are optimized in the same invocation. | 1 |

Source Compatibility Rules

SkVM validates task-source-specific flags strictly. For example, --tasks is only valid for real, --logs is only valid for log, and loop-control flags like --baseline are rejected for log because there is no rerun phase.
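The compatibility rules above can be illustrated with a small validator. This is a sketch only: the function name validate_flags and the error messages are hypothetical and are not taken from the SkVM source. The flag groupings come from the task-source table, and --baseline is the one loop-control flag the text names as rejected in log mode.

```python
from typing import Iterable, List

# Flags tied to a single task source, as listed in the task-source table.
SOURCE_FLAGS = {
    "synthetic": {"--synthetic-count", "--synthetic-test-count"},
    "real": {"--tasks", "--test-tasks"},
    "log": {"--logs", "--failures"},
}
# Flags that need a rerun phase; the docs name --baseline as rejected for log.
RERUN_ONLY_FLAGS = {"--baseline"}


def validate_flags(task_source: str, flags: Iterable[str]) -> List[str]:
    """Return error messages; an empty list means the flags are compatible."""
    if task_source not in SOURCE_FLAGS:
        return [f"unknown task source: {task_source}"]
    errors = []
    for flag in flags:
        # Find which source (if any) owns this flag.
        owners = [src for src, owned in SOURCE_FLAGS.items() if flag in owned]
        if owners and task_source not in owners:
            errors.append(f"{flag} is only valid for --task-source={owners[0]}")
        if task_source == "log" and flag in RERUN_ONLY_FLAGS:
            errors.append(f"{flag} is not allowed with --task-source=log")
    return errors
```

For example, validate_flags("log", ["--logs", "--baseline"]) reports a single error for --baseline, while validate_flags("real", ["--tasks"]) returns an empty list.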

CLI: pipeline

Profile the target if no cached TCP exists, then run aot-compile with that TCP. This is the shortest path when you want a compiled variant but do not want to manually split the work into profile and compile steps.

usage
skvm pipeline --skill=<path> --model=<provider>/<model-id> [options]

| Flag | Description | Default |
| --- | --- | --- |
| --skill | Required. Skill directory or SKILL.md path that should be compiled. | required |
| --model | Required. Target model ID in the form <provider>/<model-id>, used both for cache lookup and eventual compilation. | required |
| --adapter | Harness whose TCP should be used. The same adapter is used for auto-profiling when no cached profile exists. | bare-agent |
| --force-profile | Forces a fresh profiling run instead of reusing a cached TCP. Use it when you suspect the cache is stale but still want the convenience of the one-command pipeline. | false |
| --profile | Supplies a specific TCP file and skips auto-profiling. This is the escape hatch for deterministic reproduction or external TCP inspection. | auto-load or auto-profile |
| --pass | Selects which AOT passes to run after the TCP is available. | 1,2,3 |
| --compiler-model | Overrides the compiler LLM used during the compilation stage. | openrouter/anthropic/claude-sonnet-4.6 |
| --dry-run | Prints the resulting compile summary without writing the output variant. | false |

Operationally, pipeline has three branches: load an explicit profile, reuse a cached profile, or run profile inline. Only after the TCP is resolved does it move on to the compile stage.
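The three branches can be sketched as a small resolver. The function resolve_tcp and its callable parameters are hypothetical stand-ins, not SkVM APIs, and the precedence shown (an explicit profile wins, then the cache unless a fresh profile is forced) is an assumption about how --profile and --force-profile interact.

```python
from typing import Callable, Optional, Tuple


def resolve_tcp(
    explicit_profile: Optional[dict],
    load_cached: Callable[[], Optional[dict]],
    run_profile: Callable[[], dict],
    force_profile: bool = False,
) -> Tuple[str, dict]:
    """Return (branch, tcp) for the three TCP resolution branches."""
    if explicit_profile is not None:
        # Branch 1: a TCP file was supplied, so no profiling is needed.
        return "explicit", explicit_profile
    if not force_profile:
        # Branch 2: reuse a cached TCP when one exists.
        cached = load_cached()
        if cached is not None:
            return "cached", cached
    # Branch 3: no usable cache (or a forced refresh), so profile inline.
    return "profiled", run_profile()
```

Only after one of these branches returns a TCP would the compile stage start.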

CLI: proposals

Inspect, diff, group, serve, accept, or reject the proposal trees created by jit-optimize. This command is the review surface between optimization and deployment.

usage
skvm proposals list | show | diff | report | serve | accept | reject [options]

| Subcommand / Flag | Description |
| --- | --- |
| list | Lists proposals and supports filtering by --harness, --target-model, --skill, and --status, plus sorting and grouping for review workflows. |
| show <id> | Prints proposal metadata, per-round summary, and optionally full analysis content with --full. |
| diff <id> [--round=N] | Prints a unified diff between the original skill and a chosen round. If --round is omitted, the best round is used. |
| report | Generates an HTML report for the filtered proposal set. --out overrides the output path. |
| serve | Starts the local proposal review server. --port, --host, and --no-open control how the server is exposed. |
| accept <id> | Deploys the best round or the round chosen by --round. --target overrides the deployment directory. |
| reject <id> | Marks the proposal as rejected without deploying anything. |
| --sort, --min-delta, --group-by, --no-color | Formatting and review controls for list/report-style outputs. |

Use proposals as the human review gate. jit-optimize is allowed to propose edits, but accept is the point where those edits become the live skill bundle.

CLI: run

Execute one task against one model+adapter, with or without a skill. This is execute-only and primarily for testing; use bench when you need scored evaluation.

usage
skvm run --task=<path> --model=<provider>/<model-id> --adapter=<name> [--skill=<path>]

| Flag | Description | Default |
| --- | --- | --- |
| --task | Required. Path to a task JSON file using the bench task schema. | required |
| --model | Required. Provider-prefixed model identifier in the form <provider>/<model-id>, passed through to the chosen adapter. | required |
| --skill | Optional skill to inject for the run. Omit it to execute the task without any skill assistance. | none |
| --adapter | Harness used to execute the task. | bare-agent |
| --workdir | Reuses a specific working directory instead of creating a temporary one. Useful for reproducing runs and inspecting artifacts across iterations. | temporary directory |
| --timeoutMs | Overrides the task timeout defined in the task file. | task-defined timeout |
| --maxSteps | Overrides the task or adapter step budget for the run. | task-defined max steps |
| --verbose | Enables more detailed execution logging. | false |

SkVM copies any files under the task's fixtures/ directory into the work directory before execution, then reports workdir, timing, token usage, and non-OK run status at the end.

CLI: bench

Run benchmark conditions over tasks, skills, and models. bench is the widest CLI surface in SkVM because it covers standard benchmarking, deferred judging, session resumption, task import, and condition-to-condition comparison.

usage
skvm bench --model=<provider>/<model-id> [options]

Detailed parameter guide. The standard execution path builds a benchmark plan from model × adapter × task × condition, while submodes like judge, --compare, --import, and --custom bypass parts of that plan builder.

| Flag | Description | Default |
| --- | --- | --- |
| --model | Required for normal benchmarking unless you are resuming a session that already records the model. Use provider-prefixed IDs in the form <provider>/<model-id>; comma-separated values enable multi-model mode. | required |
| --adapter | Selects one or more harnesses. Multi-adapter mode is supported, but cannot be combined with multi-model mode in the same invocation. | bare-agent |
| --tasks | Restricts benchmarking to a comma-separated task subset instead of the full task pool. | all tasks |
| --source | Filters tasks by origin source, such as pinchbench, skillsbench, or other importer-specific labels stored in the task metadata. | all sources |
| --conditions | Selects which skill conditions to evaluate, including no-skill, original, aot-compiled, pass-specific AOT variants like aot-compiled-p12, jit-boost, and jit-optimized. | all standard conditions |
| --custom | Runs a YAML-defined custom benchmark plan with explicit nested task-skill-model-adapter mappings. This bypasses the standard condition system entirely. | off |
| --skill-mode | Controls whether skills are directly injected or discovered by the harness. This matters when the harness has its own skill loading semantics. | inject |
| --jit-runs | Warm-up repetitions used by the jit-boost condition before measuring the solidified path. Higher values give the boost mechanism more chances to promote repeated patterns. | 3 |
| --timeout-mult | Multiplies per-task timeout budgets. Use it when a model or adapter is known to be slower than the default task envelopes assume. | 1.0 |
| --max-steps | Overrides the maximum number of agent steps each run may take before the harness is stopped. | 30 |
| --judge-model | Selects the LLM used by llm-judge criteria. This only affects judging, not the model being benchmarked. | openrouter/anthropic/claude-sonnet-4.6 |
| --compiler-model | Overrides the compiler model used when a requested benchmark condition needs to materialize an AOT variant during the run. | openrouter/anthropic/claude-sonnet-4.6 |
| --profile | Supplies a TCP path for AOT conditions. Without it, AOT conditions are skipped when the needed TCP cannot be resolved from cache. | auto-load if available |
| --resume | Resumes an interrupted session by id or with latest. This preserves progress instead of re-running already completed work. | off |
| --list-sessions | Prints known benchmark sessions and their statuses without running anything. | off |
| --concurrency | Controls total parallel task execution. In multi-model mode the slots are distributed across models rather than spent inside a single model run. | 1 |
| --runs-per-task | Repeats each task-condition pair multiple times and averages the result, reducing noise from stochastic decoding or unstable harness behavior. | 1 |
| --keep-workdirs | Keeps task working directories after completion so failures can be inspected manually. | false |
| --verbose | Enables debug logging during orchestration and execution. | false |

Async Judge and Compare Modes

| Flag / Subcommand | What It Does |
| --- | --- |
| --async-judge | Defers llm-judge criteria into a post-run batch. This is useful when you want the expensive LLM judging stage decoupled from the main benchmark execution. |
| bench judge --manifest=<dir> | Runs the deferred judge pass later from the generated manifest directory. |
| --merge-judge=<results-dir> | Merges post-run judging results back into an existing report. |
| --compare | Switches bench into artifact comparison mode instead of execution mode. |
| --skill-path, --lhs, --rhs, --output-dir | Required inputs for compare mode: the skill to inspect, the two conditions to compare, and where to write the generated diff/report outputs. |
| --analyze-model | Optional summarization model used to generate a higher-level explanation of the skill difference during compare mode. |

Import Mode

| Flag | What It Does |
| --- | --- |
| --import | Runs task importers instead of benchmarking. Current sources are pinchbench and skillsbench. |
| --path | Overrides the source directory used by the importer. |
| --dry-run | Shows what would be imported without writing files. |

examples
# Single model benchmark
skvm bench --model=openrouter/qwen/qwen3.5-35b-a3b --adapter=bare-agent

# Specific conditions
skvm bench --model=openrouter/qwen/qwen3.5-35b-a3b --conditions=no-skill,original,aot-compiled,jit-boost

# Defer LLM-judge and process it later
skvm bench --model=openrouter/qwen/qwen3.5-35b-a3b --async-judge
skvm bench judge --manifest=path/to/manifest-dir --judge-model=openrouter/anthropic/claude-sonnet-4.6

CLI: clean-jit

Clear persisted JIT artifacts for a model+adapter pair when you want to reset solidification or proposal state.

usage
skvm clean-jit --model=<provider>/<model-id> --adapter=<name>

| Flag | Description | Default |
| --- | --- | --- |
| --model | Required. Provider-prefixed model whose runtime JIT state should be cleared. | required |
| --adapter | Required. Adapter whose runtime artifacts should be cleaned alongside the model key. | required |
| --dry-run | Prints the deletion plan, including runtime directories and matching solidification-state.json files, without removing anything. | false |
| --yes | Confirms destructive cleanup. Required unless --dry-run is active. | false |
| --include-bench-logs | Also deletes matching benchmark session directories in addition to runtime JIT state. | false |

The command intentionally keeps compiled SKILL.md artifacts, candidate metadata, and cached profiles intact. It is for resetting JIT effects, not wiping every derived artifact.

CLI: logs

List recent runs across profiling, compilation, bench, and runtime subsystems from the shared cache tree.

usage
skvm logs

| Flag | Description | Default |
| --- | --- | --- |
| --type | Filters sessions by subsystem, such as profile, aot-compile, bench, run, or pipeline. | all session types |
| --limit | Limits how many recent entries are shown. | 20 |
| --all | Disables the limit and prints the full session index. | false |

Each entry includes status, type, model or model count, harness, skill, summary text, and the log directory path, so logs acts as a lightweight session index over the shared cache.

Compilation Feature

Primitive Catalog

SkVM defines 26 primitive capabilities that describe what an LLM agent can do. Each primitive is testable at three difficulty levels (L1–L3). The catalog is organized into four domains:

| Domain | Prefix | Examples |
| --- | --- | --- |
| Code Generation | gen.code.* | write, edit, debug, test, refactor |
| Tool Use | tool.* | file.read, file.write, exec, web_fetch |
| Reasoning | reason.* | plan, decompose, diagnose, analyze |
| Instruction Following | follow.* | format, constraint, multi-step, edge-case |

Each primitive has a dedicated microbenchmark generator that produces randomized test instances. Generators use two evaluation patterns:

  • Tool-use primitives — agent runs tools, evaluator checks files in the working directory
  • Text-only primitives — profiler writes LLM response to file, evaluator reads it

TCP — Target Capability Profile

A TCP is the output of profiling: a JSON file that maps each of the 26 primitives to a proficiency level.

| Level | Meaning |
| --- | --- |
| L3 | Full proficiency: handles complex instances |
| L2 | Moderate proficiency: handles standard instances |
| L1 | Basic proficiency: handles simple instances |
| L0 | No proficiency: fails even simple instances |

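The profiler uses progressive testing: it tests L3 first. If the model passes, L2 and L1 are skipped (assumed passed); otherwise it moves down one level at a time, and a model that fails even L1 is recorded as L0. Because lower levels run only when a higher level fails, this reduces API cost while maintaining accuracy.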

.skvm/profiles/bare-agent/qwen-qwen3-30b-a3b-instruct-2507.json (excerpt)
{
  "gen.code.write": { "level": 2, "scores": { "L3": 0.33, "L2": 0.83 } },
  "tool.file.read": { "level": 3, "scores": { "L3": 1.0 } },
  "reason.plan":    { "level": 1, "scores": { "L3": 0.0, "L2": 0.17, "L1": 0.67 } }
}
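
The progressive testing order can be sketched as a top-down search over levels. This is a minimal illustration, not the profiler's implementation: the score_at callable and the 0.5 pass threshold are assumptions (the docs do not state the real threshold), chosen so that the excerpt's scores reproduce the levels it shows.

```python
from typing import Callable, Dict, Tuple

PASS_THRESHOLD = 0.5  # assumption: the actual pass threshold is not documented here


def profile_primitive(
    score_at: Callable[[int], float],
    threshold: float = PASS_THRESHOLD,
) -> Tuple[int, Dict[str, float]]:
    """Test L3 first and stop at the first level the model passes."""
    scores: Dict[str, float] = {}
    for level in (3, 2, 1):
        score = score_at(level)          # run the microbenchmark at this level
        scores[f"L{level}"] = score
        if score >= threshold:
            return level, scores         # lower levels are assumed passed
    return 0, scores                     # failed even L1, so the level is L0
```

With the gen.code.write scores from the excerpt (0.33 at L3, 0.83 at L2), the sketch stops at L2 and never runs L1, which matches the recorded level of 2.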

SCR — Skill Capability Requirement

An SCR describes what primitives a skill needs and at what proficiency level. It is extracted automatically by the compiler (Pass 1) from the skill's SKILL.md file.

The SCR may include alternative implementation paths — different ways to accomplish the same goal using different primitives. This allows the compiler to find substitutions when a model lacks a required capability.

Key Insight

The gap between a model's TCP and a skill's SCR determines what compilation transforms are needed. No gap means no transformation — the skill runs as-is.

3-Pass Compilation

The AOT compiler transforms skills in three sequential passes. Each pass emits artifacts that narrow the next stage's search space, so the compiler is not just editing prose: it is moving from capability analysis to environment binding to executable scheduling hints.

Pass 1: Capability Gap Analysis

Pass 1 reads the skill as a requirement document. It extracts the SCR, maps each purpose to the primitives and minimum levels it needs, compares those against the target TCP, and decides whether the skill can run unchanged, needs compensation, or needs structural substitution.

  • L0 gaps — capability absent → substitution (replace with alternative primitives)
  • Weak gaps — capability present but below required level → compensation (add scaffolding, examples, decomposition)

Because the SCR can contain alternative implementation paths, Pass 1 is not limited to saying “the model is weaker.” It can often redirect the skill onto a different primitive path that the target model is better at executing.
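
The deterministic part of Pass 1 can be sketched as a comparison of required levels against profiled levels. The flat dictionaries below are a simplifying assumption: the real SCR also carries purposes and alternative paths, and classify_gaps is a hypothetical name.

```python
from typing import Dict


def classify_gaps(scr: Dict[str, int], tcp: Dict[str, int]) -> Dict[str, str]:
    """Map each under-served primitive to the transform its gap calls for."""
    gaps: Dict[str, str] = {}
    for primitive, required in scr.items():
        have = tcp.get(primitive, 0)     # treat an unprofiled primitive as L0
        if have >= required:
            continue                     # no gap: the skill runs as-is here
        # L0 means the capability is absent; anything else is a weak gap.
        gaps[primitive] = "substitution" if have == 0 else "compensation"
    return gaps
```

A skill that needs gen.code.write at L3 on a model profiled at L2 would get "compensation", while a requirement on a primitive profiled at L0 would get "substitution".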

Pass 2: Environment Binding

Pass 2 binds the rewritten skill to the actual machine it will run on. It extracts dependency manifests, checks whether binaries, Python packages, APIs, or shell tools already exist, and emits an idempotent setup script that can recreate the required environment.

This means AOT output is not only “better instructions for the model”; it also becomes a more operational skill bundle with explicit setup knowledge instead of hidden environmental assumptions.
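
One way to picture the environment check and the idempotent setup script is the sketch below. It is illustrative only: missing_binaries and setup_script are hypothetical helpers, the install commands are supplied by the caller, and the script format SkVM actually emits is not documented here. Idempotence comes from guarding each install behind a command -v check, so rerunning the script skips anything already present.

```python
import shutil
from typing import Dict, Iterable, List


def missing_binaries(required: Iterable[str]) -> List[str]:
    """Return the required binaries that are not on PATH."""
    return [name for name in required if shutil.which(name) is None]


def setup_script(install_cmds: Dict[str, str]) -> str:
    """Build a POSIX sh script that installs each binary only if it is absent."""
    lines = ["#!/bin/sh", "set -e"]
    for binary, cmd in install_cmds.items():
        # The guard makes the line a no-op when the binary already exists.
        lines.append(f"command -v {binary} >/dev/null 2>&1 || {cmd}")
    return "\n".join(lines) + "\n"
```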

Pass 3: Concurrency Extraction

Pass 3 analyzes the workflow structure itself. It decomposes the skill into a DAG, identifies true data and control dependencies, and looks for places where the original sequential prose can be turned into parallel work without changing semantics.

  • DLP — Data-Level Parallelism (process independent data chunks)
  • ILP — Instruction-Level Parallelism (pipeline independent steps)
  • TLP — Task-Level Parallelism (run independent subtasks concurrently)

The resulting variant can therefore preserve the same end behavior while exposing runtime scheduling opportunities that a plain natural-language skill would usually leave implicit.
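
The idea of separating true dependencies from incidental ordering can be sketched with Python's standard graphlib module (Python 3.9+). The step names are made up for illustration, and this shows only the generic technique of grouping a DAG into batches of independent steps, not SkVM's Pass 3 output format.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def parallel_batches(deps: Dict[str, Iterable[str]]) -> List[List[str]]:
    """Group workflow steps into batches whose members can run concurrently.

    deps maps each step to the steps it depends on.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()                     # raises CycleError if deps is not a DAG
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())   # steps with all dependencies done
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical workflow: two independent fetches feed a merge, then a report.
workflow = {
    "fetch_a": [],
    "fetch_b": [],
    "merge": ["fetch_a", "fetch_b"],
    "report": ["merge"],
}
print(parallel_batches(workflow))
```

Here the two fetch steps land in the same batch even if the original prose listed them one after the other, because neither depends on the other.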

JIT Optimization

SkVM provides two independent JIT systems that operate after the original skill already exists. They solve different problems: JIT-boost reduces repeated runtime cost by bypassing predictable LLM calls, while JIT-optimize edits the skill itself based on evidence collected from runs.

JIT-Boost (Code Solidification)

JIT-boost is a runtime specialization layer. It first uses a headless agent to scan the whole skill bundle and emit boost-candidates.json entries containing code signatures, keywords, parameter templates, and execution templates. During execution, runtime hooks watch LLM calls for repeated matches against those signatures.

After enough consecutive matches, a candidate is promoted. From that point on, SkVM extracts parameters directly from the prompt, executes the stored template, and bypasses the LLM entirely for that repeated pattern.

  • Zero LLM calls at runtime for promoted patterns
  • Automatic demotion on failure (falls back to LLM)
  • Model and harness agnostic — stored per skill

So JIT-boost is closer to code caching or solidification than to skill rewriting: it accelerates a stable repeated behavior path without changing the skill source itself.
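
The promotion and demotion behavior can be sketched as a small state machine. The class name BoostCandidate and the promotion threshold of 3 are assumptions for illustration; the docs say only that promotion happens after enough consecutive matches and that a failure demotes the candidate back to the LLM path.

```python
from dataclasses import dataclass


@dataclass
class BoostCandidate:
    promote_after: int = 3   # assumption: the real threshold is not documented here
    consecutive: int = 0
    promoted: bool = False

    def observe(self, matched: bool) -> None:
        """Record whether the latest LLM call matched this candidate's signature."""
        if matched:
            self.consecutive += 1
            if self.consecutive >= self.promote_after:
                self.promoted = True     # from now on the template can run directly
        else:
            self.consecutive = 0         # the streak must be consecutive

    def report_failure(self) -> None:
        """Demote after a failed template execution and fall back to the LLM."""
        self.promoted = False
        self.consecutive = 0
```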

JIT-Optimize (Skill Rewriting)

JIT-optimize is a proposal-driven editing loop. It normalizes evidence into a shared schema, copies the skill into a temporary workspace, writes the evidence and history into .optimize/, and launches a headless agent that can edit SKILL.md or any bundle files.

The optimizer must submit a structured record containing a required rootCause, reasoning, confidence, and changed files. SkVM then snapshots the workspace as a numbered round, computes the actual diff, reruns evaluation when appropriate, and chooses the best round according to score and monotonicity checks.

Because the result is stored as a proposal tree rather than applied immediately, JIT-optimize is a controlled editing workflow, not an in-place mutation mechanism.

Autotune

Autotune is the fully closed loop built on top of jit-optimize --task-source=synthetic: generate evaluation tasks, execute them, score the result, propose edits, and repeat until rounds are exhausted or convergence is reached. It is the nearest thing SkVM has to self-supervised online skill improvement.
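
The closed loop can be sketched as follows. The evaluate and propose_edit callables are hypothetical stand-ins for task execution plus scoring and for the optimizer agent. Two choices here are assumptions rather than documented behavior: each round edits the best skill so far, and the best round is simply the highest score (the real selection also applies monotonicity checks).

```python
from typing import Callable, List, Tuple


def autotune(
    evaluate: Callable[[str], float],
    propose_edit: Callable[[str, List[float]], str],
    skill: str,
    rounds: int = 3,
    convergence: float = 0.95,
) -> Tuple[str, float, List[float]]:
    """Iterate edit, rerun, and score until rounds run out or the score converges."""
    best_skill, best_score = skill, evaluate(skill)   # round 0: baseline snapshot
    history = [best_score]
    for _ in range(rounds):
        if best_score >= convergence:
            break                                     # early exit on convergence
        candidate = propose_edit(best_skill, history) # optimizer proposes an edit
        score = evaluate(candidate)                   # rerun tasks and score them
        history.append(score)
        if score > best_score:
            best_skill, best_score = candidate, score
    return best_skill, best_score, history
```

The defaults of 3 rounds and a 0.95 convergence threshold mirror the documented defaults for --rounds and --convergence in rerun modes.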

Architecture

System Overview

SkVM is structured as a modular pipeline:

data flow
Profile Tool ──TCP──> AOT Compiler ──Variant──> Runtime + Agent
     │                     │                           │
26 primitives         3 passes                  JIT-boost + JIT-optimize
L3→L1 progressive     1: capability gaps        - code solidification
                      2: env binding            - skill optimization
                      3: concurrency DAG        - autotune loop

Key design principles:

  • Decoupled stages — profiler, compiler, and runtime can run independently
  • Pluggable adapters — any agent harness can be integrated via the AgentAdapter interface
  • Pluggable providers — any LLM backend via the LLMProvider interface
  • Cached artifacts — profiles, logs, and proposal-tree outputs are persisted and reused

Agent Adapters

Adapters wrap agent harnesses and provide a uniform interface. All adapters support RuntimeHooks for JIT monitoring.

Adapter      Harness          Description
bare-agent   Built-in         Minimal agent loop with 5 tools. Primary adapter for profiling and testing.
opencode     OpenCode CLI     Wraps OpenCode, parses NDJSON event stream.
openclaw     OpenClaw CLI     Wraps OpenClaw, manages temporary agent instances.
hermes       Hermes CLI       Wraps Hermes and preserves full token and cost usage metadata.
jiuwenclaw   Jiuwenclaw CLI   Wraps jiuwenclaw-cli over JSON-RPC; token and cost are not persisted upstream.
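
As a rough illustration of what a uniform adapter interface with monitoring hooks could look like, consider the sketch below. The docs name AgentAdapter and RuntimeHooks but do not show their members, so every method and field here (run, onToolCall, AdapterResult, EchoAdapter, runWithMonitoring) is a hypothetical stand-in rather than the real API.

```typescript
// Hypothetical shapes: the real AgentAdapter and RuntimeHooks may differ.
interface RuntimeHooks {
  onToolCall?: (tool: string, args: unknown) => void;
}

interface AdapterResult {
  output: string;
}

interface AgentAdapter {
  readonly name: string;
  run(prompt: string, hooks?: RuntimeHooks): AdapterResult;
}

// A toy adapter that reports one tool call and echoes the prompt back.
class EchoAdapter implements AgentAdapter {
  readonly name = "echo";

  run(prompt: string, hooks?: RuntimeHooks): AdapterResult {
    hooks?.onToolCall?.("echo", { prompt });
    return { output: prompt };
  }
}

// Callers depend only on the interface, so any harness can be swapped in.
function runWithMonitoring(
  adapter: AgentAdapter,
  prompt: string,
): { output: string; toolCalls: string[] } {
  const toolCalls: string[] = [];
  const result = adapter.run(prompt, {
    onToolCall: (tool) => {
      toolCalls.push(tool);
    },
  });
  return { output: result.output, toolCalls };
}
```

The point of the sketch is the dependency direction: monitoring code sees only the interface, which is what lets a runtime observe different harnesses in the same way.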

LLM Providers

Two LLM provider implementations are included:

  • Anthropic — native Anthropic SDK, supports tool_use
  • OpenRouter — OpenAI-compatible API, routes to 100+ models

The extractStructured() function provides two-layer structured output: it uses tool_use when the provider supports it and otherwise falls back to prompting for structured text and parsing the response.
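
The two-layer idea can be sketched as below. This is not the real extractStructured() signature: StructuredClient, its members, and the JSON-extraction heuristic are assumptions chosen for illustration, and the calls are synchronous only to keep the example small.

```typescript
// Hypothetical stand-in for an LLM client; not SkVM's LLMProvider.
interface StructuredClient {
  supportsToolUse: boolean;
  callWithTool(prompt: string): unknown; // layer 1: returns tool arguments
  complete(prompt: string): string;      // layer 2: returns raw text
}

function extractStructuredSketch(client: StructuredClient, prompt: string): unknown {
  // Layer 1: let the provider enforce structure through a tool call.
  if (client.supportsToolUse) {
    return client.callWithTool(prompt);
  }

  // Layer 2: ask for JSON in plain text, then parse it out of the reply.
  const text = client.complete(prompt + "\nRespond with a single JSON object.");
  // Take the outermost {...} span so surrounding prose is ignored.
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in model output");
  }
  return JSON.parse(text.slice(start, end + 1));
}
```

A production fallback would likely also validate the parsed value against a schema and retry on failure; both are omitted here.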

Evaluation Framework

SkVM provides four evaluation methods, shared by the profiler and bench subsystems:

Method       Mechanism
script       Shell script exit code (0 = pass)
file-check   Check file contents: exact match, contains, regex, JSON schema
llm-judge    LLM evaluator with rubric, scores 0–1
custom       Registered evaluator functions
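
As an illustration of the file-check method, the sketch below covers three of the four listed modes. The FileCheck shape and the fileCheck function are hypothetical; the JSON schema mode is left out because it would need a schema validator, and the check runs on a string that is assumed to have been read from the file already.

```typescript
// Hypothetical check shape; SkVM's real file-check config may differ.
type FileCheck =
  | { mode: "exact"; expected: string }
  | { mode: "contains"; expected: string }
  | { mode: "regex"; pattern: string };

// Returns true when the file content passes the check.
function fileCheck(content: string, check: FileCheck): boolean {
  switch (check.mode) {
    case "exact":
      return content === check.expected;
    case "contains":
      return content.indexOf(check.expected) !== -1;
    case "regex":
      return new RegExp(check.pattern).test(content);
  }
}

// Usage: a task that must leave a greeting in its output file.
const greetingPassed = fileCheck("hello world\n", { mode: "regex", pattern: "^hello\\s" });
```

Here greetingPassed is true, since the content starts with "hello" followed by whitespace.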
Configuration

Environment Variables

Set these in a .env file at the project root. Bun auto-loads it.

Variable             Purpose
OPENROUTER_API_KEY   API key for OpenRouter (agent/profiler)
ANTHROPIC_API_KEY    API key for Anthropic (compiler backend)

Provider Routing

Start with skvm config init. The interactive wizard writes $SKVM_CACHE/skvm.config.json (default ~/.skvm/skvm.config.json) and lets you configure providers, API keys, and adapter checkouts without editing JSON by hand.

terminal
skvm config init
skvm config show
skvm config doctor

Every CLI model field uses <provider>/<model-id>. The examples in these docs default to the openrouter/ route, so a Qwen target is written as openrouter/qwen/qwen3.5-35b-a3b and Claude through OpenRouter as openrouter/anthropic/claude-sonnet-4.6. If you configure Anthropic directly as the provider, the same Claude model becomes anthropic/claude-sonnet-4.6. In other words, anthropic is the provider in the native route but part of the model id when the provider is openrouter.

skvm.config.json
{
  "providers": {
    "routes": [
      { "match": "anthropic/*",  "kind": "anthropic",         "apiKeyEnv": "ANTHROPIC_API_KEY" },
      { "match": "openai/*",     "kind": "openai-compatible", "apiKeyEnv": "OPENAI_API_KEY",     "baseUrl": "https://api.openai.com/v1" },
      { "match": "openrouter/*", "kind": "openrouter",        "apiKeyEnv": "OPENROUTER_API_KEY" }
    ]
  }
}
Routing Rules

providers.routes is matched top to bottom and the first glob wins. SkVM strips the first path segment before sending the model id to the backend SDK. For openai-compatible routes, baseUrl is required. Unprefixed model ids do not auto-route and will fail to match.
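
These rules can be sketched in a few lines. The Route fields mirror the skvm.config.json example above; the function names, the ResolvedRoute type, and the simplified glob matcher (trailing * only) are assumptions for illustration and may not match SkVM's real matcher.

```typescript
interface Route {
  match: string; // glob such as "anthropic/*"
  kind: "anthropic" | "openai-compatible" | "openrouter";
  apiKeyEnv: string;
  baseUrl?: string;
}

interface ResolvedRoute {
  route: Route;
  backendModelId: string; // model id with the first path segment removed
}

// Simplified glob: supports only a trailing "*", which is all the
// example config uses. The real matcher may accept richer patterns.
function globMatches(glob: string, value: string): boolean {
  if (glob.charAt(glob.length - 1) === "*") {
    return value.indexOf(glob.slice(0, -1)) === 0;
  }
  return glob === value;
}

function resolveRoute(routes: Route[], modelId: string): ResolvedRoute {
  // Routes are tried top to bottom; the first match wins.
  for (const route of routes) {
    if (!globMatches(route.match, modelId)) continue;
    if (route.kind === "openai-compatible" && !route.baseUrl) {
      throw new Error("route " + route.match + " requires baseUrl");
    }
    // Strip the provider prefix before handing the id to the backend.
    const backendModelId = modelId.slice(modelId.indexOf("/") + 1);
    return { route, backendModelId };
  }
  // Unprefixed ids such as "claude-sonnet-4.6" end up here.
  throw new Error("no route matches model id: " + modelId);
}

// The routes from the skvm.config.json example above.
const exampleRoutes: Route[] = [
  { match: "anthropic/*", kind: "anthropic", apiKeyEnv: "ANTHROPIC_API_KEY" },
  { match: "openai/*", kind: "openai-compatible", apiKeyEnv: "OPENAI_API_KEY", baseUrl: "https://api.openai.com/v1" },
  { match: "openrouter/*", kind: "openrouter", apiKeyEnv: "OPENROUTER_API_KEY" },
];
```

With these routes, openrouter/anthropic/claude-sonnet-4.6 resolves to the openrouter route with backend id anthropic/claude-sonnet-4.6, while anthropic/claude-sonnet-4.6 resolves to the anthropic route with backend id claude-sonnet-4.6.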

Model IDs

SkVM uses provider-prefixed model identifiers in the form <provider>/<model-id>. The default examples below use OpenRouter, and the final line shows the native Anthropic form for comparison:

examples
openrouter/qwen/qwen3.5-35b-a3b
openrouter/anthropic/claude-sonnet-4.6
openrouter/google/gemini-2.5-flash
openrouter/qwen/qwen3-30b-a3b-instruct-2507
openrouter/deepseek/deepseek-chat-v3-0324
anthropic/claude-sonnet-4.6

Data Directory

SkVM currently uses two root directories: skvm-data/ for the input dataset, and .skvm/ for the local runtime cache and generated artifacts.

layout
skvm-data/
├── skills/         # input skill definitions
└── tasks/          # benchmark task dataset

.skvm/
├── profiles/       # cached TCP JSON files
├── log/            # profile / compile / bench / runtime logs
└── proposals/      # aot-compile, jit-boost, jit-optimize outputs
Important

The legacy flat data/ submodule has been retired. To update the dataset, commit and push inside skvm-data/, then run git add skvm-data && git commit in the main repo to update the submodule pointer.