Automatic skill distillation, retrieval, and evolution from AI coding agent trajectories.
AI coding agents solve the same types of problems repeatedly but start from scratch every session. When Claude Code figures out the exact sequence to fix a race condition in your WebSocket server, that knowledge vanishes when the session ends.
skillrl builds a persistent, searchable skill memory from agent trajectories. It implements the SkillRL three-phase framework — distillation, retrieval, and evolution — so agents improve over time without fine-tuning.
npm install -g skillrl

skillrl ingest conversation.jsonl # Extract skills from a session
skillrl index # Build the vector search index
skillrl retrieve "fix race condition in WebSocket handler"
# Found 5 relevant skills:
# async_concurrency_guard (relevance: 78%)
# websocket_lifecycle_management (relevance: 71%)
# ...

- How It Works
- Quick Start
- CLI Reference
- MCP Server
- Programmatic API
- Embedding Index
- Model Configuration
- Export Formats
- Configuration
- Troubleshooting
skillrl implements the three-phase framework from the SkillRL paper (arXiv:2602.08234v1):
Agent Trajectories
(Claude Code, Kiro, Cursor, OpenClaw)
|
+--------------+--------------+
| |
Success Trajectories Failed Trajectories
| |
v v
+------------------+ +------------------+
| DISTILLATION | | DISTILLATION |
| Extract reusable | | Synthesize |
| patterns from | | corrective |
| what worked | | skills from |
| (Section 3.1) | | what failed |
+--------+----------+ +--------+----------+
| |
+----------+ +-------------+
| |
v v
+------------------+
| SKILL BANK |
| Persistent JSON |
| Sg (general) |
| Sk (domain) |
+--------+---------+
|
+------------+------------+
| |
v v
+------------------+ +------------------+
| RETRIEVAL | | EVOLUTION |
| Embedding KNN | | Refine from |
| + domain filter | | accumulated |
| (Section 3.2) | | failures |
+--------+----------+ | (Section 3.3) |
| +--------+----------+
v |
+------------------+ |
| Agent Context | <--------------+
| Skills injected |
| into next |
| session |
+------------------+
An LLM analyzes the trajectory and extracts structured skills. Each skill contains 10 fields: name, description, domain, category, instructions (step-by-step), prerequisites, anti-patterns, examples (with context/input/output), tags, and a confidence score (0.0-1.0).
Separate prompts handle success vs. failure trajectories. From success: "what patterns should be repeated?" From failure: "what should have been done differently?" New skills are deduplicated against the existing bank using an LLM call.
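As a rough intuition for the dedup step, here is a crude LLM-free stand-in that only catches near-identical names — the real check is semantic and uses an LLM call, so treat this purely as an illustration:

```typescript
// Crude stand-in for LLM-based dedup: treat a candidate skill as a
// duplicate when its normalized name matches an existing one. The real
// check in skillrl is semantic (an LLM call), not string matching.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '_')
    .replace(/^_|_$/g, '');
}

function isDuplicate(candidate: string, bank: string[]): boolean {
  const norm = normalizeName(candidate);
  return bank.some((existing) => normalizeName(existing) === norm);
}
```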
Two-tier search. The fast path uses pre-built embeddings (Gemini gemini-embedding-001, 256 dimensions) stored in SQLite with sqlite-vec KNN search. One API call for the query embedding, then a pure SQL vector similarity query. Domain partition keys enable pre-filtered search — querying with --domain react only scans React skills, not the entire index.
The fallback path (when no embedding index exists) uses keyword matching on skill name/description/tags, then LLM-based reranking for ambiguous cases. The retriever tries embeddings first and falls back transparently.
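A simplified sketch of that keyword fallback — the `MiniSkill` shape, the per-field weights, and the normalization are illustrative assumptions, not skillrl's internals:

```typescript
// Illustrative keyword scoring over skill name/description/tags.
// Weights and normalization are assumptions, not skillrl's actual values.
interface MiniSkill {
  name: string;
  description: string;
  tags: string[];
}

function keywordScore(query: string, skill: MiniSkill): number {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  const name = skill.name.toLowerCase();
  const desc = skill.description.toLowerCase();
  const tags = skill.tags.map((t) => t.toLowerCase());
  let score = 0;
  for (const term of terms) {
    if (name.includes(term)) score += 3; // name hits weigh most
    if (tags.includes(term)) score += 2; // exact tag match
    if (desc.includes(term)) score += 1; // description mention
  }
  return score / (terms.length * 6); // normalize to 0..1
}

const skill: MiniSkill = {
  name: 'websocket_lifecycle_management',
  description: 'Handle connect, reconnect and close for websocket servers',
  tags: ['websocket', 'concurrency'],
};
const score = keywordScore('fix websocket reconnect bug', skill);
```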
After accumulating failures, the evolution cycle analyzes patterns across failed trajectories, creates new skills to fill gaps, refines existing skills with updated instructions and confidence, and deprecates skills that consistently lead to poor outcomes.
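One piece of that cycle — deprecating skills that consistently lead to poor outcomes — might look like the sketch below. The field names mirror the `Skill` metadata documented later; the thresholds are assumptions, not skillrl's defaults:

```typescript
// Illustrative sketch: deprecate skills whose observed success rate stays
// low once there is enough usage evidence. Thresholds are assumptions.
interface SkillStats {
  id: string;
  usageCount: number;
  successRate: number; // 0.0 - 1.0
  deprecated: boolean;
  deprecationReason: string | null;
}

function markDeprecated(
  skills: SkillStats[],
  minUsage = 5,     // require enough evidence before judging
  minSuccess = 0.3  // below this, the skill does more harm than good
): SkillStats[] {
  return skills.map((s) =>
    !s.deprecated && s.usageCount >= minUsage && s.successRate < minSuccess
      ? {
          ...s,
          deprecated: true,
          deprecationReason: `success rate ${s.successRate} below ${minSuccess} after ${s.usageCount} uses`,
        }
      : s
  );
}
```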
Skills follow the paper's dual-pool architecture:
- Sg (General): Broadly applicable patterns — "Incremental Verification Loop", "Hypothesis-Driven Debugging"
- Sk (Task-Specific): Domain-bound techniques — "React State Management", "SQL Query Optimization"
Get a free API key from Google AI Studio:
skillrl config YOUR_GEMINI_API_KEY
# Or environment variable
export GEMINI_API_KEY=your_api_key

# For Bedrock:
npm install @aws-sdk/client-bedrock-runtime
export AWS_BEARER_TOKEN_BEDROCK=your_bedrock_api_key
export AWS_REGION=us-east-1

# For Claude:
export ANTHROPIC_API_KEY=your_api_key

skillrl init # Create empty skill bank
skillrl ingest conversation.jsonl # Ingest Claude Code session
# Distilled 3 skills:
# incremental_verification_loop [general] (confidence: 98%)
# silent_failure_recovery [general] (confidence: 90%)
# ...
skillrl index # Build embedding index

skillrl retrieve "implement JWT authentication with refresh tokens"
# Found 5 relevant skills:
# auth_token_lifecycle (relevance: 74%)
# ...
skillrl retrieve "fix race condition" --domain typescript

skillrl export kiro-power --output ./power # Kiro Power bundle
skillrl export skill-md --output SKILL.md # Claude Code
skillrl export cursorrules --output .cursorrules # Cursor

| Command | Description |
|---|---|
| `skillrl init` | Initialize an empty skill bank |
| `skillrl ingest <file>` | Parse a trajectory and extract skills |
| `skillrl ingest --stdin` | Read trajectory from stdin/pipe |
| `skillrl index` | Build/rebuild the embedding index |
| `skillrl retrieve "<task>"` | Find relevant skills by semantic search |
| `skillrl evolve <file>` | Run evolution cycle on failed trajectories |
| `skillrl export <format>` | Export skill bank (5 formats) |
| `skillrl list` | Display all skills with metadata |
| `skillrl stats` | Show skill bank statistics |
| `skillrl import <file>` | Merge skills from another bank |
| `skillrl config [api-key]` | Show or configure settings |
| `skillrl test` | Test LLM provider connection |
| Option | Description |
|---|---|
| `--provider, -p` | LLM provider: `gemini` (default), `bedrock`, `claude` |
| `--model, -m` | Model alias or full ID (see Model Configuration) |
| `--bank-path` | Custom skill bank path (default: `.skillrl/bank.json`) |
| `--domain` | Filter by domain (e.g., `typescript`, `react`, `python`) |
| `--source` | Trajectory source: `claude-code`, `kiro`, `cursor`, `openclaw`, `custom` |
| `--output, -o` | Output path for exports |
| `--verbose, -v` | Show detailed output |
| Format | Description | Auto-Detected |
|---|---|---|
| JSONL | Claude Code conversation logs | Yes |
| JSON | Structured `{ task, steps, outcome }` object | Yes |
| Kiro | Section-delimited logs with `---` markers | Yes |
| Text | Unstructured agent logs | Yes |
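Auto-detection along the lines of the table above could be sketched like this — an illustrative heuristic, not skillrl's actual detector:

```typescript
// Illustrative trajectory format auto-detection. The actual heuristics
// inside skillrl may differ.
type TrajectoryFormat = 'jsonl' | 'json' | 'kiro' | 'text';

function detectFormat(content: string): TrajectoryFormat {
  const trimmed = content.trim();
  // A whole-file JSON object or array parses in one go.
  if (trimmed.startsWith('{') || trimmed.startsWith('[')) {
    try {
      JSON.parse(trimmed);
      return 'json';
    } catch {
      // fall through: maybe one JSON object per line (JSONL)
    }
  }
  const lines = trimmed.split('\n').filter((l) => l.trim());
  const allJson = lines.every((l) => {
    try {
      JSON.parse(l);
      return true;
    } catch {
      return false;
    }
  });
  if (allJson && lines.length > 1) return 'jsonl';
  if (/^---$/m.test(trimmed)) return 'kiro'; // section delimiters
  return 'text';
}
```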
skillrl includes an MCP server with 8 tools for direct agent integration.
Add to your MCP configuration (~/.claude.json, project .mcp.json, or equivalent):
{
"mcpServers": {
"skillrl": {
"command": "npx",
"args": ["-y", "skillrl-mcp"],
"env": {
"GEMINI_API_KEY": "your_api_key"
}
}
}
}

For Bedrock:
{
"mcpServers": {
"skillrl": {
"command": "npx",
"args": ["-y", "skillrl-mcp"],
"env": {
"RLM_PROVIDER": "bedrock",
"AWS_BEARER_TOKEN_BEDROCK": "your_key",
"AWS_REGION": "us-east-1"
}
}
}
}

| Tool | Description |
|---|---|
| `skill_ingest` | Parse trajectory and extract skills |
| `skill_retrieve` | Semantic search for relevant skills |
| `skill_evolve` | Run evolution from failed trajectories |
| `skill_export` | Export to kiro-power, skill-md, cursorrules, markdown, or json |
| `skill_list` | List skills with domain/category/confidence filters |
| `skill_bank_stats` | Skill bank statistics |
| `skill_config` | Current configuration |
| `skill_index` | Build/rebuild embedding index |
All tool inputs are validated with Zod schemas. The agent receives ranked skills with relevance scores, step-by-step instructions, and anti-patterns injected into its context.
ESM-only ("type": "module"). Requires Node.js >= 18.
import { createSkillManager } from 'skillrl';
const manager = createSkillManager({
provider: 'gemini',
bankPath: '.skillrl/bank.json',
});
// Distill skills from a trajectory
const result = await manager.distiller.distill(trajectory);
console.log(`Extracted ${result.skills.length} skills`);
// Retrieve relevant skills
const retrieval = await manager.retriever.retrieve(
'implement OAuth2 with PKCE flow',
'typescript'
);
for (const { skill, relevanceScore } of retrieval.skills) {
console.log(`${skill.name} (${(relevanceScore * 100).toFixed(0)}%)`);
}
// Evolve from failures
const evolution = await manager.evolver.evolve(failedTrajectories);
console.log(`New: ${evolution.newSkills.length}, Refined: ${evolution.refinedSkills.length}`);

import { SkillBankManager, EmbeddingManager } from 'skillrl';
// Load the skill bank
const bank = new SkillBankManager({ bankPath: '.skillrl/bank.json' });
bank.load();
const skills = bank.listSkills();
console.log(`${skills.length} skills loaded`);
// Embedding index (SQLite + sqlite-vec)
const emb = new EmbeddingManager({ bankPath: '.skillrl/bank.json' });
await emb.load();
// KNN search with domain filtering
const results = await emb.search('handle authentication errors', {
topK: 5,
threshold: 0.3,
domain: 'typescript',
});
for (const { skillId, score } of results) {
console.log(`${skillId}: ${(score * 100).toFixed(1)}% match`);
}
emb.close();

import { getExporter } from 'skillrl';
const exporter = getExporter('kiro-power');
const result = await exporter.export(bank.getBank(), {
format: 'kiro-power',
outputPath: './power',
domain: 'typescript',
minConfidence: 0.6,
});

import { resolveModelConfig } from 'skillrl/models';
import { getApiKey, detectProvider } from 'skillrl/config';
import type { LLMProvider } from 'skillrl/providers';

The embedding index uses SQLite + sqlite-vec for hardware-accelerated KNN search.
CREATE TABLE index_metadata (
key TEXT PRIMARY KEY, value TEXT NOT NULL
);
CREATE TABLE skill_metadata (
skill_id TEXT PRIMARY KEY,
text TEXT NOT NULL,
domain TEXT NOT NULL DEFAULT 'general',
model TEXT NOT NULL,
dimensions INTEGER NOT NULL,
updated_at TEXT NOT NULL
);
CREATE INDEX idx_skill_metadata_domain ON skill_metadata(domain);
CREATE VIRTUAL TABLE vec_skill_embeddings USING vec0(
skill_id TEXT PRIMARY KEY,
domain TEXT partition key,
embedding float[256] distance_metric=cosine
);

- The query string is embedded via `gemini-embedding-001` (one API call, 256-dimensional vector)
- The embedding is passed to sqlite-vec's `MATCH` operator with the requested `k` (top-K) and optional `domain` partition filter
- sqlite-vec performs approximate nearest neighbor search using cosine distance
- Results are converted: `similarity = 1 - distance`
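The conversion in the last step is trivial because cosine distance is defined as 1 minus cosine similarity. A self-contained sketch (independent of sqlite-vec, which does this natively):

```typescript
// Cosine distance as sqlite-vec computes it (1 - cosine similarity),
// and the conversion back to the similarity score skillrl reports.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const toSimilarity = (distance: number): number => 1 - distance;
```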
-- With domain filter (searches only the "react" partition):
SELECT skill_id, distance FROM vec_skill_embeddings
WHERE embedding MATCH ?1 AND k = ?2 AND domain = 'react'
ORDER BY distance;
-- Without domain filter (searches all partitions):
SELECT skill_id, distance FROM vec_skill_embeddings
WHERE embedding MATCH ?1 AND k = ?2
ORDER BY distance;

| Setting | Value |
|---|---|
| Embedding model | `gemini-embedding-001` (256 dimensions, float32) |
| Distance metric | Cosine |
| Journal mode | WAL (concurrent readers, single writer) |
| Sync mode | NORMAL (acceptable for regenerable data) |
| Busy timeout | 5000ms |
| Domain filtering | Partition key (pre-filtered, not post-filtered) |
| Storage | `embeddings.db` alongside `bank.json` |
If an embeddings.json file exists from a previous version, it is automatically migrated to SQLite on the next load(), search(), or index call. The original file is renamed to embeddings.json.migrated.
| Alias | Model ID |
|---|---|
| `fast`, `default`, `flash` | `gemini-3-flash-preview` |
| `smart`, `pro` | `gemini-3-pro-preview` |
| `flash-2` | `gemini-2.0-flash-exp` |
| `flash-2.5` | `gemini-2.5-flash` |
| Alias | Model ID |
|---|---|
| `fast`, `default`, `nova-2-lite` | `us.amazon.nova-2-lite-v1:0` |
| `smart`, `claude-4.5-sonnet` | `us.anthropic.claude-sonnet-4-5-*` |
| `claude-4.5-opus` | `us.anthropic.claude-opus-4-5-*` |
| `llama-4` | `us.meta.llama4-maverick-*` |
Bedrock requires @aws-sdk/client-bedrock-runtime (optional peer dependency).
| Alias | Model ID |
|---|---|
| `fast`, `haiku` | `claude-haiku-4-5-20251001` |
| `smart`, `default`, `sonnet` | `claude-sonnet-4-5-20250929` |
| `opus` | `claude-opus-4-5-20251101` |
skillrl ingest session.jsonl --model fast
skillrl ingest session.jsonl --model smart
skillrl ingest session.jsonl --provider bedrock --model claude-4.5-sonnet

Full IDE integration bundle with directory structure:
power/
POWER.md # Activation manifest
mcp.json # MCP server configuration
steering/ # Domain-grouped skill files
hooks/ # Agent lifecycle hooks (auto-ingestion)
YAML frontmatter + markdown. Compatible with Claude Code. Includes version, domains, skill count, generated_by metadata.
Native Cursor IDE rules format. Skills sorted by confidence, inline domain tags.
Human-readable documentation with table of contents, overview table, and skills grouped by domain.
Raw skill bank for programmatic use, backup, and import into other banks.
- Environment variables (`GEMINI_API_KEY`, `AWS_BEARER_TOKEN_BEDROCK`, `ANTHROPIC_API_KEY`)
- `.env` file in current directory
- `.env.local` file in current directory
- `~/.skillrl/.env`
- `~/.config/skillrl/.env`
- `~/.skillrl/config.json`
- `~/.config/skillrl/config.json`

Provider selection precedence:

- `RLM_PROVIDER` environment variable
- `provider` field in config file
- Auto-detect based on available credentials (Gemini -> Bedrock -> Claude)
- Default: `gemini`
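The resolution order above can be sketched as a pure function over the environment. The env var names come from this document; the function itself is illustrative, not skillrl's exported `detectProvider`:

```typescript
// Illustrative sketch of provider resolution: explicit override first,
// then credential-based auto-detection (Gemini -> Bedrock -> Claude),
// then the documented default.
type Provider = 'gemini' | 'bedrock' | 'claude';

function resolveProvider(env: Record<string, string | undefined>): Provider {
  if (env.RLM_PROVIDER) return env.RLM_PROVIDER as Provider; // explicit override wins
  if (env.GEMINI_API_KEY) return 'gemini';
  if (env.AWS_BEARER_TOKEN_BEDROCK) return 'bedrock';
  if (env.ANTHROPIC_API_KEY) return 'claude';
  return 'gemini'; // documented default
}
```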
.skillrl/
bank.json # Skill definitions and metadata
embeddings.db # SQLite vector index (auto-created by `skillrl index`)
embeddings.db-wal # WAL journal (auto-managed)
embeddings.db-shm # Shared memory (auto-managed)
skillrl config # Check current state
skillrl config YOUR_API_KEY # Set Gemini key

Build the embedding index after ingesting:
skillrl stats # Verify skills exist
skillrl index # Build the index

If Bedrock support is missing, install the optional peer dependency:

npm install @aws-sdk/client-bedrock-runtime

The first search() call opens the SQLite database and loads the sqlite-vec extension (~5-20ms). Subsequent calls reuse the connection.
For 500 skills with 256-dimension embeddings, expect ~2-5 MB. WAL and SHM files are temporary.
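For intuition, the raw float32 embedding payload accounts for only part of that estimate; SQLite page overhead, metadata rows, and the WAL make up the rest:

```typescript
// Back-of-envelope for the 500-skill case: raw embedding bytes only.
const skillCount = 500;
const dimensions = 256;
const bytesPerFloat = 4; // float32
const rawBytes = skillCount * dimensions * bytesPerFloat; // 512,000 bytes ≈ 0.5 MB
```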
import type {
Skill, SkillExample, SkillMetadata,
SkillBank, SkillBankMetadata, SkillConfig,
DistillationResult, RetrievalResult, EvolutionResult,
ExportResult, ScoredSkill,
Trajectory, TrajectoryStep, ToolCall,
ExportOptions,
SkillEmbedding, EmbeddingIndex,
ProviderName, ResolvedModelConfig,
} from 'skillrl';

interface Skill {
id: string;
name: string;
description: string;
domain: string; // e.g., "typescript", "react", "python"
category: 'general' | 'task-specific';
instructions: string[]; // Step-by-step
prerequisites: string[];
antiPatterns: string[]; // What to avoid
examples: { context: string; input: string; output: string }[];
metadata: {
usageCount: number;
successRate: number;
lastUsed: string | null;
evolvedFrom: string | null;
deprecated: boolean;
deprecationReason: string | null;
};
tags: string[];
confidence: number; // 0.0 - 1.0
version: number;
createdAt: string;
updatedAt: string;
sourceTrajectories: string[];
}

- API keys stored locally, transmitted only to configured LLM providers
- Path traversal protection prevents reads/writes outside the sandbox
- Zod validation on all MCP tool inputs with length limits
- Output sanitization escapes markdown/YAML injection in exports
- Read-only trajectory ingestion never modifies source code
MIT
Based on SkillRL: Skill-Based Transferable Reinforcement Learning for LLM Agents (arXiv:2602.08234v1).
Part of the RLM project.
Contributions welcome. GitHub Issues | Discussions