zendizmo/skillRL

skillrl

Automatic skill distillation, retrieval, and evolution from AI coding agent trajectories.

AI coding agents solve the same types of problems repeatedly but start from scratch every session. When Claude Code figures out the exact sequence to fix a race condition in your WebSocket server, that knowledge vanishes when the session ends.

skillrl builds a persistent, searchable skill memory from agent trajectories. It implements the SkillRL three-phase framework — distillation, retrieval, and evolution — so agents improve over time without fine-tuning.

npm install -g skillrl
skillrl ingest conversation.jsonl   # Extract skills from a session
skillrl index                       # Build the vector search index
skillrl retrieve "fix race condition in WebSocket handler"
# Found 5 relevant skills:
#   async_concurrency_guard (relevance: 78%)
#   websocket_lifecycle_management (relevance: 71%)
#   ...

How It Works

skillrl implements the three-phase framework from the SkillRL paper (arXiv:2602.08234v1):

                    Agent Trajectories
                   (Claude Code, Kiro, Cursor, OpenClaw)
                           |
            +--------------+--------------+
            |                             |
     Success Trajectories          Failed Trajectories
            |                             |
            v                             v
   +------------------+        +------------------+
   |  DISTILLATION     |        |  DISTILLATION     |
   |  Extract reusable |        |  Synthesize       |
   |  patterns from    |        |  corrective       |
   |  what worked      |        |  skills from      |
   |  (Section 3.1)    |        |  what failed      |
   +--------+----------+        +--------+----------+
            |                             |
            +----------+    +-------------+
                       |    |
                       v    v
              +------------------+
              |  SKILL BANK      |
              |  Persistent JSON |
              |  Sg (general)    |
              |  Sk (domain)     |
              +--------+---------+
                       |
          +------------+------------+
          |                         |
          v                         v
 +------------------+     +------------------+
 |  RETRIEVAL        |     |  EVOLUTION        |
 |  Embedding KNN    |     |  Refine from      |
 |  + domain filter  |     |  accumulated      |
 |  (Section 3.2)    |     |  failures         |
 +--------+----------+     |  (Section 3.3)    |
          |                 +--------+----------+
          v                          |
 +------------------+                |
 |  Agent Context   | <--------------+
 |  Skills injected |
 |  into next       |
 |  session         |
 +------------------+

Phase 1: Distillation

An LLM analyzes the trajectory and extracts structured skills. Each skill contains 10 fields: name, description, domain, category, instructions (step-by-step), prerequisites, anti-patterns, examples (with context/input/output), tags, and a confidence score (0.0-1.0).

Separate prompts handle success vs. failure trajectories. From success: "what patterns should be repeated?" From failure: "what should have been done differently?" New skills are deduplicated against the existing bank using an LLM call.
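
As a rough illustration of the success/failure split, the sketch below picks the guiding question by outcome and clamps a model-reported confidence into the documented 0.0-1.0 range. The type `MiniTrajectory` and the functions `buildDistillPrompt` and `clampConfidence` are hypothetical names invented for this example; the prompts skillrl actually sends are not shown in this README.

```typescript
// Hypothetical, simplified trajectory shape; the real Trajectory type may differ.
type Outcome = "success" | "failure";

interface MiniTrajectory {
  task: string;
  outcome: Outcome;
}

// Guiding questions quoted from the description above.
const DISTILL_QUESTIONS: Record<Outcome, string> = {
  success: "What patterns should be repeated?",
  failure: "What should have been done differently?",
};

// Build an illustrative prompt for one trajectory (wording is an assumption).
function buildDistillPrompt(t: MiniTrajectory): string {
  return `Task: ${t.task}\nOutcome: ${t.outcome}\n${DISTILL_QUESTIONS[t.outcome]}`;
}

// Clamp a model-reported confidence into the documented 0.0-1.0 range.
function clampConfidence(raw: number): number {
  if (Number.isNaN(raw)) return 0;
  return Math.min(1, Math.max(0, raw));
}
```

Clamping is one defensive choice among several; rejecting out-of-range values outright would be equally reasonable.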

Phase 2: Retrieval

Retrieval is a two-tier search. The fast path uses pre-built embeddings (Gemini gemini-embedding-001, 256 dimensions) stored in SQLite and queried with sqlite-vec KNN search: one API call produces the query embedding, and a pure SQL vector similarity query does the rest. Domain partition keys enable pre-filtered search, so querying with --domain react scans only React skills, not the entire index.

The fallback path (when no embedding index exists) uses keyword matching on skill name/description/tags, then LLM-based reranking for ambiguous cases. The retriever tries embeddings first and falls back transparently.
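
A minimal sketch of what keyword matching over name, description, and tags could look like, assuming a simple token-overlap score. The scoring rule and the names `SearchableSkill`, `keywordScore`, and `keywordSearch` are assumptions made for illustration, and the LLM reranking step is omitted.

```typescript
// Hypothetical minimal shape: only the fields the fallback path is said to match on.
interface SearchableSkill {
  name: string;
  description: string;
  tags: string[];
}

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Fraction of distinct query tokens found in the skill's name, description, or tags.
function keywordScore(query: string, skill: SearchableSkill): number {
  const queryTokens = Array.from(new Set(tokenize(query)));
  if (queryTokens.length === 0) return 0;
  const haystack = new Set(
    tokenize([skill.name, skill.description, ...skill.tags].join(" "))
  );
  const hits = queryTokens.filter((token) => haystack.has(token)).length;
  return hits / queryTokens.length;
}

// Rank skills by score and drop those with no overlap at all.
function keywordSearch<T extends SearchableSkill>(query: string, skills: T[], topK = 5): T[] {
  return skills
    .map((skill) => ({ skill, score: keywordScore(query, skill) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.skill);
}
```

Token overlap ignores synonyms and word order, which is presumably why the fallback path adds LLM reranking for ambiguous cases.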

Phase 3: Evolution

After accumulating failures, the evolution cycle analyzes patterns across failed trajectories, creates new skills to fill gaps, refines existing skills with updated instructions and confidence, and deprecates skills that consistently lead to poor outcomes.
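
The deprecation step might be sketched as follows, using the metadata fields from the Skill type. The thresholds `MIN_USES` and `MIN_SUCCESS_RATE`, the function name `deprecatePoorSkills`, and the assumption that successRate is a 0-1 fraction are all hypothetical; the criteria skillrl actually applies are not documented here.

```typescript
// Subset of the Skill metadata fields documented under "Skill Type".
interface EvolvableSkill {
  name: string;
  metadata: {
    usageCount: number;
    successRate: number; // assumed to be a fraction between 0 and 1
    deprecated: boolean;
    deprecationReason: string | null;
  };
}

// Hypothetical thresholds chosen only for illustration.
const MIN_USES = 5;
const MIN_SUCCESS_RATE = 0.3;

// Return the skills, with consistently poor performers replaced by deprecated copies.
function deprecatePoorSkills(skills: EvolvableSkill[]): EvolvableSkill[] {
  return skills.map((skill) => {
    const { usageCount, successRate, deprecated } = skill.metadata;
    const enoughEvidence = usageCount >= MIN_USES;
    if (deprecated || !enoughEvidence || successRate >= MIN_SUCCESS_RATE) {
      return skill; // unchanged
    }
    return {
      ...skill,
      metadata: {
        ...skill.metadata,
        deprecated: true,
        deprecationReason: `success rate ${successRate.toFixed(2)} over ${usageCount} uses`,
      },
    };
  });
}
```

Requiring a minimum usage count keeps a skill from being deprecated after a single unlucky session.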

Skill Organization

Skills follow the paper's dual-pool architecture:

  • Sg (General): Broadly applicable patterns — "Incremental Verification Loop", "Hypothesis-Driven Debugging"
  • Sk (Task-Specific): Domain-bound techniques — "React State Management", "SQL Query Optimization"

Quick Start

1. Configure Credentials

Gemini (default)

Get a free API key from Google AI Studio:

skillrl config YOUR_GEMINI_API_KEY

# Or environment variable
export GEMINI_API_KEY=your_api_key

Amazon Bedrock

npm install @aws-sdk/client-bedrock-runtime

export AWS_BEARER_TOKEN_BEDROCK=your_bedrock_api_key
export AWS_REGION=us-east-1

Claude (Anthropic)

export ANTHROPIC_API_KEY=your_api_key

2. Ingest a Trajectory

skillrl init                                     # Create empty skill bank
skillrl ingest conversation.jsonl                # Ingest Claude Code session
# Distilled 3 skills:
#   incremental_verification_loop [general] (confidence: 98%)
#   silent_failure_recovery [general] (confidence: 90%)
#   ...

skillrl index                                    # Build embedding index

3. Retrieve Skills

skillrl retrieve "implement JWT authentication with refresh tokens"
# Found 5 relevant skills:
#   auth_token_lifecycle (relevance: 74%)
#   ...

skillrl retrieve "fix race condition" --domain typescript

4. Export for Your IDE

skillrl export kiro-power --output ./power       # Kiro Power bundle
skillrl export skill-md --output SKILL.md        # Claude Code
skillrl export cursorrules --output .cursorrules  # Cursor

CLI Reference

Commands

| Command | Description |
| --- | --- |
| skillrl init | Initialize an empty skill bank |
| skillrl ingest <file> | Parse a trajectory and extract skills |
| skillrl ingest --stdin | Read trajectory from stdin/pipe |
| skillrl index | Build/rebuild the embedding index |
| skillrl retrieve "<task>" | Find relevant skills by semantic search |
| skillrl evolve <file> | Run evolution cycle on failed trajectories |
| skillrl export <format> | Export skill bank (5 formats) |
| skillrl list | Display all skills with metadata |
| skillrl stats | Show skill bank statistics |
| skillrl import <file> | Merge skills from another bank |
| skillrl config [api-key] | Show or configure settings |
| skillrl test | Test LLM provider connection |

Options

| Option | Description |
| --- | --- |
| --provider, -p | LLM provider: gemini (default), bedrock, claude |
| --model, -m | Model alias or full ID (see Model Configuration) |
| --bank-path | Custom skill bank path (default: .skillrl/bank.json) |
| --domain | Filter by domain (e.g., typescript, react, python) |
| --source | Trajectory source: claude-code, kiro, cursor, openclaw, custom |
| --output, -o | Output path for exports |
| --verbose, -v | Show detailed output |

Trajectory Formats

| Format | Description | Auto-Detected |
| --- | --- | --- |
| JSONL | Claude Code conversation logs | Yes |
| JSON | Structured { task, steps, outcome } object | Yes |
| Kiro | Section-delimited logs with --- markers | Yes |
| Text | Unstructured agent logs | Yes |
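
One plausible way to auto-detect these four formats is sketched below. The heuristics and the function name `detectFormat` are assumptions for illustration only; skillrl's real detection logic is not described in this README.

```typescript
type TrajectoryFormat = "jsonl" | "json" | "kiro" | "text";

// True when the text parses as JSON.
function isJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

// Heuristic order: whole-document JSON, then line-delimited JSON,
// then Kiro-style "---" section markers, otherwise plain text.
function detectFormat(raw: string): TrajectoryFormat {
  const trimmed = raw.trim();
  if (trimmed.startsWith("{") && isJson(trimmed)) return "json";
  const lines = trimmed
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  if (lines.length > 0 && lines.every((line) => line.startsWith("{") && isJson(line))) {
    return "jsonl";
  }
  if (lines.some((line) => line === "---")) return "kiro";
  return "text";
}
```

Under these heuristics a JSONL file with a single line is classified as JSON, so the --source option is the safer choice when the input is ambiguous.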

MCP Server

skillrl includes an MCP server with 8 tools for direct agent integration.

Setup

Add to your MCP configuration (~/.claude.json, project .mcp.json, or equivalent):

{
  "mcpServers": {
    "skillrl": {
      "command": "npx",
      "args": ["-y", "skillrl-mcp"],
      "env": {
        "GEMINI_API_KEY": "your_api_key"
      }
    }
  }
}

For Bedrock:

{
  "mcpServers": {
    "skillrl": {
      "command": "npx",
      "args": ["-y", "skillrl-mcp"],
      "env": {
        "RLM_PROVIDER": "bedrock",
        "AWS_BEARER_TOKEN_BEDROCK": "your_key",
        "AWS_REGION": "us-east-1"
      }
    }
  }
}

Tools

| Tool | Description |
| --- | --- |
| skill_ingest | Parse trajectory and extract skills |
| skill_retrieve | Semantic search for relevant skills |
| skill_evolve | Run evolution from failed trajectories |
| skill_export | Export to kiro-power, skill-md, cursorrules, markdown, or json |
| skill_list | List skills with domain/category/confidence filters |
| skill_bank_stats | Skill bank statistics |
| skill_config | Current configuration |
| skill_index | Build/rebuild embedding index |

All tool inputs are validated with Zod schemas. The agent receives ranked skills with relevance scores, step-by-step instructions, and anti-patterns injected into its context.


Programmatic API

ESM-only ("type": "module"). Requires Node.js >= 18.

Factory Function

import { createSkillManager } from 'skillrl';

const manager = createSkillManager({
  provider: 'gemini',
  bankPath: '.skillrl/bank.json',
});

// Distill skills from a trajectory.
// `trajectory` is assumed to be a Trajectory object you have already parsed.
const result = await manager.distiller.distill(trajectory);
console.log(`Extracted ${result.skills.length} skills`);

// Retrieve relevant skills
const retrieval = await manager.retriever.retrieve(
  'implement OAuth2 with PKCE flow',
  'typescript'
);
for (const { skill, relevanceScore } of retrieval.skills) {
  console.log(`${skill.name} (${(relevanceScore * 100).toFixed(0)}%)`);
}

// Evolve from failures.
// `failedTrajectories` is assumed to be an array of Trajectory objects from failed sessions.
const evolution = await manager.evolver.evolve(failedTrajectories);
console.log(`New: ${evolution.newSkills.length}, Refined: ${evolution.refinedSkills.length}`);

Direct Class Usage

import { SkillBankManager, EmbeddingManager } from 'skillrl';

// Load the skill bank
const bank = new SkillBankManager({ bankPath: '.skillrl/bank.json' });
bank.load();
const skills = bank.listSkills();
console.log(`${skills.length} skills loaded`);

// Embedding index (SQLite + sqlite-vec)
const emb = new EmbeddingManager({ bankPath: '.skillrl/bank.json' });
await emb.load();

// KNN search with domain filtering
const results = await emb.search('handle authentication errors', {
  topK: 5,
  threshold: 0.3,
  domain: 'typescript',
});

for (const { skillId, score } of results) {
  console.log(`${skillId}: ${(score * 100).toFixed(1)}% match`);
}

emb.close();

Export

import { getExporter } from 'skillrl';

// `bank` is a loaded SkillBankManager, as in Direct Class Usage.
const exporter = getExporter('kiro-power');
const result = await exporter.export(bank.getBank(), {
  format: 'kiro-power',
  outputPath: './power',
  domain: 'typescript',
  minConfidence: 0.6,
});

Sub-Path Exports

import { resolveModelConfig } from 'skillrl/models';
import { getApiKey, detectProvider } from 'skillrl/config';
import type { LLMProvider } from 'skillrl/providers';

Embedding Index

The embedding index uses SQLite + sqlite-vec for hardware-accelerated KNN search.

Schema

CREATE TABLE index_metadata (
  key TEXT PRIMARY KEY, value TEXT NOT NULL
);

CREATE TABLE skill_metadata (
  skill_id TEXT PRIMARY KEY,
  text TEXT NOT NULL,
  domain TEXT NOT NULL DEFAULT 'general',
  model TEXT NOT NULL,
  dimensions INTEGER NOT NULL,
  updated_at TEXT NOT NULL
);
CREATE INDEX idx_skill_metadata_domain ON skill_metadata(domain);

CREATE VIRTUAL TABLE vec_skill_embeddings USING vec0(
  skill_id TEXT PRIMARY KEY,
  domain TEXT partition key,
  embedding float[256] distance_metric=cosine
);

How Search Works

  1. The query string is embedded via gemini-embedding-001 (one API call, 256-dimensional vector)
  2. The embedding is passed to sqlite-vec's MATCH operator with the requested k (top-K) and optional domain partition filter
  3. sqlite-vec performs a brute-force (exact) K-nearest-neighbor search using cosine distance
  4. Results are converted: similarity = 1 - distance

-- With domain filter (searches only the "react" partition):
SELECT skill_id, distance FROM vec_skill_embeddings
WHERE embedding MATCH ?1 AND k = ?2 AND domain = 'react'
ORDER BY distance;

-- Without domain filter (searches all partitions):
SELECT skill_id, distance FROM vec_skill_embeddings
WHERE embedding MATCH ?1 AND k = ?2
ORDER BY distance;
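
To make the distance-to-similarity conversion in step 4 concrete, here is a small sketch that computes cosine distance by hand and converts KNN rows into scored results. The names `cosineDistance`, `KnnRow`, and `toScoredResults` are hypothetical, and applying the threshold to similarity (rather than distance) is an assumption about how the `threshold` search option behaves.

```typescript
// Cosine distance between two equal-length vectors: 1 - cos(angle between them).
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero vector has no direction");
  return 1 - dot / Math.sqrt(normA * normB);
}

// One row as returned by the SQL queries above.
interface KnnRow {
  skillId: string;
  distance: number;
}

// Convert distances to similarities and keep rows at or above the threshold.
function toScoredResults(rows: KnnRow[], threshold: number): { skillId: string; score: number }[] {
  return rows
    .map((row) => ({ skillId: row.skillId, score: 1 - row.distance }))
    .filter((row) => row.score >= threshold);
}
```

Identical directions give distance 0 (similarity 1), orthogonal vectors give distance 1 (similarity 0), and opposite vectors give distance 2 (similarity -1).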

Configuration

| Setting | Value |
| --- | --- |
| Embedding model | gemini-embedding-001 (256 dimensions, float32) |
| Distance metric | Cosine |
| Journal mode | WAL (concurrent readers, single writer) |
| Sync mode | NORMAL (acceptable for regenerable data) |
| Busy timeout | 5000ms |
| Domain filtering | Partition key (pre-filtered, not post-filtered) |
| Storage | embeddings.db alongside bank.json |

Migration

If an embeddings.json file exists from a previous version, it is automatically migrated to SQLite on the next load(), search(), or index call. The original file is renamed to embeddings.json.migrated.


Model Configuration

Gemini (Default)

| Alias | Model ID |
| --- | --- |
| fast, default, flash | gemini-3-flash-preview |
| smart, pro | gemini-3-pro-preview |
| flash-2 | gemini-2.0-flash-exp |
| flash-2.5 | gemini-2.5-flash |

Amazon Bedrock

| Alias | Model ID |
| --- | --- |
| fast, default, nova-2-lite | us.amazon.nova-2-lite-v1:0 |
| smart, claude-4.5-sonnet | us.anthropic.claude-sonnet-4-5-* |
| claude-4.5-opus | us.anthropic.claude-opus-4-5-* |
| llama-4 | us.meta.llama4-maverick-* |

Bedrock requires @aws-sdk/client-bedrock-runtime (optional peer dependency).

Claude (Anthropic)

| Alias | Model ID |
| --- | --- |
| fast, haiku | claude-haiku-4-5-20251001 |
| smart, default, sonnet | claude-sonnet-4-5-20250929 |
| opus | claude-opus-4-5-20251101 |

Usage

skillrl ingest session.jsonl --model fast
skillrl ingest session.jsonl --model smart
skillrl ingest session.jsonl --provider bedrock --model claude-4.5-sonnet
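
Alias resolution can be pictured as a per-provider lookup that falls through to the raw string when no alias matches. The sketch below copies the Gemini and Claude alias tables above; Bedrock is left out because its model IDs are abbreviated with wildcards in this README. The function name `resolveModel` and the fall-through behavior are assumptions, not the actual `resolveModelConfig` implementation.

```typescript
type AliasProvider = "gemini" | "claude";

// Alias tables copied from the Model Configuration section.
const MODEL_ALIASES: Record<AliasProvider, Record<string, string>> = {
  gemini: {
    fast: "gemini-3-flash-preview",
    default: "gemini-3-flash-preview",
    flash: "gemini-3-flash-preview",
    smart: "gemini-3-pro-preview",
    pro: "gemini-3-pro-preview",
    "flash-2": "gemini-2.0-flash-exp",
    "flash-2.5": "gemini-2.5-flash",
  },
  claude: {
    fast: "claude-haiku-4-5-20251001",
    haiku: "claude-haiku-4-5-20251001",
    smart: "claude-sonnet-4-5-20250929",
    default: "claude-sonnet-4-5-20250929",
    sonnet: "claude-sonnet-4-5-20250929",
    opus: "claude-opus-4-5-20251101",
  },
};

// Resolve an alias to a model ID; unknown strings are treated as full model IDs.
function resolveModel(provider: AliasProvider, model?: string): string {
  const table = MODEL_ALIASES[provider];
  const key = model ?? "default";
  const hit = Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
  return hit ?? key;
}
```

Note that the same alias maps to different tiers per provider: `default` is the fast model for Gemini but the smart model for Claude.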

Export Formats

Kiro Power (kiro-power)

Full IDE integration bundle with directory structure:

power/
  POWER.md           # Activation manifest
  mcp.json           # MCP server configuration
  steering/          # Domain-grouped skill files
  hooks/             # Agent lifecycle hooks (auto-ingestion)

SKILL.md (skill-md)

YAML frontmatter + markdown. Compatible with Claude Code. Includes version, domains, skill count, generated_by metadata.

.cursorrules (cursorrules)

Native Cursor IDE rules format. Skills sorted by confidence, inline domain tags.

Markdown (markdown)

Human-readable documentation with table of contents, overview table, and skills grouped by domain.

JSON (json)

Raw skill bank for programmatic use, backup, and import into other banks.


Configuration

Credential Resolution Order

  1. Environment variables (GEMINI_API_KEY, AWS_BEARER_TOKEN_BEDROCK, ANTHROPIC_API_KEY)
  2. .env file in current directory
  3. .env.local file in current directory
  4. ~/.skillrl/.env
  5. ~/.config/skillrl/.env
  6. ~/.skillrl/config.json
  7. ~/.config/skillrl/config.json
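
The resolution order above amounts to a first-match-wins walk over an ordered list of sources. A minimal sketch, with in-memory lookups standing in for the real environment and files; `CredentialSource` and `resolveCredential` are hypothetical names:

```typescript
// A credential source: a label plus a lookup that returns the value or undefined.
interface CredentialSource {
  label: string;
  lookup: (name: string) => string | undefined;
}

// Walk the sources in priority order and return the first non-empty value.
function resolveCredential(
  name: string,
  sources: CredentialSource[]
): { value: string; from: string } | undefined {
  for (const source of sources) {
    const value = source.lookup(name)?.trim();
    if (value) return { value, from: source.label };
  }
  return undefined;
}

// Build a source from a plain object (stand-in for process.env or a parsed file).
function objectSource(label: string, values: Record<string, string | undefined>): CredentialSource {
  return { label, lookup: (name) => values[name] };
}
```

For example, passing an environment source followed by a `.env` source returns the environment value whenever both define the same key, which matches the precedence in the list.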

Provider Detection

  1. RLM_PROVIDER environment variable
  2. provider field in config file
  3. Auto-detect based on available credentials (Gemini -> Bedrock -> Claude)
  4. Default: gemini
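
The detection order can be sketched as a pure function over the environment and the config file's provider field. The name `detectProviderSketch` is hypothetical (the package exports its own `detectProvider`), and ignoring unrecognized provider values is an assumption.

```typescript
type DetectedProvider = "gemini" | "bedrock" | "claude";

const KNOWN_PROVIDERS: DetectedProvider[] = ["gemini", "bedrock", "claude"];

// Credential variable checked for each provider, in auto-detect order.
const CREDENTIAL_VARS: [DetectedProvider, string][] = [
  ["gemini", "GEMINI_API_KEY"],
  ["bedrock", "AWS_BEARER_TOKEN_BEDROCK"],
  ["claude", "ANTHROPIC_API_KEY"],
];

// Narrow an arbitrary string to a known provider name, or undefined.
function asProvider(value: string | undefined): DetectedProvider | undefined {
  return KNOWN_PROVIDERS.find((p) => p === value);
}

function detectProviderSketch(
  env: Record<string, string | undefined>,
  configProvider?: string
): DetectedProvider {
  // 1. Explicit environment override
  const fromEnv = asProvider(env["RLM_PROVIDER"]);
  if (fromEnv) return fromEnv;
  // 2. provider field from the config file
  const fromConfig = asProvider(configProvider);
  if (fromConfig) return fromConfig;
  // 3. First provider with a credential present
  for (const [provider, variable] of CREDENTIAL_VARS) {
    if (env[variable]) return provider;
  }
  // 4. Default
  return "gemini";
}
```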

File Structure

.skillrl/
  bank.json              # Skill definitions and metadata
  embeddings.db          # SQLite vector index (auto-created by `skillrl index`)
  embeddings.db-wal      # WAL journal (auto-managed)
  embeddings.db-shm      # Shared memory (auto-managed)

Troubleshooting

"API key not configured"

skillrl config                 # Check current state
skillrl config YOUR_API_KEY    # Set Gemini key

"No skills found" on retrieve

Build the embedding index after ingesting:

skillrl stats    # Verify skills exist
skillrl index    # Build the index

"requires @aws-sdk/client-bedrock-runtime"

npm install @aws-sdk/client-bedrock-runtime

Slow first retrieval

The first search() call opens the SQLite database and loads the sqlite-vec extension (~5-20ms). Subsequent calls reuse the connection.

Embedding index size

For 500 skills with 256-dimension embeddings, expect ~2-5 MB. WAL and SHM files are temporary.
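
As a back-of-the-envelope check on that estimate, the raw float32 vectors alone account for roughly 0.5 MB at 500 skills. The remainder of the quoted range would presumably come from the metadata text and SQLite page overhead, which is an assumption rather than a measured breakdown.

```typescript
const BYTES_PER_FLOAT32 = 4;

// Raw storage for the embedding vectors only (no metadata, no SQLite overhead).
function rawVectorBytes(skillCount: number, dimensions: number): number {
  return skillCount * dimensions * BYTES_PER_FLOAT32;
}

const vectorBytes = rawVectorBytes(500, 256); // 500 * 256 * 4 = 512000
console.log(`${(vectorBytes / 1e6).toFixed(2)} MB of raw vectors`); // prints "0.51 MB of raw vectors"
```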


TypeScript Types

import type {
  Skill, SkillExample, SkillMetadata,
  SkillBank, SkillBankMetadata, SkillConfig,
  DistillationResult, RetrievalResult, EvolutionResult,
  ExportResult, ScoredSkill,
  Trajectory, TrajectoryStep, ToolCall,
  ExportOptions,
  SkillEmbedding, EmbeddingIndex,
  ProviderName, ResolvedModelConfig,
} from 'skillrl';

Skill Type

interface Skill {
  id: string;
  name: string;
  description: string;
  domain: string;                          // e.g., "typescript", "react", "python"
  category: 'general' | 'task-specific';
  instructions: string[];                  // Step-by-step
  prerequisites: string[];
  antiPatterns: string[];                  // What to avoid
  examples: { context: string; input: string; output: string }[];
  metadata: {
    usageCount: number;
    successRate: number;
    lastUsed: string | null;
    evolvedFrom: string | null;
    deprecated: boolean;
    deprecationReason: string | null;
  };
  tags: string[];
  confidence: number;                      // 0.0 - 1.0
  version: number;
  createdAt: string;
  updatedAt: string;
  sourceTrajectories: string[];
}

Security

  • API keys stored locally, transmitted only to configured LLM providers
  • Path traversal protection prevents reads/writes outside the sandbox
  • Zod validation on all MCP tool inputs with length limits
  • Output sanitization escapes markdown/YAML injection in exports
  • Read-only trajectory ingestion never modifies source code

License

MIT

Credits

Based on SkillRL: Skill-Based Transferable Reinforcement Learning for LLM Agents (arXiv:2602.08234v1).

Part of the RLM project.

Contributing

Contributions welcome. GitHub Issues | Discussions

About

Repo for rlmskill project based on the research done by Cornell university - SKILLRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning
