Version: 0.4.0

Roadmap

We're building in public: [x] means shipped today, and [ ] means open for contribution. To pick up an unchecked item, open an issue or join us on Discord.

1 Federated SQL engine

DataFusion single-node federation across CSV, Parquet, JSON, S3 / GCS / Azure, Postgres, MySQL, SQLite, MongoDB, Redis, Iceberg, Lance, SeekDB — all joinable in one query
Register by table, or load an entire DB (Postgres / MySQL / SQLite) as a DataFusion catalog — one config line either way
Graph database sources (Neo4j / Kuzu) — native federation to unlock graphRAG patterns alongside vector / FTS retrieval

2 Retrieval primitives

Vector search — pg_knn (pgvector), sqlite_knn (sqlite-vec), Lance KNN, SeekDB HNSW
Full-text search — pg_fts, sqlite_fts, Lance BM25 inverted indexes, SeekDB FULLTEXT
Hybrid search — RRF merge of FTS + KNN in plain SQL
Inline embeddings — candle() UDF (GGUF / Candle / remote embed APIs) runs directly inside SQL; content + vector stay on the same row atomically
ONNX inference — onnx_predict UDF for inline model predictions in SQL
Chunking UDF — chunk() with character / markdown splitters (via text-splitter) so ingestion can chunk inline in SQL; token / code splitters next
Memory primitive — hybrid access + TTL + provenance + consolidation collapsed into one declarative macro
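
For readers new to the RRF merge named in the hybrid search item, here is a minimal Python sketch of Reciprocal Rank Fusion. It illustrates the general technique only and is not Skardi's SQL implementation. The function name `rrf_merge`, the sample document ids, and the constant `k = 60` (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def rrf_merge(rankings, k=60):
    """Fuse ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over every list it appears in,
    where rank starts at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical result lists: one from full-text search, one from KNN.
fts_hits = ["doc_a", "doc_b", "doc_c"]
knn_hits = ["doc_b", "doc_d", "doc_a"]

merged = rrf_merge([fts_hits, knn_hits])
for doc_id, score in merged:
    print(f"{doc_id}\t{score:.6f}")
```

Ids that rank well in both lists rise to the top, and only ranks are used, so the FTS and KNN scores never need to share a scale.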

3 Online serving (pipelines)

Declarative YAML → parameterized REST endpoint with inferred request / response schema
Built-in pipeline dashboard
CLI pipeline binding + aliases — skardi run <pipeline> --param=… and user-defined verb aliases (#90)
CLI federated SQL — skardi query against files, object stores, datalake formats, and databases with no server required

4 Offline jobs

5 Agent-facing bindings

REST — every pipeline served as a parameterized HTTP endpoint
Shell — every pipeline runnable as a skardi command; works in Claude Code, Cursor, and any agent with a Bash tool
Skills generator — skardi skills generate --ctx <ctx.yaml> --out .claude/skills/ emits a skill Markdown per pipeline for Claude Code / Desktop auto-discovery
MCP binding — same pipeline YAML projected to MCP tools for non-Claude hosts

6 Governance & lineage

Catalog with semantics — kind: semantics YAML overlay attaching NL descriptions to tables / columns; supports both bare source names and fully-qualified catalog.schema.table paths for per-table targeting on catalog-mode sources; surfaced on GET /data_source for agent-side discovery
Agent-callable describe verb — CLI / pipeline form on top of the catalog endpoint
Lineage capture — agent_id, session_id, tool_call_id, timestamp on writes; queryable from metadata tables
Agent identity passthrough — any binding injects client identity into a SQL context var pipelines can read
Snapshot-as-branch / agent checkpoints — Iceberg / Lance-backed; git checkout-like semantics for destructive agent experiments
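
The lineage capture item can be pictured with a small, self-contained Python sketch using the standard-library `sqlite3` module. It shows one way the four listed fields might be stored per write and queried afterward. The table name `write_lineage`, the `target_table` column, and the `record_write` helper are hypothetical and not part of Skardi.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")

# Hypothetical metadata table: one row per write, tagged with the caller's identity.
conn.execute(
    """
    CREATE TABLE write_lineage (
        target_table TEXT NOT NULL,
        agent_id     TEXT NOT NULL,
        session_id   TEXT NOT NULL,
        tool_call_id TEXT NOT NULL,
        written_at   TEXT NOT NULL
    )
    """
)


def record_write(conn, target_table, agent_id, session_id, tool_call_id):
    """Append one lineage row stamped with the current UTC time."""
    conn.execute(
        "INSERT INTO write_lineage VALUES (?, ?, ?, ?, ?)",
        (
            target_table,
            agent_id,
            session_id,
            tool_call_id,
            datetime.now(timezone.utc).isoformat(),
        ),
    )


record_write(conn, "notes", "agent-1", "sess-1", "call-1")
record_write(conn, "notes", "agent-2", "sess-2", "call-2")
record_write(conn, "tasks", "agent-1", "sess-1", "call-3")

# Which writes did agent-1 make, and through which tool calls?
rows = conn.execute(
    "SELECT target_table, tool_call_id FROM write_lineage "
    "WHERE agent_id = ? ORDER BY tool_call_id",
    ("agent-1",),
).fetchall()
print(rows)
```

Storing lineage as ordinary rows means the SQL surface used for data can also answer audit questions, such as which agent last wrote to a table.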

7 Ops

Session auth — drop-in user auth via better-auth backed by SQLite
Observability — OpenTelemetry traces / metrics / logs with a pre-configured Grafana stack
Docker + pre-built binaries — Linux x86_64 / ARM64, macOS ARM64