Roadmap
We're building in public. [x] means shipped today, [ ] means open for contribution. Open an issue or hop into Discord about anything unchecked.
1 Federated SQL engine
- DataFusion single-node federation across CSV, Parquet, JSON, S3 / GCS / Azure, Postgres, MySQL, SQLite, MongoDB, Redis, Iceberg, Lance, SeekDB — all joinable in one query
- Register by table, or load an entire DB (Postgres / MySQL / SQLite) as a DataFusion catalog — one config line either way
- Graph database sources (Neo4j / Kuzu) — native federation to unlock graphRAG patterns alongside vector / FTS retrieval
2 Retrieval primitives
- Vector search: `pg_knn` (pgvector), `sqlite_knn` (sqlite-vec), Lance KNN, SeekDB HNSW
- Full-text search: `pg_fts`, `sqlite_fts`, Lance BM25 inverted indexes, SeekDB FULLTEXT
- Hybrid search: RRF merge of FTS + KNN in plain SQL
- Inline embeddings: `candle()` UDF (GGUF / Candle / remote embed APIs) runs directly inside SQL; content + vector stay on the same row atomically
- ONNX inference: `onnx_predict` UDF for inline model predictions in SQL
- Chunking UDF: `chunk()` with character / markdown splitters (via `text-splitter`) so ingestion can chunk inline in SQL (docs); token / code splitters next
- Memory primitive: hybrid access + TTL + provenance + consolidation collapsed into one declarative macro
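
The hybrid search item above relies on reciprocal rank fusion (RRF): each document scores the sum of `1 / (k + rank)` over every result list it appears in. Here is a minimal sketch of that merge expressed in plain SQL, run through Python's built-in `sqlite3` rather than through Skardi. The table names `fts_hits` and `knn_hits`, the `pos` column, the sample rows, and `k = 60` (a commonly used constant) are assumptions for illustration, not Skardi's schema.

```python
import sqlite3

# Hypothetical ranked results from two retrievers (pos 1 = best match).
fts_hits = [("doc_a", 1), ("doc_b", 2), ("doc_c", 3)]
knn_hits = [("doc_c", 1), ("doc_a", 2), ("doc_d", 3)]

# RRF in plain SQL: stack both ranked lists, then sum 1 / (k + pos) per id.
RRF_SQL = """
SELECT id, SUM(1.0 / (:k + pos)) AS rrf_score
FROM (
    SELECT id, pos FROM fts_hits
    UNION ALL
    SELECT id, pos FROM knn_hits
)
GROUP BY id
ORDER BY rrf_score DESC, id
"""


def rrf_merge(fts, knn, k=60):
    """Load both ranked lists into an in-memory database and fuse them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fts_hits (id TEXT, pos INTEGER)")
    conn.execute("CREATE TABLE knn_hits (id TEXT, pos INTEGER)")
    conn.executemany("INSERT INTO fts_hits VALUES (?, ?)", fts)
    conn.executemany("INSERT INTO knn_hits VALUES (?, ?)", knn)
    rows = conn.execute(RRF_SQL, {"k": k}).fetchall()
    conn.close()
    return rows


merged = rrf_merge(fts_hits, knn_hits)
for doc_id, score in merged:
    print(f"{doc_id}\t{score:.5f}")
```

Documents found by both retrievers (`doc_a`, `doc_c`) rise above those found by only one, which is the property that makes RRF useful when FTS and KNN scores are not directly comparable.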
3 Online serving (pipelines)
- Declarative YAML → parameterized REST endpoint with inferred request / response schema
- Built-in pipeline dashboard
- CLI pipeline binding + aliases: `skardi run <pipeline> --param=…` and user-defined verb aliases (#90)
- CLI federated SQL: `skardi query` against files, object stores, datalake formats, and databases with no server required
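
To make "parameterized endpoint with inferred request schema" concrete, here is a rough sketch of how request parameters could be inferred from a query template and then bound safely. Everything in it is hypothetical: the `pipeline` dictionary, the `{name}` placeholder syntax, and both helper functions are illustrative assumptions and do not reflect Skardi's actual pipeline format or inference rules.

```python
import re

# Hypothetical pipeline definition; the real YAML format is not shown here.
pipeline = {
    "name": "top_products",
    "query": (
        "SELECT name, price FROM products "
        "WHERE category = {category} AND price <= {max_price} LIMIT {limit}"
    ),
}

# Assumed placeholder syntax: {identifier}
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def infer_request_params(query):
    """Return placeholder names in first-seen order, without duplicates."""
    seen = []
    for name in PLACEHOLDER.findall(query):
        if name not in seen:
            seen.append(name)
    return seen


def bind(query, params):
    """Swap each {name} for a positional ? marker and collect the values."""
    values = []

    def replace(match):
        values.append(params[match.group(1)])  # KeyError if a param is missing
        return "?"

    return PLACEHOLDER.sub(replace, query), values


request_params = infer_request_params(pipeline["query"])
sql, values = bind(
    pipeline["query"], {"category": "books", "max_price": 20, "limit": 5}
)
print(request_params)
print(sql, values)
```

Binding through positional markers rather than string interpolation keeps request values out of the SQL text, which matters once the same pipeline is exposed to agents over HTTP.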
4 Offline jobs
- Async batch execution with submit / poll / cancel (#98)
- Lance dataset destinations with atomic commit + crash recovery
- SQL-DML destinations (Postgres / MySQL / SQLite)
- SQLite-backed run ledger with submit-time schema diff
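
The submit / poll / cancel flow backed by a SQLite run ledger can be pictured roughly as follows. This is a sketch under stated assumptions, not Skardi's implementation: the `RunLedger` class, the `runs` table layout, and the status names (`pending`, `running`, `cancelled`) are all hypothetical.

```python
import sqlite3
import uuid


class RunLedger:
    """Hypothetical SQLite-backed ledger tracking batch job runs."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS runs ("
            "run_id TEXT PRIMARY KEY, job TEXT NOT NULL, status TEXT NOT NULL)"
        )

    def submit(self, job):
        """Record a new run as pending and return its id immediately."""
        run_id = uuid.uuid4().hex
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO runs VALUES (?, ?, 'pending')", (run_id, job)
            )
        return run_id

    def poll(self, run_id):
        """Return the current status, or None for an unknown run id."""
        row = self.conn.execute(
            "SELECT status FROM runs WHERE run_id = ?", (run_id,)
        ).fetchone()
        return row[0] if row else None

    def cancel(self, run_id):
        """Cancel a run only if it has not already reached a final state."""
        with self.conn:
            cur = self.conn.execute(
                "UPDATE runs SET status = 'cancelled' "
                "WHERE run_id = ? AND status IN ('pending', 'running')",
                (run_id,),
            )
        return cur.rowcount == 1


ledger = RunLedger()
run_id = ledger.submit("nightly_embeddings")
print(ledger.poll(run_id))
print(ledger.cancel(run_id), ledger.poll(run_id))
```

Putting the status check inside the `UPDATE` itself makes cancellation a single atomic statement, so a run that finishes between a poll and a cancel is not overwritten.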
5 Agent-facing bindings
- REST — every pipeline served as a parameterized HTTP endpoint
- Shell: every pipeline runnable as a `skardi` command; works in Claude Code, Cursor, and any agent with a Bash tool
- Skills generator: `skardi skills generate --ctx <ctx.yaml> --out .claude/skills/` emits a skill Markdown per pipeline for Claude Code / Desktop auto-discovery
- MCP binding: same pipeline YAML projected to MCP tools for non-Claude hosts
6 Governance & lineage
- Catalog with semantics: `kind: semantics` YAML overlay attaching NL descriptions to tables / columns; supports both bare source names and fully-qualified `catalog.schema.table` paths for per-table targeting on catalog-mode sources; surfaced on `GET /data_source` for agent-side discovery
- Agent-callable `describe` verb: CLI / pipeline form on top of the catalog endpoint
- Lineage capture: `agent_id`, `session_id`, `tool_call_id`, `timestamp` on writes; queryable from metadata tables
- Agent identity passthrough: any binding injects client identity into a SQL context var pipelines can read
- Snapshot-as-branch / agent checkpoints: Iceberg / Lance-backed; `git checkout`-like semantics for destructive agent experiments
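
Lineage capture on writes could look something like the sketch below, where each data write and its lineage record land in the same transaction. The four lineage fields come from the list above; the `notes` and `write_lineage` tables and the `insert_note` helper are hypothetical names chosen for illustration, and the sketch uses Python's built-in `sqlite3` rather than Skardi.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT);
    -- Hypothetical metadata table holding one lineage row per write.
    CREATE TABLE write_lineage (
        target_table TEXT,
        row_id       INTEGER,
        agent_id     TEXT,
        session_id   TEXT,
        tool_call_id TEXT,
        timestamp    TEXT
    );
    """
)


def insert_note(conn, body, agent_id, session_id, tool_call_id):
    """Write a note and its lineage record atomically."""
    ts = datetime.now(timezone.utc).isoformat()
    with conn:  # both inserts commit together or not at all
        cur = conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
        conn.execute(
            "INSERT INTO write_lineage VALUES ('notes', ?, ?, ?, ?, ?)",
            (cur.lastrowid, agent_id, session_id, tool_call_id, ts),
        )
    return cur.lastrowid


note_id = insert_note(
    conn,
    "remember the demo",
    agent_id="agent-1",
    session_id="sess-9",
    tool_call_id="call-42",
)

# Lineage is then queryable like any other table.
lineage = conn.execute(
    "SELECT target_table, row_id, agent_id, session_id, tool_call_id "
    "FROM write_lineage WHERE agent_id = ?",
    ("agent-1",),
).fetchall()
print(lineage)
```

Because the lineage row shares a transaction with the write it describes, a failed write leaves no orphaned lineage entry behind.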
7 Ops
- Session auth — drop-in user auth via better-auth backed by SQLite
- Observability — OpenTelemetry traces / metrics / logs with a pre-configured Grafana stack
- Docker + pre-built binaries — Linux x86_64 / ARM64, macOS ARM64