Skip to main content
Version: 0.2.0

LLM Wiki Q&A

This demo has two flavours: a CLI / SQLite version that runs entirely through skardi-cli against a local file (no server, no Docker), and the server / PostgreSQL version below.


CLI version — skardi-cli + SQLite + sqlite-vec + FTS5

The same wiki primitives (create, update, get, grep, ls) run end-to-end through skardi query. A regular wiki_pages table holds canonical state (slug, title, page_type, content, embedding); AFTER INSERT / AFTER UPDATE triggers fan rows out to an FTS5 virtual table for keyword search and a sqlite-vec vec0 virtual table for KNN — so a single INSERT (or UPDATE) keeps content and embedding in sync. Embeddings are computed inline by the candle() UDF (same model as the server version).

1. Install the CLI with embedding support

cargo install --path crates/cli --features candle

2. Get the sqlite-vec extension

Build or download the vec0 shared library — see the sqlite-vec install guide. Then:

export SQLITE_VEC_PATH=/absolute/path/to/vec0.dylib   # or .so / .dll

# If using the pip package:
export SQLITE_VEC_PATH=$(python -c "import sqlite_vec; print(sqlite_vec.loadable_path())")

3. Download the embedding model

pip install huggingface_hub
python -c "
from huggingface_hub import hf_hub_download
import os
model_dir = 'models/generated/bge-small-en-v1.5'
os.makedirs(model_dir, exist_ok=True)
for f in ['model.safetensors', 'config.json', 'tokenizer.json']:
hf_hub_download('BAAI/bge-small-en-v1.5', f, local_dir=model_dir)
"

4. Create the database

pip install sqlite-vec
python demo/llm_wiki/setup.py

The script loads the sqlite-vec extension via the sqlite_vec Python package and drops any prior demo/llm_wiki/wiki.db. vec0 requires an INTEGER rowid, so the human-readable slug lives on the base wiki_pages table (and as an UNINDEXED FTS5 column) and the integer id carries the JOIN. The script also creates wiki_pages_fts, wiki_pages_vec, the wiki_log activity table, and AFTER INSERT / AFTER UPDATE triggers that keep both mirrors in sync. See setup.py for the schema.

5. Context file

One source in catalog mode auto-discovers every table, loads sqlite-vec once on the shared connection pool, and registers each table under <catalog>.main.<table> for both SQL and sqlite_knn / sqlite_fts lookups.

data_sources:
- name: wiki
type: sqlite
path: demo/llm_wiki/wiki.db
access_mode: read_write
hierarchy_level: catalog
options:
extensions_env: SQLITE_VEC_PATH

6. create — write a new page

skardi query --ctx demo/llm_wiki/cli-ctx.yaml --sql "
INSERT INTO wiki.main.wiki_pages (slug, title, page_type, content, embedding)
SELECT slug, title, page_type, content,
vec_to_binary(candle('models/generated/bge-small-en-v1.5', content))
FROM (
SELECT 'entity/alan-turing' AS slug, 'Alan Turing' AS title, 'entity' AS page_type,
'# Alan Turing\n\nBritish mathematician and logician who formalized the concepts of algorithm and computation with the Turing machine.' AS content
UNION ALL
SELECT 'concept/turing-machine', 'Turing Machine', 'concept',
'# Turing Machine\n\nAn abstract computational model introduced by Alan Turing in 1936.'
) AS t
"

Why UNION ALL of SELECTs instead of VALUES? DataFusion's INSERT planner currently propagates the INSERT target schema (here, 5 columns) down into any immediate-child VALUES clause and validates row width against it, ignoring the intermediate SELECT projection that adds vec_to_binary(candle(...)). Wrapping the seed rows as SELECT … UNION ALL SELECT … keeps the subquery's own schema in scope and the projection lands the row at full width.

candle() produces the embedding inline; vec_to_binary() packs it for vec0; the AFTER INSERT trigger atomically mirrors the row to wiki_pages_fts and wiki_pages_vec.

7. update — edit an existing page (delete + re-insert)

DataFusion's UPDATE planner unparses each SET expression back to SQL for the underlying SQLite connection to execute, and it can't currently render a Binary scalar (the packed-f32 embedding from vec_to_binary(candle(...))) as a SQL literal. The portable workaround is delete + re-insert in two statements — the AFTER DELETE trigger cleans both mirrors, the AFTER INSERT trigger repopulates them, and the new row picks up a fresh updated_at from the column default.

skardi query --ctx demo/llm_wiki/cli-ctx.yaml --sql "
DELETE FROM wiki.main.wiki_pages WHERE slug = 'entity/alan-turing'
"

skardi query --ctx demo/llm_wiki/cli-ctx.yaml --sql "
INSERT INTO wiki.main.wiki_pages (slug, title, page_type, content, embedding)
SELECT slug, title, page_type, content,
vec_to_binary(candle('models/generated/bge-small-en-v1.5', content))
FROM (
SELECT 'entity/alan-turing' AS slug, 'Alan Turing' AS title, 'entity' AS page_type,
'# Alan Turing\n\nBritish mathematician, logician, and cryptanalyst who broke the Enigma cipher at Bletchley Park.' AS content
) AS t
"

If you'd rather edit the row in place from a SQLite client (e.g. sqlite3), the original AFTER UPDATE trigger is still installed and will refresh both mirrors when an UPDATE wiki_pages SET ... runs against the underlying database directly — only the DataFusion path needs the delete-and-reinsert dance.

8. get — fetch one page by slug

skardi query --ctx demo/llm_wiki/cli-ctx.yaml --sql "
SELECT slug, title, page_type, content, updated_at
FROM wiki.main.wiki_pages WHERE slug = 'entity/alan-turing'
"

9. grep — hybrid search (RRF over FTS + vector)

skardi query --ctx demo/llm_wiki/cli-ctx.yaml --sql "
WITH vec AS (
SELECT id, ROW_NUMBER() OVER (ORDER BY _score ASC) AS rk
FROM sqlite_knn('wiki.main.wiki_pages_vec', 'embedding',
(SELECT candle('models/generated/bge-small-en-v1.5',
'who invented the theoretical model of a computer?')),
80)
),
fts AS (
SELECT id, slug, title, page_type,
ROW_NUMBER() OVER (ORDER BY _score DESC) AS rk
FROM sqlite_fts('wiki.main.wiki_pages_fts', 'content',
'turing machine computation', 60)
)
SELECT
COALESCE(f.slug, p.slug) AS slug,
COALESCE(f.title, p.title) AS title,
COALESCE(f.page_type, p.page_type) AS page_type,
COALESCE(0.5 / (60.0 + v.rk), 0)
+ COALESCE(0.5 / (60.0 + f.rk), 0) AS rrf_score
FROM vec v
FULL OUTER JOIN fts f ON v.id = f.id
LEFT JOIN wiki.main.wiki_pages p ON p.id = COALESCE(v.id, f.id)
ORDER BY rrf_score DESC
LIMIT 10
"

10. ls — browse by type or slug prefix

skardi query --ctx demo/llm_wiki/cli-ctx.yaml --sql "
SELECT slug, title, page_type, updated_at
FROM wiki.main.wiki_pages
WHERE page_type LIKE 'entity'
AND slug LIKE '%'
ORDER BY updated_at DESC
LIMIT 100
"

11. log — append an activity entry

skardi query --ctx demo/llm_wiki/cli-ctx.yaml --sql "
INSERT INTO wiki.main.wiki_log (event_type, slug, message)
VALUES ('ingest', 'entity/alan-turing', 'Created from Wikipedia article.')
"

Cleanup

rm demo/llm_wiki/wiki.db

Server version — skardi-server + PostgreSQL + pgvector

A data layer for Karpathy's LLM Wiki idea, built on Skardi. The wiki is stored in a single PostgreSQL table that holds markdown content + pgvector embedding on the same row, so every page is atomically searchable via both full-text search and semantic vector search.

Rather than chunked RAG, the LLM agent maintains a persistent, compounding knowledge artifact — entity pages, concept pages, summaries, an index — and uses the endpoints below as its file-system-like primitives (open, write, grep, ls, log).

  • Vector search — candle (bge-small-en-v1.5) embeddings + pgvector KNN (pg_knn)
  • Full-text search — PostgreSQL tsvector / websearch_to_tsquery (pg_fts)
  • Hybrid search — RRF (Reciprocal Rank Fusion) merging both results in SQL
  • Atomic editswiki-create (INSERT) and wiki-update (UPDATE) both re-embed the page inline with candle() in a single statement, so content and vector stay in sync
                    ┌─────────────────────────────────┐
│ Write Path │
│ │
markdown ───────► │ INSERT (wiki-create) or │
(by slug) │ UPDATE (wiki-update) │
│ SET content, │
│ embedding = candle(...) │
│ │
│ ─► row is now visible to │
│ both pg_fts and pg_knn │
└─────────────────────────────────┘

┌─────────────────────────────────┐
│ Read Path │
│ │
query ──────────► │ pg_knn() (top 80) │──┐
│ pg_fts() (top 60) │──┤ RRF merge
│ │ │
│ FULL OUTER JOIN ON slug + RRF │◄─┘
│ ORDER BY rrf_score DESC │
└─────────────────────────────────┘

Because content and embedding live on the same row, a single upsert keeps FTS and vector in sync — no second store, no cross-store consistency problem.

Quick Start

1. Start PostgreSQL with pgvector

docker run --name wiki-postgres \
-e POSTGRES_DB=wikidb \
-e POSTGRES_USER=skardi_user \
-e POSTGRES_PASSWORD=skardi_pass \
-p 5432:5432 \
-d pgvector/pgvector:pg16

2. Create the schema and indexes

docker exec -i wiki-postgres psql -U skardi_user -d wikidb << 'EOF'
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE wiki_pages (
slug TEXT PRIMARY KEY, -- e.g. "entity/alan-turing"
title TEXT NOT NULL,
page_type TEXT NOT NULL, -- entity | concept | summary | index | schema
content TEXT NOT NULL, -- markdown body
embedding vector(384), -- bge-small-en-v1.5 dimension
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX ON wiki_pages USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

CREATE INDEX wiki_pages_content_fts_idx
ON wiki_pages
USING GIN (to_tsvector('english', content));

CREATE INDEX wiki_pages_type_idx ON wiki_pages (page_type, updated_at DESC);

CREATE TABLE wiki_log (
id BIGSERIAL PRIMARY KEY,
event_type TEXT NOT NULL, -- ingest | query | lint | note
slug TEXT NOT NULL, -- "" if not page-specific
message TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
EOF

3. Download the embedding model

pip install huggingface_hub

python -c "
from huggingface_hub import hf_hub_download
import os
model_dir = 'models/generated/bge-small-en-v1.5'
os.makedirs(model_dir, exist_ok=True)
for f in ['model.safetensors', 'config.json', 'tokenizer.json']:
hf_hub_download('BAAI/bge-small-en-v1.5', f, local_dir=model_dir)
print(f'Model downloaded to {model_dir}')
"

4. Set credentials and start the server

export PG_USER="skardi_user"
export PG_PASSWORD="skardi_pass"

cargo run --bin skardi-server --features candle -- \
--ctx demo/llm_wiki/ctx.yaml \
--pipeline demo/llm_wiki/pipelines/ \
--port 8080

Agent Primitives

The wiki exposes five HTTP endpoints that mirror the verbs an LLM agent needs to maintain a compounding knowledge base. Each one corresponds to one pipeline file under pipelines/.

EndpointVerbPipeline
/wiki-create/executewrite (new)pipelines/create.yaml
/wiki-update/executewrite (edit)pipelines/update.yaml
/wiki-get/executeopenpipelines/get.yaml
/wiki-search-hybrid/executegreppipelines/search_hybrid.yaml
/wiki-list/executelspipelines/list.yaml
/wiki-log-append/executelogpipelines/log_append.yaml

DataFusion's SQL planner does not support INSERT ... ON CONFLICT, so create and edit are exposed as two explicit endpoints. The agent's pattern is: try wiki-update first; if it affects zero rows, fall back to wiki-create. Both re-embed the page inline in a single statement, so FTS and vector stay consistent either way.


Write Path: Creating and Editing Pages

Both endpoints re-embed the page inline with candle() in a single SQL statement, so the pgvector column and the FTS content column are written together from the same row.

Create a new page

# Create an entity page
curl -X POST http://localhost:8080/wiki-create/execute \
-H "Content-Type: application/json" \
-d '{
"slug": "entity/alan-turing",
"title": "Alan Turing",
"page_type": "entity",
"content": "# Alan Turing\n\nBritish mathematician and logician who formalized the concepts of algorithm and computation with the Turing machine. Considered a founder of theoretical computer science and artificial intelligence."
}' | jq .

# Create a concept page that references it
curl -X POST http://localhost:8080/wiki-create/execute \
-H "Content-Type: application/json" \
-d '{
"slug": "concept/turing-machine",
"title": "Turing Machine",
"page_type": "concept",
"content": "# Turing Machine\n\nAn abstract computational model introduced by Alan Turing in 1936. Consists of an infinite tape, a head, and a finite state machine; captures the notion of effective computability."
}' | jq .

# Create a summary page
curl -X POST http://localhost:8080/wiki-create/execute \
-H "Content-Type: application/json" \
-d '{
"slug": "summary/foundations-of-computation",
"title": "Foundations of Computation",
"page_type": "summary",
"content": "# Foundations of Computation\n\nThe theoretical basis for modern computing emerged in the 1930s through the work of Church, Turing, and Gödel. The Church–Turing thesis unified lambda calculus and Turing machines as equivalent models of computation."
}' | jq .

Edit an existing page

wiki-update rewrites the row in place and refreshes the embedding, so an edit is atomically reflected in both pg_fts and pg_knn.

curl -X POST http://localhost:8080/wiki-update/execute \
-H "Content-Type: application/json" \
-d '{
"slug": "entity/alan-turing",
"title": "Alan Turing",
"page_type": "entity",
"content": "# Alan Turing\n\nBritish mathematician, logician, and cryptanalyst. Formalized algorithm and computation via the Turing machine; led the Bletchley Park team that broke the Enigma cipher during WWII."
}' | jq .

Under the hood the two pipelines are:

-- wiki-create
INSERT INTO wikidb.public.wiki_pages (slug, title, page_type, content, embedding, updated_at)
VALUES (
{slug}, {title}, {page_type}, {content},
candle('models/generated/bge-small-en-v1.5', {content}),
now()
);

-- wiki-update
UPDATE wikidb.public.wiki_pages
SET title = {title},
page_type = {page_type},
content = {content},
embedding = candle('models/generated/bge-small-en-v1.5', {content}),
updated_at = now()
WHERE slug = {slug};

DataFusion's planner does not support INSERT ... ON CONFLICT, which is why create and update are separate endpoints. The agent pattern is: try wiki-update first; if it reports zero rows affected, fall back to wiki-create.


Read Path: Agent Retrieval Loop

A typical LLM agent turn looks like:

  1. grep the wiki with hybrid search to find candidate slugs
  2. open each top-ranked page with wiki-get to read the full body
  3. Synthesize the answer, optionally write new pages back
  4. log the activity so the next session knows what happened
curl -X POST http://localhost:8080/wiki-search-hybrid/execute \
-H "Content-Type: application/json" \
-d '{
"query": "who invented the theoretical model of a computer?",
"text_query": "turing machine computation",
"vector_weight": 0.5,
"text_weight": 0.5,
"limit": 10
}' | jq .

Returns slug, title, page_type, and rrf_score for each candidate page. The RRF join is on slug (the wiki's primary key), so there is no cross-store lookup and no id-type conversion.

open — fetch a full page

curl -X POST http://localhost:8080/wiki-get/execute \
-H "Content-Type: application/json" \
-d '{"slug": "entity/alan-turing"}' | jq .

ls — browse by type or prefix

Rebuild index.md, find orphan pages, or list a category:

# All entity pages, newest first
curl -X POST http://localhost:8080/wiki-list/execute \
-H "Content-Type: application/json" \
-d '{
"page_type_pattern": "entity",
"slug_prefix": "%",
"limit": 100
}' | jq .

# Everything under concept/
curl -X POST http://localhost:8080/wiki-list/execute \
-H "Content-Type: application/json" \
-d '{
"page_type_pattern": "%",
"slug_prefix": "concept/%",
"limit": 100
}' | jq .

log — append an activity entry

curl -X POST http://localhost:8080/wiki-log-append/execute \
-H "Content-Type: application/json" \
-d '{
"event_type": "ingest",
"slug": "entity/alan-turing",
"message": "Created from Wikipedia article; cross-linked from concept/turing-machine."
}' | jq .

Pipelines

PipelineEndpointDescription
create.yaml/wiki-create/executeINSERT a new page; re-embeds with candle() inline
update.yaml/wiki-update/executeUPDATE an existing page by slug; re-embeds with candle() inline
get.yaml/wiki-get/executeFetch one page by slug
search_hybrid.yaml/wiki-search-hybrid/executeRRF hybrid search over pg_knn + pg_fts
list.yaml/wiki-list/executeFilter pages by page_type + slug prefix, newest first
log_append.yaml/wiki-log-append/executeAppend to the wiki_log activity log

Relationship to the RAG Demo

This demo is a schema-level evolution of demo/rag/: same stack (Postgres + pgvector + candle + pg_fts + pg_knn + RRF), but the table is keyed by a human-readable slug, carries page metadata (title, page_type), and uses INSERT ... ON CONFLICT so pages can be edited in place. The RAG demo ingests immutable chunks; the LLM Wiki demo ingests a living, editable knowledge base that the LLM itself curates.


Cleanup

docker stop wiki-postgres && docker rm wiki-postgres
pkill -f skardi-server