Remote APIs
Semantic search over a small knowledge base using OpenAI's
text-embedding-3-small model via Skardi's remote_embed() UDF.
Prerequisites
- OpenAI API key — set the environment variable:
export OPENAI_API_KEY=sk-... - Python dependencies (for the one-time setup script):
pip install openai lancedb pyarrow - Build Skardi with the
remote-embedfeature:cargo build --bin skardi-server --features remote-embed
Setup
Run from the project root:
python docs/embeddings/remote/setup_remote.py
This will:
- Load
docs/embeddings/data/docs.csv(15 short knowledge-base articles) - Embed every document with OpenAI
text-embedding-3-small(1536-dim) - Write a Lance dataset to
data/generated/doc_embeddings_openai.lance
Start the server
cargo run --bin skardi-server --features remote-embed -- \
--ctx docs/embeddings/remote/ctx.yaml \
--pipeline docs/embeddings/remote/pipelines/ \
--port 8080
Query
curl -s "http://localhost:8080/semantic-search-remote/execute" \
-H 'Content-Type: application/json' \
-d '{"query": "how does semantic search work?", "k": 10}' | jq .
Response:
{
"success": true,
"data": [
{
"id": 9,
"title": "Semantic Search",
"content": "Semantic search retrieves documents based on meaning rather than keyword overlap. A query and all documents are embedded into the same vector space; the most semantically similar documents are returned via nearest-neighbour search. This handles synonyms and paraphrases that would confuse keyword search.",
"_distance": 0.62793195
},
{
"id": 15,
"title": "OpenAI Embeddings",
"content": "OpenAI's text-embedding-3-small model produces 1536-dimensional vectors optimised for semantic similarity and retrieval. It supports shortening via the dimensions parameter and is trained on diverse text data. The model is accessed via the /v1/embeddings REST API with an API key.",
"_distance": 1.2529761
},
{
"id": 2,
"title": "Retrieval-Augmented Generation",
"content": "Retrieval-Augmented Generation (RAG) combines a retrieval step with a language model. A query is embedded into a vector and used to fetch relevant documents from a vector store. Those documents are passed as context to an LLM",
"_distance": 1.2652609
},
{
"id": 4,
"title": "BERT Embeddings",
"content": "BERT (Bidirectional Encoder Representations from Transformers) produces contextual embeddings by reading text in both directions. Fine-tuned variants like bge-small and all-MiniLM are commonly used for semantic similarity tasks. The [CLS] token or mean-pooled hidden states are used as sentence-level embeddings.",
"_distance": 1.2673755
},
{
"id": 11,
"title": "Approximate Nearest Neighbour Search",
"content": "Exact nearest-neighbour search scales as O(n) per query. ANN algorithms like HNSW and IVF-PQ trade a small accuracy loss for sub-linear query times",
"_distance": 1.3190017
},
{
"id": 1,
"title": "Vector Databases",
"content": "Vector databases store high-dimensional numerical vectors and enable fast similarity search at scale. Unlike traditional databases that match exact values",
"_distance": 1.3902473
},
{
"id": 12,
"title": "Remote Embeddings",
"content": "Remote embedding APIs let you generate high-quality vector embeddings without downloading or hosting models locally. Providers like OpenAI",
"_distance": 1.5463313
},
{
"id": 7,
"title": "DataFusion Query Engine",
"content": "DataFusion is an in-process SQL query engine written in Rust",
"_distance": 1.5478841
},
{
"id": 3,
"title": "Transformer Architecture",
"content": "The Transformer architecture introduced multi-head self-attention to replace recurrent networks. Each token attends to all other tokens in the sequence",
"_distance": 1.5975192
},
{
"id": 13,
"title": "Mean Pooling",
"content": "Mean pooling aggregates the per-token hidden states from a transformer encoder into a single fixed-size vector. Each token embedding is averaged across the sequence length",
"_distance": 1.5982214
}
],
"rows": 10,
"execution_time_ms": 1733,
"timestamp": "2026-04-08T17:16:42.575794+00:00"
}
The pipeline runs:
SELECT id, title, content, _distance
FROM lance_knn(
'doc_embeddings_openai',
'embedding',
remote_embed('openai', 'text-embedding-3-small', {query}),
10
)
ORDER BY _distance
LIMIT 10
remote_embed() calls the OpenAI API to embed the user query at request
time; lance_knn() finds the nearest documents in the pre-built Lance index.
Switching providers
The remote_embed() UDF supports four providers out of the box. To use a
different one, change the provider and model in the pipeline SQL and re-run
the setup script with the corresponding embedding API:
| Provider | Example model | Env var |
|---|---|---|
openai | text-embedding-3-small | OPENAI_API_KEY |
gemini | text-embedding-004 | GEMINI_API_KEY |
voyage | voyage-3 | VOYAGE_API_KEY |
mistral | mistral-embed | MISTRAL_API_KEY |