
litellm-vector-store

A vector store service built on top of LiteLLM and pgvector, providing an OpenAI-compatible API for semantic search, document storage, and Retrieval-Augmented Generation (RAG).

Features

  • 🔐 Authentication via LiteLLM API Keys
  • 🗄️ Vector Store powered by PostgreSQL + pgvector
  • 🔍 Semantic Search with optional Reranking
  • 🤖 RAG Endpoint - Search + LLM in one request
  • 📄 File Upload - PDF, DOCX, TXT, Markdown, Excel, CSV, PowerPoint, HTML, E-Mail, JSON
  • 🖼️ Image Support - Images described and indexed via a Vision LLM (JPG, PNG, GIF, WebP, TIFF)
  • 🧩 OpenAI-compatible API - works with existing OpenAI SDKs
  • 👥 Multi-User - Store permissions per user
  • 🖥️ Admin UI - Manage users, stores and permissions
  • 📊 Usage Tracking - Track requests per user

Architecture

Client (API Key)
      │
      ▼
LiteLLM Proxy ──────────────────────────────┐
      │                                     │
      ▼                                     ▼
Vector Store API                    LiteLLM Models
      │                           ┌──────────────────┐
      ▼                           │ Embedding Models │
PostgreSQL + pgvector             │ Vision Models    │
                                  │ LLM Models       │
                                  └──────────────────┘

Requirements

  • Kubernetes Cluster
  • PostgreSQL with pgvector extension (already deployed)
  • LiteLLM Proxy (already deployed)
  • Container Registry

Quick Start

1. Clone Repository

git clone https://github.com/your-org/litellm-vector-store.git
cd litellm-vector-store

2. Database Setup

# Use -i without -t: the SQL is piped in via the heredoc, so no TTY is available
kubectl exec -i <postgres-pod> -n <namespace> \
  -- psql -U postgres -d vectordb << 'EOF'

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_stores (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name          VARCHAR(255) NOT NULL,
    owner_user_id VARCHAR(255) NOT NULL,
    created_at    TIMESTAMP DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS documents (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    store_id   UUID REFERENCES vector_stores(id) ON DELETE CASCADE,
    content    TEXT NOT NULL,
    metadata   JSONB DEFAULT '{}',
    embedding  vector(1024),  -- dimension must match the configured EMBEDDING_MODEL
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS store_permissions (
    store_id   UUID REFERENCES vector_stores(id) ON DELETE CASCADE,
    user_id    VARCHAR(255) NOT NULL,
    permission VARCHAR(50) DEFAULT 'read',
    PRIMARY KEY (store_id, user_id)
);

CREATE TABLE IF NOT EXISTS usage_stats (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id    VARCHAR(255) NOT NULL,
    store_id   UUID REFERENCES vector_stores(id) ON DELETE SET NULL,
    action     VARCHAR(50) NOT NULL,
    tokens     INT DEFAULT 0,
    duration   FLOAT DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_documents_store
    ON documents(store_id);
CREATE INDEX IF NOT EXISTS idx_documents_embedding
    ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
CREATE INDEX IF NOT EXISTS idx_usage_user
    ON usage_stats(user_id);
CREATE INDEX IF NOT EXISTS idx_usage_created
    ON usage_stats(created_at);

GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO vecuser;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO vecuser;

EOF
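The ivfflat index above uses vector_cosine_ops, i.e. pgvector's cosine distance operator <=>. As a rough illustration of what that operator computes (pure Python, no database required), cosine distance is one minus cosine similarity:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as computed by pgvector's <=> operator:
    1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way have distance 0, orthogonal vectors 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # → 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # → 1.0
```

At query time the service can then order documents by `embedding <=> $1` and let the ivfflat index accelerate the scan.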

3. Configure

# Create secrets
kubectl create secret generic vector-api-secrets \
  --namespace vector-store \
  --from-literal=DATABASE_URL="postgresql://vecuser:pass@postgres:5432/vectordb" \
  --from-literal=LITELLM_MASTER_KEY="sk-master-key"
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-store-config
  namespace: vector-store
data:
  LITELLM_PROXY_URL: "http://litellm.<namespace>.svc.cluster.local:4000"
  ADMIN_USER_IDS:    "your-admin-user-id"
  API_URL:           "https://api.your-domain.com"
  EMBEDDING_MODEL:   "your-embedding-model"
  VISION_MODEL:      "openai/gpt-4o-mini"

4. Build & Deploy

# Build & push API
docker build -t your-registry/vector-store-api:1.0.0 .
docker push your-registry/vector-store-api:1.0.0

# Build & push Admin UI
docker build \
  -t your-registry/vector-store-admin:1.0.0 \
  ./ui
docker push your-registry/vector-store-admin:1.0.0

# Deploy
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml
kubectl apply -f k8s/vector-api/
kubectl apply -f k8s/admin-ui/
kubectl apply -f k8s/ingress-api.yaml
kubectl apply -f k8s/ingress-ui.yaml

Project Structure

litellm-vector-store/
├── app/                          # FastAPI Backend
│   ├── main.py                   # Application entry point
│   ├── auth.py                   # LiteLLM authentication
│   ├── database.py               # PostgreSQL connection
│   ├── models.py                 # Pydantic models
│   ├── routers/
│   │   ├── stores.py             # Vector store CRUD
│   │   ├── documents.py          # Document management
│   │   ├── admin.py              # Admin endpoints
│   │   └── openai_compat.py      # OpenAI-compatible API
│   └── utils/
│       ├── chunking.py           # Text chunking
│       ├── image_processor.py    # Vision LLM integration
│       └── stats.py              # Usage tracking
├── ui/                           # React Admin UI
│   ├── src/
│   │   ├── pages/
│   │   │   ├── Login.tsx
│   │   │   ├── Dashboard.tsx
│   │   │   ├── Users.tsx
│   │   │   └── Stores.tsx
│   │   ├── components/
│   │   │   ├── Layout.tsx
│   │   │   └── PermissionModal.tsx
│   │   └── api/
│   │       └── client.ts
│   └── Dockerfile
├── k8s/                          # Kubernetes manifests
│   ├── namespace.yaml
│   ├── configmap.yaml
│   ├── secrets.yaml
│   ├── vector-api/
│   │   ├── deployment.yaml
│   │   └── service.yaml
│   ├── admin-ui/
│   │   ├── deployment.yaml
│   │   └── service.yaml
│   ├── ingress-api.yaml
│   └── ingress-ui.yaml
├── scripts/
│   └── init.sql                  # Database initialization
├── Dockerfile
├── requirements.txt
└── README.md

API Reference

Base URL

https://api.your-domain.com/v1

Authentication

Authorization: Bearer sk-your-api-key

Endpoints

Method  Endpoint                                Description
GET     /v1/models                              List all models
GET     /v1/embeddings/models                   List embedding models
GET     /v1/vision/models                       List vision models
POST    /v1/embeddings                          Create embeddings
POST    /v1/vector_stores                       Create store
GET     /v1/vector_stores                       List stores
GET     /v1/vector_stores/{id}                  Get store
DELETE  /v1/vector_stores/{id}                  Delete store
POST    /v1/vector_stores/{id}/files            Add texts
GET     /v1/vector_stores/{id}/files            List files
DELETE  /v1/vector_stores/{id}/files/{file_id}  Delete file
POST    /v1/vector_stores/{id}/upload           Upload file or image
POST    /v1/vector_stores/{id}/search           Semantic search
POST    /v1/vector_stores/{id}/rag              RAG query

Examples

Python

import httpx

client = httpx.Client(
    base_url="https://api.your-domain.com/v1",
    headers={"Authorization": "Bearer sk-your-key"},
    timeout=120.0
)

# Create store
store = client.post(
    "/vector_stores",
    json={"name": "My Knowledge Base"}
).json()

# Upload document
with open("document.pdf", "rb") as f:
    client.post(
        f"/vector_stores/{store['id']}/upload",
        files={"file": f}
    )

# Upload image (with default vision model)
with open("screenshot.png", "rb") as f:
    client.post(
        f"/vector_stores/{store['id']}/upload",
        files={"file": f}
    )

# Upload image (with custom vision model)
with open("diagram.png", "rb") as f:
    client.post(
        f"/vector_stores/{store['id']}/upload",
        files={"file": f},
        data={
            "vision_model":  "openai/gpt-4o",
            "vision_prompt": "Explain this diagram in detail."
        }
    )

# Search
results = client.post(
    f"/vector_stores/{store['id']}/search",
    json={
        "query":  "What is FastAPI?",
        "top_k":  3,
        "rerank": True
    }
).json()

# RAG
answer = client.post(
    f"/vector_stores/{store['id']}/rag",
    json={
        "query":  "What is FastAPI?",
        "model":  "openai/gpt-4o-mini",
        "rerank": True
    }
).json()
print(answer["answer"])

JavaScript / TypeScript

const API_KEY  = "sk-your-api-key";
const BASE_URL = "https://api.your-domain.com/v1";
const HEADERS  = {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type":  "application/json"
};

// Create store
const store = await fetch(`${BASE_URL}/vector_stores`, {
    method:  "POST",
    headers: HEADERS,
    body:    JSON.stringify({ name: "My Store" })
}).then(r => r.json());

// Search
const results = await fetch(
    `${BASE_URL}/vector_stores/${store.id}/search`, {
    method:  "POST",
    headers: HEADERS,
    body:    JSON.stringify({
        query:  "What is FastAPI?",
        top_k:  3,
        rerank: true
    })
}).then(r => r.json());

// RAG
const answer = await fetch(
    `${BASE_URL}/vector_stores/${store.id}/rag`, {
    method:  "POST",
    headers: HEADERS,
    body:    JSON.stringify({
        query: "What is FastAPI?"
    })
}).then(r => r.json());

console.log(answer.answer);

curl

# Create store
curl -X POST https://api.your-domain.com/v1/vector_stores \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"name": "My Store"}'

# Upload document
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/upload \
  -H "Authorization: Bearer sk-your-key" \
  -F "file=@document.pdf"

# Upload image with custom vision model
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/upload \
  -H "Authorization: Bearer sk-your-key" \
  -F "file=@diagram.png" \
  -F "vision_model=openai/gpt-4o" \
  -F "vision_prompt=Explain this diagram in detail."

# Search
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/search \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is FastAPI?", "top_k": 3, "rerank": true}'

# RAG
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/rag \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is FastAPI?", "model": "openai/gpt-4o-mini"}'

Configuration Reference

Environment Variables

Variable            Required  Default                 Description
DATABASE_URL        Yes       -                       PostgreSQL connection URL
LITELLM_PROXY_URL   Yes       -                       LiteLLM proxy URL
LITELLM_MASTER_KEY  Yes       -                       LiteLLM master key
ADMIN_USER_IDS      Yes       -                       Comma-separated admin user IDs
EMBEDDING_MODEL     No        text-embedding-ada-002  Default embedding model
VISION_MODEL        No        openai/gpt-4o-mini      Default vision model

Upload Parameters

Parameter      Type    Default         Description
file           file    required        File to upload
chunk_size     int     512             Characters per chunk
chunk_overlap  int     50              Overlap between chunks
vision_model   string  config default  Vision model for images
vision_prompt  string  auto            Custom prompt for vision model
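To make the chunk_size/chunk_overlap semantics concrete, here is a minimal character-based chunker with overlapping windows (a sketch of the approach, not necessarily the service's exact implementation in app/utils/chunking.py):

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks; each chunk repeats the
    last `chunk_overlap` characters of its predecessor."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must be larger than chunk_overlap")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1000-character document with the defaults yields 3 chunks
# (0-512, 462-974, 924-1000).
print(len(chunk_text("a" * 1000)))  # → 3
```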

Search Parameters

Parameter     Type    Default   Description
query         string  required  Search query
top_k         int     5         Number of results (max. 50)
rerank        bool    false     Enable reranking
rerank_model  string  auto      Custom rerank model

RAG Parameters

Parameter      Type    Default            Description
query          string  required           Question
model          string  cosair/gemma4:31b  LLM model
top_k          int     5                  Context documents
rerank         bool    false              Enable reranking
system_prompt  string  auto               Custom system prompt
messages       array   []                 Chat history
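A RAG endpoint like this typically assembles one LLM request from the retrieved chunks, the optional system prompt, and the chat history. The sketch below shows that assembly in OpenAI message format (illustrative only; the service's actual prompt construction may differ):

```python
def build_rag_messages(query, context_docs, system_prompt=None, history=None):
    """Assemble an OpenAI-style message list: retrieved context in the
    system message, then prior chat history, then the user's question."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context_docs))
    base = system_prompt or "Answer using only the provided context."
    return [
        {"role": "system", "content": f"{base}\n\nContext:\n{context}"},
        *(history or []),
        {"role": "user", "content": query},
    ]

msgs = build_rag_messages(
    "What is FastAPI?",
    ["FastAPI is a Python web framework."],
    history=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ],
)
print(len(msgs))  # → 4
```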

Supported File Formats

Format        Extension                         Notes
Text          .txt                              UTF-8 encoded
Markdown      .md                               Standard Markdown
PDF           .pdf                              Text PDFs only, no scans
Word          .docx                             Microsoft Word 2007+
Excel         .xlsx                             All sheets extracted
CSV           .csv                              All columns extracted
PowerPoint    .pptx                             All slides extracted
HTML          .html .htm                        Scripts/styles removed
Outlook Mail  .msg                              Including headers
E-Mail        .eml                              Including headers
JSON          .json                             Pretty printed
Image         .jpg .jpeg .png .gif .webp .tiff  Via Vision LLM
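The upload endpoint has to decide per file whether to run text extraction or the Vision LLM. A plausible extension-based dispatch mirroring the table above (the sets and function name are hypothetical, for illustration):

```python
from pathlib import Path

# Hypothetical mapping derived from the supported-formats table above.
TEXT_EXTENSIONS = {".txt", ".md", ".pdf", ".docx", ".xlsx", ".csv",
                   ".pptx", ".html", ".htm", ".msg", ".eml", ".json"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".tiff"}

def upload_route(filename: str) -> str:
    """Route an upload to the text extractor or the Vision LLM by extension."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        return "vision"
    if ext in TEXT_EXTENSIONS:
        return "text"
    raise ValueError(f"Unsupported file format: {ext}")

print(upload_route("diagram.PNG"))  # → vision
print(upload_route("report.pdf"))  # → text
```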

Limits

Limit                  Value
Max file size          256 MB
Max search results     50
Request timeout        600 seconds
Default chunk size     512 characters
Default chunk overlap  50 characters

Admin UI

The Admin UI is available at https://admin.your-domain.com.

Login with your Admin API Key to:

  • 📊 View usage statistics
  • 👥 Manage users and their stores
  • 🔑 Rotate API keys
  • 🔒 Grant/revoke store permissions

Development

# Install dependencies
pip install -r requirements.txt

# Run locally
DATABASE_URL="postgresql://..." \
LITELLM_PROXY_URL="http://..." \
LITELLM_MASTER_KEY="sk-..." \
ADMIN_USER_IDS="your-id" \
EMBEDDING_MODEL="your-model" \
VISION_MODEL="openai/gpt-4o-mini" \
uvicorn app.main:app --reload

# Run UI locally
cd ui
npm install
VITE_API_URL=http://localhost:8000 npm run dev

Tech Stack

Component   Technology
API         FastAPI + Python 3.12
Database    PostgreSQL 16 + pgvector
Auth        LiteLLM Key Management
Embeddings  Via LiteLLM Proxy
Vision      Via LiteLLM Vision Models
Admin UI    React + TypeScript + Tailwind CSS
Container   Docker + Kubernetes
Ingress     NGINX Ingress Controller
TLS         cert-manager + Let's Encrypt

License

MIT License - see LICENSE for details.

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/my-feature)
  3. Commit your changes (git commit -m 'Add my feature')
  4. Push to the branch (git push origin feature/my-feature)
  5. Open a Pull Request