
litellm-vector-store

A vector store service built on top of LiteLLM and pgvector, providing an OpenAI-compatible API for semantic search, document storage, and Retrieval-Augmented Generation (RAG).

Features

  • 🔐 Authentication via LiteLLM API Keys
  • 🗄️ Vector Store powered by PostgreSQL + pgvector
  • 🔍 Semantic Search with optional Reranking
  • 🤖 RAG Endpoint - Search + LLM in one request
  • 📄 File Upload - PDF, DOCX, TXT, Markdown, Excel, CSV, PowerPoint, HTML, E-Mail, JSON
  • 🖼️ Image Support - Images described and indexed via a Vision LLM (JPG, PNG, GIF, WebP, TIFF)
  • 🧩 OpenAI-compatible API - works with existing OpenAI SDKs
  • 👥 Multi-User - Store permissions per user
  • 🖥️ Admin UI - Manage users, stores and permissions
  • 📊 Usage Tracking - Track requests per user

Architecture

Client (API Key)
      │
      ▼
LiteLLM Proxy ──────────────────────────────┐
      │                                     │
      ▼                                     ▼
Vector Store API                    LiteLLM Models
      │                           ┌──────────────────┐
      ▼                           │ Embedding Models │
PostgreSQL + pgvector             │ Vision Models    │
                                  │ LLM Models       │
                                  └──────────────────┘

Requirements

  • Kubernetes Cluster
  • PostgreSQL with pgvector extension (already deployed)
  • LiteLLM Proxy (already deployed)
  • Container Registry

Quick Start

1. Clone Repository

git clone https://github.com/your-org/litellm-vector-store.git
cd litellm-vector-store

2. Database Setup

# Use -i without -t: the SQL is piped in via the heredoc, so no TTY is available
kubectl exec -i <postgres-pod> -n <namespace> \
  -- psql -U postgres -d vectordb << 'EOF'

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_stores (
    id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name          VARCHAR(255) NOT NULL,
    owner_user_id VARCHAR(255) NOT NULL,
    created_at    TIMESTAMP DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS documents (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    store_id   UUID REFERENCES vector_stores(id) ON DELETE CASCADE,
    content    TEXT NOT NULL,
    metadata   JSONB DEFAULT '{}',
    embedding  vector(1024),  -- dimension must match the configured EMBEDDING_MODEL
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS store_permissions (
    store_id   UUID REFERENCES vector_stores(id) ON DELETE CASCADE,
    user_id    VARCHAR(255) NOT NULL,
    permission VARCHAR(50) DEFAULT 'read',
    PRIMARY KEY (store_id, user_id)
);

CREATE TABLE IF NOT EXISTS usage_stats (
    id         UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id    VARCHAR(255) NOT NULL,
    store_id   UUID REFERENCES vector_stores(id) ON DELETE SET NULL,
    action     VARCHAR(50) NOT NULL,
    tokens     INT DEFAULT 0,
    duration   FLOAT DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_documents_store
    ON documents(store_id);
CREATE INDEX IF NOT EXISTS idx_documents_embedding
    ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
CREATE INDEX IF NOT EXISTS idx_usage_user
    ON usage_stats(user_id);
CREATE INDEX IF NOT EXISTS idx_usage_created
    ON usage_stats(created_at);

GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO vecuser;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO vecuser;

EOF
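The ivfflat index above uses vector_cosine_ops, i.e. pgvector's cosine distance operator <=>. As a rough illustration of what that operator computes (pure Python, no database required), cosine distance is one minus cosine similarity:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as computed by pgvector's <=> operator:
    1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Vectors pointing the same way have distance 0, orthogonal vectors 1.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # → 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # → 1.0
```

At query time the service can then order documents by `embedding <=> $1` and let the ivfflat index accelerate the scan.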

3. Configure

# Create secrets
kubectl create secret generic vector-api-secrets \
  --namespace vector-store \
  --from-literal=DATABASE_URL="postgresql://vecuser:pass@postgres:5432/vectordb" \
  --from-literal=LITELLM_MASTER_KEY="sk-master-key"
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-store-config
  namespace: vector-store
data:
  LITELLM_PROXY_URL: "http://litellm.<namespace>.svc.cluster.local:4000"
  ADMIN_USER_IDS:    "your-admin-user-id"
  API_URL:           "https://api.your-domain.com"
  EMBEDDING_MODEL:   "your-embedding-model"
  VISION_MODEL:      "openai/gpt-4o-mini"

4. Build & Deploy

# Build & push API
docker build -t your-registry/vector-store-api:1.0.0 .
docker push your-registry/vector-store-api:1.0.0

# Build & push Admin UI
docker build \
  -t your-registry/vector-store-admin:1.0.0 \
  ./ui
docker push your-registry/vector-store-admin:1.0.0

# Deploy
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml
kubectl apply -f k8s/vector-api/
kubectl apply -f k8s/admin-ui/
kubectl apply -f k8s/ingress-api.yaml
kubectl apply -f k8s/ingress-ui.yaml

Project Structure

litellm-vector-store/
├── app/                          # FastAPI Backend
│   ├── main.py                   # Application entry point
│   ├── auth.py                   # LiteLLM authentication
│   ├── database.py               # PostgreSQL connection
│   ├── models.py                 # Pydantic models
│   ├── routers/
│   │   ├── stores.py             # Vector store CRUD
│   │   ├── documents.py          # Document management
│   │   ├── admin.py              # Admin endpoints
│   │   └── openai_compat.py      # OpenAI-compatible API
│   └── utils/
│       ├── chunking.py           # Text chunking
│       ├── image_processor.py    # Vision LLM integration
│       └── stats.py              # Usage tracking
├── ui/                           # React Admin UI
│   ├── src/
│   │   ├── pages/
│   │   │   ├── Login.tsx
│   │   │   ├── Dashboard.tsx
│   │   │   ├── Users.tsx
│   │   │   └── Stores.tsx
│   │   ├── components/
│   │   │   ├── Layout.tsx
│   │   │   └── PermissionModal.tsx
│   │   └── api/
│   │       └── client.ts
│   └── Dockerfile
├── k8s/                          # Kubernetes manifests
│   ├── namespace.yaml
│   ├── configmap.yaml
│   ├── secrets.yaml
│   ├── vector-api/
│   │   ├── deployment.yaml
│   │   └── service.yaml
│   ├── admin-ui/
│   │   ├── deployment.yaml
│   │   └── service.yaml
│   ├── ingress-api.yaml
│   └── ingress-ui.yaml
├── scripts/
│   └── init.sql                  # Database initialization
├── Dockerfile
├── requirements.txt
└── README.md

API Reference

Base URL

https://api.your-domain.com/v1

Authentication

Authorization: Bearer sk-your-api-key

Endpoints

Method  Endpoint                                Description
GET     /v1/models                              List all models
GET     /v1/embeddings/models                   List embedding models
GET     /v1/vision/models                       List vision models
POST    /v1/embeddings                          Create embeddings
POST    /v1/vector_stores                       Create store
GET     /v1/vector_stores                       List stores
GET     /v1/vector_stores/{id}                  Get store
DELETE  /v1/vector_stores/{id}                  Delete store
POST    /v1/vector_stores/{id}/files            Add texts
GET     /v1/vector_stores/{id}/files            List files
DELETE  /v1/vector_stores/{id}/files/{file_id}  Delete file
POST    /v1/vector_stores/{id}/upload           Upload file or image
POST    /v1/vector_stores/{id}/search           Semantic search
POST    /v1/vector_stores/{id}/rag              RAG query

Examples

Python

import httpx

client = httpx.Client(
    base_url="https://api.your-domain.com/v1",
    headers={"Authorization": "Bearer sk-your-key"},
    timeout=120.0
)

# Create store
store = client.post(
    "/vector_stores",
    json={"name": "My Knowledge Base"}
).json()

# Upload document
with open("document.pdf", "rb") as f:
    client.post(
        f"/vector_stores/{store['id']}/upload",
        files={"file": f}
    )

# Upload image (with default vision model)
with open("screenshot.png", "rb") as f:
    client.post(
        f"/vector_stores/{store['id']}/upload",
        files={"file": f}
    )

# Upload image (with custom vision model)
with open("diagram.png", "rb") as f:
    client.post(
        f"/vector_stores/{store['id']}/upload",
        files={"file": f},
        data={
            "vision_model":  "openai/gpt-4o",
            "vision_prompt": "Explain this diagram in detail."
        }
    )

# Search
results = client.post(
    f"/vector_stores/{store['id']}/search",
    json={
        "query":  "What is FastAPI?",
        "top_k":  3,
        "rerank": True
    }
).json()

# RAG
answer = client.post(
    f"/vector_stores/{store['id']}/rag",
    json={
        "query":  "What is FastAPI?",
        "model":  "openai/gpt-4o-mini",
        "rerank": True
    }
).json()
print(answer["answer"])

JavaScript / TypeScript

const API_KEY  = "sk-your-api-key";
const BASE_URL = "https://api.your-domain.com/v1";
const HEADERS  = {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type":  "application/json"
};

// Create store
const store = await fetch(`${BASE_URL}/vector_stores`, {
    method:  "POST",
    headers: HEADERS,
    body:    JSON.stringify({ name: "My Store" })
}).then(r => r.json());

// Search
const results = await fetch(
    `${BASE_URL}/vector_stores/${store.id}/search`, {
    method:  "POST",
    headers: HEADERS,
    body:    JSON.stringify({
        query:  "What is FastAPI?",
        top_k:  3,
        rerank: true
    })
}).then(r => r.json());

// RAG
const answer = await fetch(
    `${BASE_URL}/vector_stores/${store.id}/rag`, {
    method:  "POST",
    headers: HEADERS,
    body:    JSON.stringify({
        query: "What is FastAPI?"
    })
}).then(r => r.json());

console.log(answer.answer);

curl

# Create store
curl -X POST https://api.your-domain.com/v1/vector_stores \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"name": "My Store"}'

# Upload document
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/upload \
  -H "Authorization: Bearer sk-your-key" \
  -F "file=@document.pdf"

# Upload image with custom vision model
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/upload \
  -H "Authorization: Bearer sk-your-key" \
  -F "file=@diagram.png" \
  -F "vision_model=openai/gpt-4o" \
  -F "vision_prompt=Explain this diagram in detail."

# Search
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/search \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is FastAPI?", "top_k": 3, "rerank": true}'

# RAG
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/rag \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is FastAPI?", "model": "openai/gpt-4o-mini"}'

Configuration Reference

Environment Variables

Variable            Required  Default                 Description
DATABASE_URL        Yes       -                       PostgreSQL connection URL
LITELLM_PROXY_URL   Yes       -                       LiteLLM proxy URL
LITELLM_MASTER_KEY  Yes       -                       LiteLLM master key
ADMIN_USER_IDS      Yes       -                       Comma-separated admin user IDs
EMBEDDING_MODEL     No        text-embedding-ada-002  Default embedding model
VISION_MODEL        No        openai/gpt-4o-mini      Default vision model

Upload Parameters

Parameter      Type    Default         Description
file           file    required        File to upload
chunk_size     int     512             Characters per chunk
chunk_overlap  int     50              Overlap between chunks
vision_model   string  config default  Vision model for images
vision_prompt  string  auto            Custom prompt for vision model
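To make the chunk_size/chunk_overlap semantics concrete, here is a minimal character-based chunker with overlapping windows (a sketch of the approach, not necessarily the service's exact implementation in app/utils/chunking.py):

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks; each chunk repeats the
    last `chunk_overlap` characters of its predecessor."""
    if chunk_size <= chunk_overlap:
        raise ValueError("chunk_size must be larger than chunk_overlap")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 1000-character document with the defaults yields 3 chunks
# (0-512, 462-974, 924-1000).
print(len(chunk_text("a" * 1000)))  # → 3
```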

Search Parameters

Parameter     Type    Default   Description
query         string  required  Search query
top_k         int     5         Number of results (max. 50)
rerank        bool    false     Enable reranking
rerank_model  string  auto      Custom rerank model

RAG Parameters

Parameter      Type    Default            Description
query          string  required           Question
model          string  cosair/gemma4:31b  LLM model
top_k          int     5                  Context documents
rerank         bool    false              Enable reranking
system_prompt  string  auto               Custom system prompt
messages       array   []                 Chat history
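A RAG endpoint like this typically assembles one LLM request from the retrieved chunks, the optional system prompt, and the chat history. The sketch below shows that assembly in OpenAI message format (illustrative only; the service's actual prompt construction may differ):

```python
def build_rag_messages(query, context_docs, system_prompt=None, history=None):
    """Assemble an OpenAI-style message list: retrieved context in the
    system message, then prior chat history, then the user's question."""
    context = "\n\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(context_docs))
    base = system_prompt or "Answer using only the provided context."
    return [
        {"role": "system", "content": f"{base}\n\nContext:\n{context}"},
        *(history or []),
        {"role": "user", "content": query},
    ]

msgs = build_rag_messages(
    "What is FastAPI?",
    ["FastAPI is a Python web framework."],
    history=[
        {"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"},
    ],
)
print(len(msgs))  # → 4
```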

Supported File Formats

Format        Extension                         Notes
Text          .txt                              UTF-8 encoded
Markdown      .md                               Standard Markdown
PDF           .pdf                              Text PDFs only, no scans
Word          .docx                             Microsoft Word 2007+
Excel         .xlsx                             All sheets extracted
CSV           .csv                              All columns extracted
PowerPoint    .pptx                             All slides extracted
HTML          .html .htm                        Scripts/styles removed
Outlook Mail  .msg                              Including headers
E-Mail        .eml                              Including headers
JSON          .json                             Pretty printed
Image         .jpg .jpeg .png .gif .webp .tiff  Via Vision LLM
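The upload endpoint has to decide per file whether to run text extraction or the Vision LLM. A plausible extension-based dispatch mirroring the table above (the sets and function name are hypothetical, for illustration):

```python
from pathlib import Path

# Hypothetical mapping derived from the supported-formats table above.
TEXT_EXTENSIONS = {".txt", ".md", ".pdf", ".docx", ".xlsx", ".csv",
                   ".pptx", ".html", ".htm", ".msg", ".eml", ".json"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".tiff"}

def upload_route(filename: str) -> str:
    """Route an upload to the text extractor or the Vision LLM by extension."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        return "vision"
    if ext in TEXT_EXTENSIONS:
        return "text"
    raise ValueError(f"Unsupported file format: {ext}")

print(upload_route("diagram.PNG"))  # → vision
print(upload_route("report.pdf"))  # → text
```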

Limits

Limit                  Value
Max file size          256 MB
Max search results     50
Request timeout        600 seconds
Default chunk size     512 characters
Default chunk overlap  50 characters

Admin UI

The Admin UI is available at https://admin.your-domain.com.

Login with your Admin API Key to:

  • 📊 View usage statistics
  • 👥 Manage users and their stores
  • 🔑 Rotate API keys
  • 🔒 Grant/revoke store permissions

Development

# Install dependencies
pip install -r requirements.txt

# Run locally
DATABASE_URL="postgresql://..." \
LITELLM_PROXY_URL="http://..." \
LITELLM_MASTER_KEY="sk-..." \
ADMIN_USER_IDS="your-id" \
EMBEDDING_MODEL="your-model" \
VISION_MODEL="openai/gpt-4o-mini" \
uvicorn app.main:app --reload

# Run UI locally
cd ui
npm install
VITE_API_URL=http://localhost:8000 npm run dev

Tech Stack

Component   Technology
API         FastAPI + Python 3.12
Database    PostgreSQL 16 + pgvector
Auth        LiteLLM Key Management
Embeddings  Via LiteLLM Proxy
Vision      Via LiteLLM Vision Models
Admin UI    React + TypeScript + Tailwind CSS
Container   Docker + Kubernetes
Ingress     NGINX Ingress Controller
TLS         cert-manager + Let's Encrypt

License

MIT License - see LICENSE for details.

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/my-feature)
  3. Commit your changes (git commit -m 'Add my feature')
  4. Push to the branch (git push origin feature/my-feature)
  5. Open a Pull Request