# litellm-vector-store

A vector store service built on top of LiteLLM and pgvector, providing an OpenAI-compatible API for semantic search, document storage, and Retrieval-Augmented Generation (RAG).
## Features
- 🔐 Authentication via LiteLLM API Keys
- 🗄️ Vector Store powered by PostgreSQL + pgvector
- 🔍 Semantic Search with optional Reranking
- 🤖 RAG Endpoint - Search + LLM in one request
- 📄 File Upload - PDF, DOCX, TXT, Markdown, Excel, CSV, PowerPoint, HTML, E-Mail, JSON
- 🖼️ Image Support - Upload images via Vision LLM (JPG, PNG, GIF, WebP, TIFF)
- 🧩 OpenAI-compatible API - works with existing OpenAI SDKs
- 👥 Multi-User - Store permissions per user
- 🖥️ Admin UI - Manage users, stores and permissions
- 📊 Usage Tracking - Track requests per user
## Architecture

```
Client (API Key)
      │
      ▼
LiteLLM Proxy ──────────────────────────────┐
      │                                     │
      ▼                                     ▼
Vector Store API                     LiteLLM Models
      │                          ┌──────────────────┐
      ▼                          │ Embedding Models │
PostgreSQL + pgvector            │ Vision Models    │
                                 │ LLM Models       │
                                 └──────────────────┘
```
## Requirements
- Kubernetes Cluster
- PostgreSQL with pgvector extension (already deployed)
- LiteLLM Proxy (already deployed)
- Container Registry
## Quick Start

### 1. Clone Repository

```bash
git clone https://github.com/your-org/litellm-vector-store.git
cd litellm-vector-store
```
### 2. Database Setup

```bash
kubectl exec -it <postgres-pod> -n <namespace> \
  -- psql -U postgres -d vectordb << 'EOF'
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_stores (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    owner_user_id VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    store_id UUID REFERENCES vector_stores(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}',
    embedding vector(1024),
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS store_permissions (
    store_id UUID REFERENCES vector_stores(id) ON DELETE CASCADE,
    user_id VARCHAR(255) NOT NULL,
    permission VARCHAR(50) DEFAULT 'read',
    PRIMARY KEY (store_id, user_id)
);

CREATE TABLE IF NOT EXISTS usage_stats (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id VARCHAR(255) NOT NULL,
    store_id UUID REFERENCES vector_stores(id) ON DELETE SET NULL,
    action VARCHAR(50) NOT NULL,
    tokens INT DEFAULT 0,
    duration FLOAT DEFAULT 0,
    created_at TIMESTAMP DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_documents_store
    ON documents(store_id);
CREATE INDEX IF NOT EXISTS idx_documents_embedding
    ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);
CREATE INDEX IF NOT EXISTS idx_usage_user
    ON usage_stats(user_id);
CREATE INDEX IF NOT EXISTS idx_usage_created
    ON usage_stats(created_at);

GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO vecuser;
GRANT ALL PRIVILEGES ON ALL SEQUENCES IN SCHEMA public TO vecuser;
EOF
```
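The `idx_documents_embedding` index above uses `vector_cosine_ops`, so nearest-neighbour search ranks rows by cosine distance, which is what pgvector's `<=>` operator computes. A minimal pure-Python sketch of that metric, for reference:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as computed by pgvector's <=> operator:
    1 - (a . b) / (|a| * |b|). 0 = same direction, 2 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm
```

A search query then boils down to `ORDER BY embedding <=> :query_embedding LIMIT :top_k` against the `documents` table.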
### 3. Configure

```bash
# Create secrets
kubectl create secret generic vector-api-secrets \
  --namespace vector-store \
  --from-literal=DATABASE_URL="postgresql://vecuser:pass@postgres:5432/vectordb" \
  --from-literal=LITELLM_MASTER_KEY="sk-master-key"
```

```yaml
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-store-config
  namespace: vector-store
data:
  LITELLM_PROXY_URL: "http://litellm.<namespace>.svc.cluster.local:4000"
  ADMIN_USER_IDS: "your-admin-user-id"
  API_URL: "https://api.your-domain.com"
  EMBEDDING_MODEL: "your-embedding-model"
  VISION_MODEL: "openai/gpt-4o-mini"
```
### 4. Build & Deploy

```bash
# Build & push API
docker build -t your-registry/vector-store-api:1.0.0 .
docker push your-registry/vector-store-api:1.0.0

# Build & push Admin UI
docker build \
  -t your-registry/vector-store-admin:1.0.0 \
  ./ui
docker push your-registry/vector-store-admin:1.0.0

# Deploy
kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/configmap.yaml
kubectl apply -f k8s/secrets.yaml
kubectl apply -f k8s/vector-api/
kubectl apply -f k8s/admin-ui/
kubectl apply -f k8s/ingress-api.yaml
kubectl apply -f k8s/ingress-ui.yaml
```
## Project Structure

```
litellm-vector-store/
├── app/                        # FastAPI Backend
│   ├── main.py                 # Application entry point
│   ├── auth.py                 # LiteLLM authentication
│   ├── database.py             # PostgreSQL connection
│   ├── models.py               # Pydantic models
│   ├── routers/
│   │   ├── stores.py           # Vector store CRUD
│   │   ├── documents.py        # Document management
│   │   ├── admin.py            # Admin endpoints
│   │   └── openai_compat.py    # OpenAI-compatible API
│   └── utils/
│       ├── chunking.py         # Text chunking
│       ├── image_processor.py  # Vision LLM integration
│       └── stats.py            # Usage tracking
├── ui/                         # React Admin UI
│   ├── src/
│   │   ├── pages/
│   │   │   ├── Login.tsx
│   │   │   ├── Dashboard.tsx
│   │   │   ├── Users.tsx
│   │   │   └── Stores.tsx
│   │   ├── components/
│   │   │   ├── Layout.tsx
│   │   │   └── PermissionModal.tsx
│   │   └── api/
│   │       └── client.ts
│   └── Dockerfile
├── k8s/                        # Kubernetes manifests
│   ├── namespace.yaml
│   ├── configmap.yaml
│   ├── secrets.yaml
│   ├── vector-api/
│   │   ├── deployment.yaml
│   │   └── service.yaml
│   ├── admin-ui/
│   │   ├── deployment.yaml
│   │   └── service.yaml
│   ├── ingress-api.yaml
│   └── ingress-ui.yaml
├── scripts/
│   └── init.sql                # Database initialization
├── Dockerfile
├── requirements.txt
└── README.md
```
## API Reference

### Base URL

```
https://api.your-domain.com/v1
```

### Authentication

```
Authorization: Bearer sk-your-api-key
```
### Endpoints

| Method | Endpoint | Description |
|---|---|---|
| GET | `/v1/models` | List all models |
| GET | `/v1/embeddings/models` | List embedding models |
| GET | `/v1/vision/models` | List vision models |
| POST | `/v1/embeddings` | Create embeddings |
| POST | `/v1/vector_stores` | Create store |
| GET | `/v1/vector_stores` | List stores |
| GET | `/v1/vector_stores/{id}` | Get store |
| DELETE | `/v1/vector_stores/{id}` | Delete store |
| POST | `/v1/vector_stores/{id}/files` | Add texts |
| GET | `/v1/vector_stores/{id}/files` | List files |
| DELETE | `/v1/vector_stores/{id}/files/{file_id}` | Delete file |
| POST | `/v1/vector_stores/{id}/upload` | Upload file or image |
| POST | `/v1/vector_stores/{id}/search` | Semantic search |
| POST | `/v1/vector_stores/{id}/rag` | RAG query |
## Examples

### Python: Create store & upload file

```python
import httpx

client = httpx.Client(
    base_url="https://api.your-domain.com/v1",
    headers={"Authorization": "Bearer sk-your-key"},
    timeout=120.0
)

# Create store
store = client.post(
    "/vector_stores",
    json={"name": "My Knowledge Base"}
).json()

# Upload document
with open("document.pdf", "rb") as f:
    client.post(
        f"/vector_stores/{store['id']}/upload",
        files={"file": f}
    )

# Upload image (with default vision model)
with open("screenshot.png", "rb") as f:
    client.post(
        f"/vector_stores/{store['id']}/upload",
        files={"file": f}
    )

# Upload image (with custom vision model)
with open("diagram.png", "rb") as f:
    client.post(
        f"/vector_stores/{store['id']}/upload",
        files={"file": f},
        data={
            "vision_model": "openai/gpt-4o",
            "vision_prompt": "Explain this diagram in detail."
        }
    )

# Search
results = client.post(
    f"/vector_stores/{store['id']}/search",
    json={
        "query": "What is FastAPI?",
        "top_k": 3,
        "rerank": True
    }
).json()

# RAG
answer = client.post(
    f"/vector_stores/{store['id']}/rag",
    json={
        "query": "What is FastAPI?",
        "model": "openai/gpt-4o-mini",
        "rerank": True
    }
).json()
print(answer["answer"])
```
### JavaScript / TypeScript

```typescript
const API_KEY = "sk-your-api-key";
const BASE_URL = "https://api.your-domain.com/v1";
const HEADERS = {
  "Authorization": `Bearer ${API_KEY}`,
  "Content-Type": "application/json"
};

// Create store
const store = await fetch(`${BASE_URL}/vector_stores`, {
  method: "POST",
  headers: HEADERS,
  body: JSON.stringify({ name: "My Store" })
}).then(r => r.json());

// Search
const results = await fetch(
  `${BASE_URL}/vector_stores/${store.id}/search`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({
      query: "What is FastAPI?",
      top_k: 3,
      rerank: true
    })
  }).then(r => r.json());

// RAG
const answer = await fetch(
  `${BASE_URL}/vector_stores/${store.id}/rag`, {
    method: "POST",
    headers: HEADERS,
    body: JSON.stringify({
      query: "What is FastAPI?"
    })
  }).then(r => r.json());
console.log(answer.answer);
```
### curl

```bash
# Create store
curl -X POST https://api.your-domain.com/v1/vector_stores \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"name": "My Store"}'

# Upload document
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/upload \
  -H "Authorization: Bearer sk-your-key" \
  -F "file=@document.pdf"

# Upload image with custom vision model
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/upload \
  -H "Authorization: Bearer sk-your-key" \
  -F "file=@diagram.png" \
  -F "vision_model=openai/gpt-4o" \
  -F "vision_prompt=Explain this diagram in detail."

# Search
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/search \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is FastAPI?", "top_k": 3, "rerank": true}'

# RAG
curl -X POST https://api.your-domain.com/v1/vector_stores/{store_id}/rag \
  -H "Authorization: Bearer sk-your-key" \
  -H "Content-Type: application/json" \
  -d '{"query": "What is FastAPI?", "model": "openai/gpt-4o-mini"}'
```
## Configuration Reference

### Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `DATABASE_URL` | ✅ | — | PostgreSQL connection URL |
| `LITELLM_PROXY_URL` | ✅ | — | LiteLLM proxy URL |
| `LITELLM_MASTER_KEY` | ✅ | — | LiteLLM master key |
| `ADMIN_USER_IDS` | ✅ | — | Comma-separated admin user IDs |
| `EMBEDDING_MODEL` | ❌ | `text-embedding-ada-002` | Default embedding model |
| `VISION_MODEL` | ❌ | `openai/gpt-4o-mini` | Default vision model |
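As a sketch of how these variables might be consumed at startup (the function below is illustrative, not the service's actual code): required keys fail fast when missing, optional keys fall back to the documented defaults, and `ADMIN_USER_IDS` is split on commas.

```python
import os

def load_config(env=os.environ) -> dict:
    """Read service configuration from environment variables.

    Required keys raise KeyError if absent; optional keys fall
    back to the documented defaults."""
    return {
        "database_url": env["DATABASE_URL"],
        "litellm_proxy_url": env["LITELLM_PROXY_URL"],
        "litellm_master_key": env["LITELLM_MASTER_KEY"],
        # ADMIN_USER_IDS is a comma-separated list of user IDs
        "admin_user_ids": [u.strip() for u in env["ADMIN_USER_IDS"].split(",") if u.strip()],
        "embedding_model": env.get("EMBEDDING_MODEL", "text-embedding-ada-002"),
        "vision_model": env.get("VISION_MODEL", "openai/gpt-4o-mini"),
    }
```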
### Upload Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `file` | file | — | File to upload |
| `chunk_size` | int | 512 | Characters per chunk |
| `chunk_overlap` | int | 50 | Overlap between chunks |
| `vision_model` | string | Config default | Vision model for images |
| `vision_prompt` | string | Auto | Custom prompt for vision model |
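To illustrate what `chunk_size` and `chunk_overlap` mean, here is a minimal character-based chunker in the spirit of `app/utils/chunking.py` (the actual implementation may differ):

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into chunks of at most chunk_size characters,
    where consecutive chunks share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous chunk
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)` yields `["abcd", "defg", "ghij"]`: each chunk repeats the last character of its predecessor so that no sentence is cut without context.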
### Search Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | — | Search query |
| `top_k` | int | 5 | Number of results (max. 50) |
| `rerank` | bool | false | Enable reranking |
| `rerank_model` | string | Auto | Custom rerank model |
### RAG Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query` | string | — | Question |
| `model` | string | `cosair/gemma4:31b` | LLM model |
| `top_k` | int | 5 | Context documents |
| `rerank` | bool | false | Enable reranking |
| `system_prompt` | string | Auto | Custom system prompt |
| `messages` | array | [] | Chat history |
### Supported File Formats

| Format | Extension | Notes |
|---|---|---|
| Text | `.txt` | UTF-8 encoded |
| Markdown | `.md` | Standard Markdown |
| PDF | `.pdf` | Text PDFs only, no scans |
| Word | `.docx` | Microsoft Word 2007+ |
| Excel | `.xlsx` | All sheets extracted |
| CSV | `.csv` | All columns extracted |
| PowerPoint | `.pptx` | All slides extracted |
| HTML | `.html` `.htm` | Scripts/styles removed |
| Outlook Mail | `.msg` | Including headers |
| E-Mail | `.eml` | Including headers |
| JSON | `.json` | Pretty printed |
| Image | `.jpg` `.jpeg` `.png` `.gif` `.webp` `.tiff` | Via Vision LLM |
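Per the table above, uploads take one of two paths: image extensions go through the Vision LLM, everything else through text extraction. A hypothetical routing helper (names are illustrative, not the service's code):

```python
from pathlib import Path

# Extension sets taken from the table above
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".tiff"}
TEXT_EXTENSIONS = {".txt", ".md", ".pdf", ".docx", ".xlsx", ".csv",
                   ".pptx", ".html", ".htm", ".msg", ".eml", ".json"}

def upload_route(filename: str) -> str:
    """Return 'vision' for image uploads, 'text' for document uploads."""
    ext = Path(filename).suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        return "vision"
    if ext in TEXT_EXTENSIONS:
        return "text"
    raise ValueError(f"unsupported file format: {ext}")
```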
### Limits
| Limit | Value |
|---|---|
| Max file size | 256 MB |
| Max search results | 50 |
| Request timeout | 600 seconds |
| Default chunk size | 512 characters |
| Default chunk overlap | 50 characters |
## Admin UI

The Admin UI is available at `https://admin.your-domain.com`.
Log in with your Admin API Key to:
- 📊 View usage statistics
- 👥 Manage users and their stores
- 🔑 Rotate API keys
- 🔒 Grant/revoke store permissions
## Development

```bash
# Install dependencies
pip install -r requirements.txt

# Run locally
DATABASE_URL="postgresql://..." \
LITELLM_PROXY_URL="http://..." \
LITELLM_MASTER_KEY="sk-..." \
ADMIN_USER_IDS="your-id" \
EMBEDDING_MODEL="your-model" \
VISION_MODEL="openai/gpt-4o-mini" \
uvicorn app.main:app --reload

# Run UI locally
cd ui
npm install
VITE_API_URL=http://localhost:8000 npm run dev
```
## Tech Stack
| Component | Technology |
|---|---|
| API | FastAPI + Python 3.12 |
| Database | PostgreSQL 16 + pgvector |
| Auth | LiteLLM Key Management |
| Embeddings | Via LiteLLM Proxy |
| Vision | Via LiteLLM Vision Models |
| Admin UI | React + TypeScript + Tailwind CSS |
| Container | Docker + Kubernetes |
| Ingress | NGINX Ingress Controller |
| TLS | cert-manager + Let's Encrypt |
## License

MIT License - see LICENSE for details.
## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/my-feature`)
3. Commit your changes (`git commit -m 'Add my feature'`)
4. Push to the branch (`git push origin feature/my-feature`)
5. Open a Pull Request