# LiteLLM Admin Guide
This document provides a comprehensive guide for administrators to deploy, configure, and manage the LiteLLM service.
## 1. Deployment via ArgoCD

- ArgoCD Application Name: `litellm` (sync-wave 10)
- Namespace: `litellm` (auto-created with `CreateNamespace=true`)
- Source Repo: `https://github.com/BerriAI/litellm.git`
- Chart Path: `deploy/charts/litellm-helm`
- Version Pin: `targetRevision: v1.76.1-stable` (application), image tag `main-stable` for the runtime container
- Reconciliation: Automated with prune + selfHeal
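The settings above can be sketched as an ArgoCD `Application` manifest. This is a minimal sketch: the `argocd` namespace, default project, and in-cluster server URL are assumptions to verify against the actual manifest in the GitOps repo.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: litellm
  namespace: argocd            # assumption: ArgoCD's own namespace
  annotations:
    argocd.argoproj.io/sync-wave: "10"
spec:
  project: default             # assumption
  destination:
    server: https://kubernetes.default.svc
    namespace: litellm
  source:
    repoURL: https://github.com/BerriAI/litellm.git
    path: deploy/charts/litellm-helm
    targetRevision: v1.76.1-stable
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```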
## 2. Runtime Configuration

- Replicas: 3 (`replicaCount: 3`)
- Service Type: ClusterIP (`port: 4000`)
- Primary Dependencies:
  - PostgreSQL (pooler RW service) for persistence / metadata: `postgresql-cloudnative-pg-cluster-pooler-rw.postgresql.svc.cluster.local`
  - Redis for routing state & caching: `redis.redis.svc.cluster.local:6379`
- Secrets Referenced:
  - `litellm-provider-keys` (LLM provider API keys like `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`)
  - `postgres` (supplies `DATABASE_PASSWORD`)
  - `litellm-master-key` (master key secret; key name: `LITELLM_MASTER_KEY`)
- Caching: Redis enabled with TTL 86,400 seconds (1 day) & namespace `litellm_cache`
- Retries: `num_retries: 2` for routing
- Telemetry: Disabled (`telemetry: false`)
- UI: Enabled (`ui.enabled: true`) on the same service port
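A sketch of the corresponding Helm values. The nesting under `proxy_config` follows common litellm-helm conventions and is an assumption; verify key names against the chart's `values.yaml`.

```yaml
replicaCount: 3

service:
  type: ClusterIP
  port: 4000

# Assumed nesting; check against deploy/charts/litellm-helm/values.yaml
proxy_config:
  router_settings:
    num_retries: 2
  litellm_settings:
    telemetry: false
```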
## 3. Resource Management
- Requests: 250m CPU, 512Mi memory
- Limits: 1 CPU, 2Gi memory
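In standard Kubernetes/Helm resource notation, these requests and limits read as:

```yaml
resources:
  requests:
    cpu: 250m
    memory: 512Mi
  limits:
    cpu: "1"
    memory: 2Gi
```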
## 4. Database Integration

`db.url` template (with secret expansion):

```
postgresql://litellm:$(DATABASE_PASSWORD)@postgresql-cloudnative-pg-cluster-pooler-rw.postgresql.svc.cluster.local:5432/app?schema=litellm
```

- User: `litellm`
- Database: `app`
- Schema: `litellm`
- Credentials: Injected via the `postgres` secret (`DATABASE_PASSWORD` key expected)
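As a sanity check, the expanded `db.url` can be parsed with Python's standard library to confirm the user, database, and schema components. The password value below is a placeholder standing in for the Kubernetes env substitution, not a real credential.

```python
from urllib.parse import parse_qs, urlsplit

# $(DATABASE_PASSWORD) is expanded by Kubernetes env substitution before
# LiteLLM sees the URL; "example-password" stands in for that value here.
url = (
    "postgresql://litellm:example-password"
    "@postgresql-cloudnative-pg-cluster-pooler-rw"
    ".postgresql.svc.cluster.local:5432/app?schema=litellm"
)

parts = urlsplit(url)
print(parts.username)                      # litellm (user)
print(parts.path.lstrip("/"))              # app (database)
print(parse_qs(parts.query)["schema"][0])  # litellm (schema)
```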
## 5. Redis Integration

Used for both:

- Rate limiting / router state (`router_settings`)
- Response / embedding cache (`litellm_settings.cache_params`)

Configuration (no password is currently set; adding secret-backed auth is recommended):

- Host: `redis.redis.svc.cluster.local`
- Port: 6379
- Cache TTL: 86400 seconds
- Namespace: `litellm_cache`
- Flush Size: 100
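Expressed as a LiteLLM proxy config fragment, assembled from the values above. This is a sketch; field names should be checked against the deployed `config.yaml`.

```yaml
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: redis.redis.svc.cluster.local
    port: 6379
    ttl: 86400
    namespace: litellm_cache

router_settings:
  redis_host: redis.redis.svc.cluster.local
  redis_port: 6379
  num_retries: 2
```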
## Architecture Overview

```mermaid
graph TD
    Client --> Service["Service: litellm.api.prod.everycure.org"]
    Service --> Proxy[Model Proxy Logic]
    Service --> Postgres["PostgreSQL (persistence: usage logs, metadata)"]
    Service --> Redis["Redis (cache + routing state)"]
    Service --> APIs["External LLM APIs (OpenAI, Anthropic, etc.)"]
```
High-level components:
- API / Proxy Layer: Exposes OpenAI-compatible routes & management UI
- Credential Layer: Provider keys pulled from Kubernetes secret(s)
- Caching Layer: Redis for latency + quota coordination across replicas
- Persistence Layer: PostgreSQL for structured storage
- Orchestration: ArgoCD for drift correction & version pinning
## How to Access LiteLLM

### In-Cluster (Service DNS)

- Host: `litellm.litellm.svc.cluster.local`
- Port: 4000
- Protocol: HTTP (TLS ingress under consideration)

Example curl (OpenAI-style chat completion):

```bash
curl -s \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  http://litellm.litellm.svc.cluster.local:4000/v1/chat/completions \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Say hello in 5 words"}
    ]
  }' | jq .
```

Test via the public endpoint:

```bash
curl -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  http://litellm.api.prod.everycure.org/v1/models | jq .
```
### UI Access

Open in browser: http://litellm.api.prod.everycure.org/

(Assess access control; if none, consider restricting with a network policy or auth proxy.)
## Environment Variables & Secrets

Typical contents (verify actual secret keys):

| Secret | Purpose | Expected Keys |
|---|---|---|
| `litellm-provider-keys` | External LLM providers | `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, others as needed |
| `postgres` | Database credentials | `DATABASE_PASSWORD` |
| `litellm-master-key` | Master API key for auth | `LITELLM_MASTER_KEY` |
| `redis-auth` (future) | Redis password (not yet) | `password` |

Retrieve master key:

```bash
kubectl get secret litellm-master-key -n litellm -o jsonpath='{.data.LITELLM_MASTER_KEY}' | base64 -d
```

Set locally:

```bash
export LITELLM_MASTER_KEY="$(kubectl get secret litellm-master-key -n litellm -o jsonpath='{.data.LITELLM_MASTER_KEY}' | base64 -d)"
```

Database password:

```bash
export DATABASE_PASSWORD="$(kubectl get secret postgres -n litellm -o jsonpath='{.data.DATABASE_PASSWORD}' | base64 -d)"
```
(Adjust namespace if postgres secret lives in postgresql namespace; if so, sync or project a copy into litellm.)
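The `base64 -d` step mirrors how Kubernetes stores secret data. A quick Python illustration; the encoded string is a made-up placeholder, not a real key:

```python
import base64

# Secret values in `kubectl get secret -o jsonpath` output are base64-encoded.
encoded = "bXktbWFzdGVyLWtleQ=="  # placeholder, encodes "my-master-key"
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # my-master-key
```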
## Scaling Considerations

- Horizontal Scaling: Increase `replicaCount`; Redis + Postgres must handle the added concurrency.
- Bottlenecks: External provider API rate limits; enable per-provider throttling.
- Cache Strategy: TTL is 1 day; validate memory pressure in Redis and adjust `ttl` or add LRU eviction if needed.
- Retries: `num_retries: 2`; tune based on provider error patterns.
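To illustrate what `num_retries: 2` implies (one initial attempt plus up to two retries), here is a hypothetical retry loop. The exponential backoff schedule is an assumption for illustration, not LiteLLM's actual implementation.

```python
import time

def call_with_retries(fn, num_retries=2, base_delay=0.01):
    """Attempt fn once, then retry up to num_retries times with backoff."""
    last_err = None
    for attempt in range(num_retries + 1):
        try:
            return fn()
        except Exception as err:  # real code would catch provider errors only
            last_err = err
            if attempt < num_retries:
                time.sleep(base_delay * 2 ** attempt)  # hypothetical backoff
    raise last_err

# Flaky stub: fails twice, then succeeds on the third (final) attempt.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("upstream 500")
    return "ok"

print(call_with_retries(flaky))  # ok (succeeds after 2 retries)
```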
## Recommendations
- Introduce HPA on CPU + custom metrics (req/sec) once baseline traffic known.
- Add circuit breaking for provider timeouts.
- Consider distinct Redis logical DB or namespace per environment.
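The HPA recommendation could start from a CPU-based manifest like this sketch. The Deployment name `litellm`, the 70% utilization target, and the replica bounds are assumptions to adjust once baseline traffic is known.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: litellm
  namespace: litellm
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: litellm          # assumption: Deployment name matches the release
  minReplicas: 3           # current replicaCount as the floor
  maxReplicas: 6           # assumption
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # assumption
```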
## Observability

Suggested instrumentation (some not yet implemented):

- Logs: `kubectl logs -n litellm -l app.kubernetes.io/name=litellm`
- Metrics: Add a sidecar or embed a Prometheus exporter (LiteLLM telemetry is currently disabled)
- Tracing: Wrap the API gateway with an OpenTelemetry collector (future enhancement)
Health checks:

```bash
# Basic liveness simulation
curl -H "Authorization: Bearer $LITELLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"ping"}]}' \
  http://litellm.litellm.svc.cluster.local:4000/v1/chat/completions | jq '.id'
```
## Security & Hardening
| Area | Current | Improvement |
|---|---|---|
| Master Key | Stored in secret | Rotate periodically; audit access |
| Provider Keys | In single secret | Split per provider + RBAC restrict |
| Redis | No auth in use | Add password + TLS, limit network access |
| Transport | Plain HTTP inside cluster | Add mTLS via service mesh / ingress TLS |
| Telemetry | Disabled | Re-enable with scrubbed PII if insights needed |
| UI | Enabled, no note of auth | Restrict via auth proxy or disable in prod |
## Troubleshooting

1. 401 Unauthorized
   - Missing or invalid `Authorization: Bearer` header.
   - Master key mismatch; re-fetch the secret.
2. 500 Errors from Provider
   - Check provider quota; are the environment secret values valid?
   - Inspect pod logs for the upstream error JSON.
3. High Latency
   - Check Redis connectivity: `redis-cli -h redis.redis.svc.cluster.local PING`.
   - Validate Postgres pool usage: look for saturation or connection errors.
4. Cache Not Working
   - Ensure `cache: true` in both `model_info` and `litellm_settings`.
   - Confirm Redis key writes: `redis-cli KEYS 'litellm_cache*' | head`.
5. Schema Issues in Postgres
   - Ensure the `litellm` schema exists or migrations have been applied (if LiteLLM uses migrations). Create it manually if needed.
6. Rate Limit Mismatch
   - Adjust Redis-based coordination via router settings or add explicit per-model quotas.
7. Pod CrashLoopBackOff
   - Inspect logs for missing env/secret references.
   - Ensure secrets are in the same namespace (`litellm`).
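For item 4 above, the `redis-cli KEYS 'litellm_cache*'` check relies on cache keys landing under the configured namespace. Here is a hypothetical illustration of how a namespaced cache key can be derived from a request payload; LiteLLM's real key scheme may differ.

```python
import hashlib
import json

def cache_key(namespace: str, payload: dict) -> str:
    """Derive a deterministic, namespaced key from a request payload.

    Hypothetical scheme: canonical JSON of the payload, hashed with SHA-256,
    prefixed with the cache namespace (e.g. litellm_cache).
    """
    canonical = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{namespace}:{digest}"

key = cache_key(
    "litellm_cache",
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]},
)
print(key.startswith("litellm_cache:"))  # True
```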
## Future Improvements
- Add Redis AUTH + TLS and rotate credentials.
- Implement Prometheus metrics & dashboards (requests, latency, token usage, cache hit rate).
- Add OpenTelemetry tracing for end-to-end call chains.
- HPA + Vertical Pod Autoscaler (evaluation) for dynamic scaling.
- Canary strategy for upgrading LiteLLM versions.
- Per-provider concurrency + rate limit configs.
- Optionally enable streaming endpoint demos in UI with auth guard.
## Quick Reference
| Item | Value |
|---|---|
| Namespace | litellm |
| Service DNS | litellm.litellm.svc.cluster.local |
| Port | 4000 |
| Replicas | 3 |
| DB URL | postgresql://litellm:$(DATABASE_PASSWORD)@postgresql-cloudnative-pg-cluster-pooler-rw.postgresql.svc.cluster.local:5432/app?schema=litellm |
| Redis | redis.redis.svc.cluster.local:6379 |
| Cache TTL | 86400s |
| Master Key Secret | litellm-master-key / key LITELLM_MASTER_KEY |
| Provider Keys Secret | litellm-provider-keys |
| DB Password Secret | postgres (env key: DATABASE_PASSWORD) |
| Public URL Link | https://litellm.api.prod.everycure.org |
Review and adjust if Helm values change or additional providers are added.
## Port Forward for Local Testing

```bash
kubectl port-forward -n litellm svc/litellm 4000:4000
# Now available at localhost:4000
```