Checklist

Before going live, verify:
  • All API keys are in environment variables (not hardcoded)
  • .env is in .gitignore
  • Documents are indexed in Pinecone
  • Supabase tables exist (created automatically on first run)
  • Server uses SUPABASE_SERVICE_ROLE_KEY (bypasses RLS)
  • Multiple workers configured (uvicorn --workers)
  • Health check endpoint responding at /health

Environment variables

In production, set environment variables via your hosting platform (not a .env file):
# Heroku
heroku config:set OPENAI_API_KEY=sk-...

# Railway
railway variables set OPENAI_API_KEY=sk-...

# Docker / Kubernetes — pass via -e flags or secrets
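A fail-fast check at startup catches missing configuration before the server boots. This is a sketch, not part of LangChat itself; the variable names mirror those used elsewhere in this guide:

```python
import os
import sys

# Variables this guide assumes are set in production
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "SUPABASE_URL",
    "SUPABASE_SERVICE_ROLE_KEY",
]

def check_required_env(env=None):
    """Return the required variables that are missing or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = check_required_env()
    if missing:
        sys.exit(f"Missing environment variables: {', '.join(missing)}")
```

Run it as a pre-start step (or call `check_required_env()` at the top of `server.py`) so a misconfigured deploy fails immediately instead of erroring on the first request.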

Uvicorn configuration

# Single process (development)
uvicorn server:app --host 0.0.0.0 --port 8000 --reload

# Multi-process (production) — use number of CPU cores
uvicorn server:app --host 0.0.0.0 --port 8000 --workers 4

# With Gunicorn (more robust process management)
pip install gunicorn
gunicorn server:app \
  --worker-class uvicorn.workers.UvicornWorker \
  --workers 4 \
  --bind 0.0.0.0:8000 \
  --timeout 120 \
  --keep-alive 5
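
Gunicorn's documentation suggests roughly (2 × CPU cores) + 1 workers as a starting point; a quick helper (a sketch, not part of LangChat) to compute it:

```python
import os

def recommended_workers(cores=None):
    """Gunicorn's commonly cited starting heuristic: 2 * CPU cores + 1."""
    cores = cores or os.cpu_count() or 1
    return 2 * cores + 1
```

On a 4-core machine this yields 9 workers; tune downward if each worker holds significant memory (models, caches).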

Docker deployment

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY pyproject.toml .
RUN pip install langchat uvicorn gunicorn

# Copy application
COPY server.py .

# Build UI (if using the built-in frontend) — Node.js is needed for the build,
# since python:3.11-slim does not include it
RUN apt-get update && apt-get install -y --no-install-recommends nodejs npm \
    && rm -rf /var/lib/apt/lists/*
COPY src/langchat/core/ui /app/ui
RUN cd /app/ui && npm ci && npm run build

EXPOSE 8000

CMD ["gunicorn", "server:app", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--workers", "4", \
     "--bind", "0.0.0.0:8000"]

# Build and run
docker build -t my-chatbot .

docker run -d \
  -p 8000:8000 \
  -e OPENAI_API_KEY=sk-... \
  -e PINECONE_API_KEY=pcsk-... \
  -e SUPABASE_URL=https://xxxx.supabase.co \
  -e SUPABASE_SERVICE_ROLE_KEY=eyJ... \
  --name my-chatbot \
  my-chatbot

Health checks

Use the /health endpoint for load balancer and uptime monitor health checks:
curl http://your-server/health
# {"status":"healthy","timestamp":"...","version":"1.0.1"}
In Docker Compose (note: curl must be installed in the image — python:3.11-slim does not include it):
services:
  chatbot:
    image: my-chatbot
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
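
For an external uptime probe, the endpoint can be checked with nothing beyond the standard library. A sketch, assuming the payload shape shown above:

```python
import json
import urllib.request

def is_healthy(body: dict) -> bool:
    """Interpret the /health payload shown above."""
    return body.get("status") == "healthy"

def probe(url="http://localhost:8000/health", timeout=5):
    """Return True if the endpoint answers 200 with a healthy payload."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(json.load(resp))
    except OSError:
        return False
```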

API key rotation

For high-traffic deployments, use multiple OpenAI keys to stay within rate limits:
import os

from langchat.providers import OpenAI

llm = OpenAI(
    "gpt-4o-mini",
    api_keys=[
        os.environ["OPENAI_KEY_1"],
        os.environ["OPENAI_KEY_2"],
        os.environ["OPENAI_KEY_3"],
    ],
    max_retries_per_key=2,
)
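
Conceptually, each key is tried up to `max_retries_per_key` times before rotating to the next. The sketch below illustrates that behaviour; it is not LangChat's actual implementation:

```python
class KeyRotator:
    """Illustrative round-robin key rotation: try each key a fixed number
    of times on failure before moving to the next one."""

    def __init__(self, keys, max_retries_per_key=2):
        self.keys = list(keys)
        self.max_retries_per_key = max_retries_per_key

    def attempts(self):
        """Yield keys in the order they would be tried on repeated failures."""
        for key in self.keys:
            for _ in range(self.max_retries_per_key):
                yield key
```

With three keys and `max_retries_per_key=2`, a persistently failing request would be attempted six times before giving up.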

Monitoring

LangChat saves request metrics to Supabase automatically. Query them to monitor:
-- Average response time (last 24 hours)
SELECT AVG(response_time) as avg_latency
FROM request_metrics
WHERE request_time > NOW() - INTERVAL '24 hours'
  AND success = true;

-- Error rate
SELECT
  COUNT(*) FILTER (WHERE success = false) as errors,
  COUNT(*) as total,
  ROUND(COUNT(*) FILTER (WHERE success = false)::numeric / COUNT(*) * 100, 2) as error_rate
FROM request_metrics
WHERE request_time > NOW() - INTERVAL '1 hour';

-- Slowest queries
SELECT user_id, response_time, request_time
FROM request_metrics
ORDER BY response_time DESC
LIMIT 20;
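
The same error-rate calculation can be done in application code over rows fetched from Supabase. A sketch, assuming each row is a dict with a boolean `success` field (matching the columns used in the SQL above):

```python
def error_rate(rows):
    """Mirror the error-rate SQL above over in-memory rows:
    percentage of rows with success == False, rounded to 2 places."""
    total = len(rows)
    if total == 0:
        return 0.0
    errors = sum(1 for row in rows if not row["success"])
    return round(errors / total * 100, 2)
```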

Supabase connection pooling

Supabase pools connections internally for its REST API, which LangChat uses via the Python client, so no extra configuration is needed:
# Standard URL — works for most cases
SUPABASE_URL=https://xxxx.supabase.co
Supabase's dedicated connection pooler (transaction mode on port 6543) applies only to direct Postgres connections, not to the REST API.

Security

Never expose raw API keys. Always use environment variables or a secrets manager (AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault).

CORS is enabled for all origins by default (allow_origins=["*"]). To restrict it in production:
# Access the underlying FastAPI app after creation
from langchat.api import create_app
from fastapi.middleware.cors import CORSMiddleware

app = create_app(...)

# Add a restrictive CORS policy for production
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-frontend.com"],
    allow_methods=["GET", "POST"],
    allow_headers=["*"],
)
Rate limiting — add rate limiting middleware to prevent abuse:
pip install slowapi
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.util import get_remote_address
from slowapi.errors import RateLimitExceeded

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# Apply a limit to a route; slowapi requires the Request parameter
@app.get("/my-endpoint")
@limiter.limit("10/minute")
async def my_endpoint(request: Request):
    return {"ok": True}