The problem
Building a production-grade RAG chatbot requires integrating many moving parts:
- An LLM provider (and handling retries, key rotation, rate limits)
- A vector database for semantic search
- An embedding pipeline to index your documents
- A reranker to improve search precision
- A conversation memory system
- A database to persist chat history
- A REST API to expose it all
- Session management per user and per application
What LangChat gives you
A single chat() call:
- Rephrases the question as a standalone query (handles “it”, “that”, follow-ups)
- Embeds the question and searches Pinecone
- Reranks the top results with Flashrank
- Builds a prompt combining context + conversation history
- Calls the LLM
- Saves the exchange to Supabase (non-blocking, in the background)
- Returns a typed ChatResponse with .text, .status, .response_time
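The steps above can be sketched as a plain-Python pipeline. Every helper here is a stub (the real rephrasing, Pinecone search, Flashrank reranking, LLM call, and Supabase write are external services), and the ChatResponse fields simply mirror the ones listed above — this is an illustration of the flow, not LangChat's actual internals:

```python
import time
from dataclasses import dataclass

@dataclass
class ChatResponse:
    text: str
    status: str
    response_time: float

def chat(question: str, history: list[str]) -> ChatResponse:
    start = time.monotonic()
    # 1. Rephrase follow-ups ("it", "that") into a standalone query (stubbed:
    #    a real implementation would ask the LLM to rewrite the question)
    standalone = f"{' '.join(history[-2:])} {question}".strip()
    # 2-3. Embed + vector search, then rerank (stubbed: would be Pinecone + Flashrank)
    docs = ["doc about pricing", "doc about setup"]
    # 4. Build the prompt from retrieved context + conversation history
    context = "\n".join(docs)
    prompt = f"Context:\n{context}\n\nQuestion: {standalone}"
    # 5. Call the LLM (stubbed)
    answer = f"Answer based on {len(docs)} documents."
    # 6. Persisting the exchange to Supabase would happen here, fire-and-forget
    return ChatResponse(text=answer, status="success",
                        response_time=time.monotonic() - start)

resp = chat("How much does it cost?", history=["Tell me about the Pro plan."])
print(resp.text)    # Answer based on 2 documents.
print(resp.status)  # success
```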
Compared to alternatives
| Feature | LangChat | Raw LangChain | LlamaIndex | Building from scratch |
|---|---|---|---|---|
| RAG pipeline | Built-in | Manual | Manual | Manual |
| Session management | Built-in | Manual | Manual | Manual |
| Multiple LLM providers | 6 built-in | Via community | Via community | Manual |
| API server | One function call | Manual | Manual | Manual |
| Chat history storage | Built-in | Manual | Manual | Manual |
| Reranking | Built-in | Manual | Manual | Manual |
| Document indexing | lc.index(path) | Manual | Manual | Manual |
| Typed responses | ChatResponse | dict | dict | Manual |
| Time to first chatbot | 5 minutes | Days | Days | Weeks |
Design principles
Environment variables first. Every provider reads credentials from the environment. No keys in code, no keys in config files.

Sensible defaults. Flashrank reranker, text-embedding-3-large embeddings, a 20-message history window. Everything works without any configuration.
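A minimal sketch of the env-first, defaults-second pattern. The variable names (OPENAI_API_KEY, EMBEDDING_MODEL, HISTORY_WINDOW) are illustrative assumptions, not LangChat's documented configuration keys:

```python
import os

# Demo only: in production the key is set outside the process, never in code.
os.environ.setdefault("OPENAI_API_KEY", "sk-demo")
api_key = os.environ["OPENAI_API_KEY"]

# Sensible defaults: override via the environment only when needed.
embedding_model = os.environ.get("EMBEDDING_MODEL", "text-embedding-3-large")
history_window = int(os.environ.get("HISTORY_WINDOW", "20"))
```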
Hexagonal architecture. Core logic is isolated from adapters. Swap out any provider without touching business logic.
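What "swap any provider without touching business logic" looks like in practice, sketched with a typing.Protocol port and two fake adapters (the class and method names are made up for illustration):

```python
from typing import Protocol

class LLMPort(Protocol):
    """Port: core logic depends only on this interface, never on a vendor SDK."""
    def complete(self, prompt: str) -> str: ...

class FakeOpenAIAdapter:
    def complete(self, prompt: str) -> str:
        return f"openai: {prompt}"

class FakeAnthropicAdapter:
    def complete(self, prompt: str) -> str:
        return f"anthropic: {prompt}"

def answer(llm: LLMPort, question: str) -> str:
    # Business logic is identical no matter which adapter is injected.
    return llm.complete(question)

print(answer(FakeOpenAIAdapter(), "hi"))     # openai: hi
print(answer(FakeAnthropicAdapter(), "hi"))  # anthropic: hi
```

Swapping providers means constructing a different adapter; nothing in `answer` changes.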
Async-first, sync available. All chat methods are async for high-throughput APIs, with sync wrappers for scripts and notebooks.
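The async-core-plus-sync-wrapper pattern can be sketched like this (method names are hypothetical; the real I/O is replaced by a no-op await):

```python
import asyncio

class Chat:
    async def chat(self, question: str) -> str:
        # Async path: suitable for high-throughput API servers.
        await asyncio.sleep(0)  # stands in for real network I/O
        return f"answer to: {question}"

    def chat_sync(self, question: str) -> str:
        # Sync wrapper for plain scripts: runs the coroutine to completion.
        return asyncio.run(self.chat(question))

print(Chat().chat_sync("hello"))  # answer to: hello
```

Note that asyncio.run cannot be called from inside an already-running event loop, so notebook environments typically `await` the async method directly instead.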
Typed responses. ChatResponse is a dataclass — use response.text, if response:, print(response). No more result["response"] dict access.
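A sketch of how a dataclass can support all three usages named above (`response.text`, `if response:`, `print(response)`); the dunder implementations here are assumptions about how such a class might behave, not LangChat's source:

```python
from dataclasses import dataclass

@dataclass
class ChatResponse:
    text: str
    status: str = "success"
    response_time: float = 0.0

    def __bool__(self) -> bool:
        # Truthy only on success, so `if response:` reads naturally.
        return self.status == "success"

    def __str__(self) -> str:
        # `print(response)` shows the answer, not a dict dump.
        return self.text

resp = ChatResponse(text="Hello!", response_time=0.42)
print(resp.text)  # Hello!
print(resp)       # Hello!
if resp:
    print("ok")   # ok
```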
Who uses LangChat
- SaaS companies — add AI chat to their product without a dedicated ML team
- Enterprise teams — chatbots over internal documents, wikis, and knowledge bases
- Agencies — spin up white-label chatbots for clients in hours
- Solo developers — ship RAG products without deep ML knowledge
