
The problem

Building a production-grade RAG chatbot requires integrating many moving parts:
  • An LLM provider (and handling retries, key rotation, rate limits)
  • A vector database for semantic search
  • An embedding pipeline to index your documents
  • A reranker to improve search precision
  • A conversation memory system
  • A database to persist chat history
  • A REST API to expose it all
  • Session management per user and per application
Done from scratch, this takes weeks. LangChat does it in minutes.

What LangChat gives you

from langchat import LangChat
from langchat.providers import OpenAI, Pinecone, Supabase

lc = LangChat(
    llm=OpenAI("gpt-4o-mini"),
    vector_db=Pinecone("my-index"),
    db=Supabase(),
)

response = await lc.chat(query="What's our refund policy?", user_id="alice")
print(response)
That single chat() call:
  1. Rephrases the question as a standalone query (handles “it”, “that”, follow-ups)
  2. Embeds the question and searches Pinecone
  3. Reranks the top results with Flashrank
  4. Builds a prompt combining context + conversation history
  5. Calls the LLM
  6. Saves the exchange to Supabase (non-blocking, in the background)
  7. Returns a typed ChatResponse with .text, .status, .response_time

Compared to alternatives

| Feature | LangChat | Raw LangChain | LlamaIndex | Building from scratch |
|---|---|---|---|---|
| RAG pipeline | Built-in | Manual | Manual | Manual |
| Session management | Built-in | Manual | Manual | Manual |
| Multiple LLM providers | 6 built-in | Via community | Via community | Manual |
| API server | One function call | Manual | Manual | Manual |
| Chat history storage | Built-in | Manual | Manual | Manual |
| Reranking | Built-in | Manual | Manual | Manual |
| Document indexing | lc.index(path) | Manual | Manual | Manual |
| Typed responses | ChatResponse | dict | dict | Manual |
| Time to first chatbot | 5 minutes | Days | Days | Weeks |

Design principles

  • Environment variables first. Every provider reads credentials from the environment. No keys in code, no keys in config files.
  • Sensible defaults. Flashrank reranker, text-embedding-3-large, 20-message history window: everything works without any configuration.
  • Hexagonal architecture. Core logic is isolated from adapters. Swap out any provider without touching business logic.
  • Async-first, sync available. All chat methods are async for high-throughput APIs, with sync wrappers for scripts and notebooks.
  • Typed responses. ChatResponse is a dataclass: use response.text, if response:, print(response). No more result["response"] dict access.
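The last two principles can be illustrated together. Below is a hedged sketch of what a typed response plus a sync wrapper might look like: the field names (.text, .status, .response_time) come from the docs above, but __bool__, __str__, the chat_sync name, and the "Echo:" stub answer are assumptions for illustration, not LangChat's actual code:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class ChatResponse:
    text: str
    status: str
    response_time: float

    def __bool__(self) -> bool:
        # `if response:` asks whether the call succeeded.
        return self.status == "success"

    def __str__(self) -> str:
        # `print(response)` prints the answer text directly.
        return self.text

async def chat_async(query: str) -> ChatResponse:
    # Stubbed async chat method; a real one runs the full RAG pipeline.
    return ChatResponse(text=f"Echo: {query}", status="success", response_time=0.01)

def chat_sync(query: str) -> ChatResponse:
    # Sync wrapper for scripts and notebooks: drive the coroutine to completion.
    return asyncio.run(chat_async(query))

response = chat_sync("hello")
if response:
    print(response)
```

Because the dataclass defines its own truthiness and string form, calling code never pokes into a result["response"] dict; attribute access and plain print both work.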

Who uses LangChat

  • SaaS companies — add AI chat to their product without a dedicated ML team
  • Enterprise teams — chatbots over internal documents, wikis, and knowledge bases
  • Agencies — spin up white-label chatbots for clients in hours
  • Solo developers — ship RAG products without deep ML knowledge