What is RAG?
Retrieval-Augmented Generation (RAG) combines a vector search step with an LLM generation step:
- Retrieve: find the document chunks most relevant to the user’s question
- Augment: inject those chunks into the LLM prompt as context
- Generate: the LLM answers from the retrieved context, not just from its training data
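The three steps can be illustrated with a toy, dependency-free sketch. It is not LangChat code: `retrieve` uses simple word overlap as a hypothetical stand-in for vector search, and `fake_llm` is a hypothetical stub in place of a real LLM call. The point is only the shape of the flow.

```python
import re

# Hypothetical toy data; in LangChat these would be chunks stored in Pinecone.
DOCS = [
    "Refunds are available within 30 days of purchase.",
    "Shipping to Europe takes 5 to 7 business days.",
    "Gift cards cannot be exchanged for cash.",
]

PROMPT = "Answer using only the context below.\nContext:\n{context}\n\nQuestion: {question}"


def tokens(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Retrieve: rank docs by word overlap with the question (stand-in for vector search)."""
    q = tokens(question)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def augment(question: str, chunks: list[str]) -> str:
    """Augment: inject the retrieved chunks into the prompt template."""
    return PROMPT.format(context="\n".join(chunks), question=question)


def fake_llm(prompt: str) -> str:
    """Generate: stand-in for a real LLM call; it just returns the first context line."""
    return prompt.split("Context:\n", 1)[1].splitlines()[0]


question = "Can I get refunds after 20 days?"
chunks = retrieve(question, DOCS)
print(fake_llm(augment(question, chunks)))
```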
How LangChat’s pipeline works
1. Standalone question reformulation
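Before searching, LangChat uses the LLM to rewrite the user’s latest message as a standalone query, resolving pronouns and references to earlier messages. For example, after a question about the refund policy, a follow-up such as “Does it apply to sale items?” could be rewritten as “Does the refund policy apply to sale items?”, which can be searched on its own.
2. Embedding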
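The question is embedded with OpenAI’s `text-embedding-3-large` model (3072 dimensions). Documents must be indexed with the same model: vectors produced by different models live in different vector spaces (and may not even have the same dimension), so similarity scores between them are meaningless.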
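One way to catch a model mismatch early is to record which model built the index and check every query against it. This is a minimal sketch under that assumption; `EMBEDDING_DIMS` and `check_query_compatible` are hypothetical helpers, not LangChat APIs. The dimensions are the ones listed in the embedding model table on this page.

```python
# Hypothetical guard: refuse to query an index with vectors from a different model.
EMBEDDING_DIMS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
    "text-embedding-ada-002": 1536,
}


def check_query_compatible(index_model: str, query_model: str, query_vector: list[float]) -> None:
    """Raise ValueError if the query was embedded differently from the indexed documents."""
    if query_model != index_model:
        # A matching dimension is not enough: 3-small and ada-002 are both 1536-d,
        # but their vector spaces are unrelated, so scores would be meaningless.
        raise ValueError(f"index built with {index_model!r}, query embedded with {query_model!r}")
    expected = EMBEDDING_DIMS[index_model]
    if len(query_vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(query_vector)}")


check_query_compatible("text-embedding-3-large", "text-embedding-3-large", [0.0] * 3072)  # passes

try:
    check_query_compatible("text-embedding-3-small", "text-embedding-ada-002", [0.0] * 1536)
except ValueError as err:
    print(err)
```

Checking the model name, not just the vector length, is the deliberate choice here: the length check alone would let a `text-embedding-ada-002` query through against a `text-embedding-3-small` index.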
3. Pinecone similarity search
LangChat queries Pinecone for the top-k most similar chunks (k=5 by default, set by the retriever). Chunks are ranked by the cosine similarity between the question vector and each chunk vector.
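Pinecone computes this ranking server-side; the sketch below only illustrates what cosine top-k retrieval means, using toy 3-dimensional vectors in place of real 3072-dimensional embeddings. The function names and data are hypothetical, and this is not how LangChat calls Pinecone.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|): 1.0 means same direction, 0.0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # guard against zero vectors


def top_k(query: list[float], chunks: dict[str, list[float]], k: int = 5) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs with the highest cosine similarity to the query."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


chunks = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "gift-cards": [0.7, 0.0, 0.7],
}
print([chunk_id for chunk_id, _ in top_k([1.0, 0.0, 0.0], chunks, k=2)])
# ['refund-policy', 'gift-cards']
```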
4. Flashrank reranking
Flashrank then reranks the top-k Pinecone results. Flashrank is a lightweight reranking library that scores each (question, chunk) pair with a cross-encoder model, which judges relevance more accurately than cosine similarity alone. The default model is `ms-marco-MiniLM-L-12-v2`, and the top 3 results are kept.
Reranking matters most for long documents, where many chunks look superficially similar to the question but only a few actually answer it; the cross-encoder is much better than cosine similarity at telling those apart.
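The rerank step has a simple structure: score every candidate against the question, sort, keep the top 3. A real cross-encoder such as `ms-marco-MiniLM-L-12-v2` reads the question and chunk together; in this sketch `toy_pair_score` is a hypothetical word-overlap stand-in so it runs without downloading a model. None of these names are Flashrank or LangChat APIs.

```python
import re
from typing import Callable


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_pair_score(question: str, chunk: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of question words present in the chunk."""
    q = tokens(question)
    return len(q & tokens(chunk)) / len(q)


def rerank(question: str, candidates: list[str],
           score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Score every (question, chunk) pair and keep the top_n highest-scoring chunks."""
    return sorted(candidates, key=lambda chunk: score(question, chunk), reverse=True)[:top_n]


# Five candidates in a hypothetical Pinecone (cosine) order.
candidates = [
    "Our shop display changes every season.",
    "Physical items have a 30 day refund window.",
    "Contact support to start a refund request.",
    "The refund window for digital items is 14 days.",
    "Digital items are delivered by email.",
]
question = "What is the refund window for digital items?"
for chunk in rerank(question, candidates, toy_pair_score):
    print(chunk)
```

Passing the scorer in as a function keeps the k-then-top-3 structure separate from the model, so the toy scorer could be swapped for a real cross-encoder without touching `rerank`.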
Pinecone namespaces
Use namespaces to partition documents within a single index. Searches only return chunks from the namespace you configure, which is useful for:
- Separating different clients in a multi-tenant app
- Partitioning by language or region
- Separating document types (e.g., products vs. policies)
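The scoping behaviour can be sketched with a hypothetical in-memory index. In the Pinecone Python client the equivalent is the `namespace` argument to `Index.upsert` and `Index.query`; how LangChat exposes its namespace setting is not shown here, and `InMemoryIndex` is purely illustrative.

```python
class InMemoryIndex:
    """Hypothetical stand-in for one Pinecone index split into namespaces."""

    def __init__(self) -> None:
        self._namespaces: dict[str, dict[str, str]] = {}  # namespace -> {chunk_id: text}

    def upsert(self, namespace: str, chunk_id: str, text: str) -> None:
        self._namespaces.setdefault(namespace, {})[chunk_id] = text

    def query(self, namespace: str, keyword: str) -> list[str]:
        """Search a single namespace; chunks in other namespaces are never considered."""
        chunks = self._namespaces.get(namespace, {})
        return [cid for cid, text in chunks.items() if keyword.lower() in text.lower()]


index = InMemoryIndex()
index.upsert("client-a", "a-1", "Client A refund policy: 30 days")
index.upsert("client-b", "b-1", "Client B refund policy: 14 days")
print(index.query("client-a", "refund"))  # ['a-1']
print(index.query("client-c", "refund"))  # []
```

Both clients have a matching "refund" chunk, but a query scoped to `client-a` never sees `client-b`'s data, which is what makes namespaces suitable for multi-tenant separation.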
Changing the retriever depth
The default retriever fetches k=5 chunks before reranking. Fetching more candidates gives the reranker more to choose from, which can improve recall at the cost of latency.
The value of k is set inside the `PineconeVectorAdapter` internals. For advanced customization, see Extending Adapters.
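The recall trade-off can be seen with hypothetical scores. In this sketch the truly relevant chunk (`c7`) is ranked 7th by vector search, so it only reaches the reranker when k is at least 7. `retrieve_then_rerank` and all scores are illustrative, not LangChat internals.

```python
def retrieve_then_rerank(cosine_ranked: list[str], rerank_scores: dict[str, float],
                         k: int, top_n: int = 3) -> list[str]:
    """Take the first k vector-search hits, then keep the top_n by reranker score."""
    candidates = cosine_ranked[:k]
    return sorted(candidates, key=lambda cid: rerank_scores[cid], reverse=True)[:top_n]


# Chunk ids in the order a vector search might return them (hypothetical).
cosine_ranked = ["c1", "c2", "c3", "c4", "c5", "c6", "c7", "c8"]
# Hypothetical reranker scores: c7 is relevant, but cosine similarity undervalued it.
rerank_scores = {"c1": 0.40, "c2": 0.35, "c3": 0.30, "c4": 0.20,
                 "c5": 0.10, "c6": 0.05, "c7": 0.95, "c8": 0.01}

print(retrieve_then_rerank(cosine_ranked, rerank_scores, k=5))   # ['c1', 'c2', 'c3']
print(retrieve_then_rerank(cosine_ranked, rerank_scores, k=10))  # ['c7', 'c1', 'c2']
```

The latency cost comes from the reranker scoring every candidate: with k=10 it evaluates twice as many (question, chunk) pairs as with k=5.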
Embedding model choice
| Model | Dimensions | Quality | Cost |
|---|---|---|---|
| `text-embedding-3-large` | 3072 | Highest | Highest (about 6.5× small) |
| `text-embedding-3-small` | 1536 | Good | Lowest |
| `text-embedding-ada-002` | 1536 | Older baseline | About 5× small |
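The dimension column also affects storage. A back-of-the-envelope sketch, assuming each vector component is stored as a 4-byte float32 and ignoring metadata, ids, and index overhead (so real usage will be higher); `raw_vector_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(dimensions: int, num_chunks: int) -> int:
    """Raw float32 vector payload only; excludes metadata, ids and index overhead."""
    return dimensions * BYTES_PER_FLOAT32 * num_chunks


for model, dims in [("text-embedding-3-large", 3072), ("text-embedding-3-small", 1536)]:
    gb = raw_vector_bytes(dims, 1_000_000) / 1e9
    print(f"{model}: {gb:.3f} GB per million chunks")
# text-embedding-3-large: 12.288 GB per million chunks
# text-embedding-3-small: 6.144 GB per million chunks
```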
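Pinecone provider: the index dimension must match the embedding model, so switching between `text-embedding-3-large` (3072) and a 1536-dimension model requires a new index.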
When there’s no relevant context
If Pinecone returns no relevant results (only low similarity scores), the LLM still receives the prompt, but the `{context}` placeholder will be empty or filled with low-quality chunks. This can lead to hallucinated answers.
Best practices:
- Always tell the model what to do when context is missing: “If the answer is not in the context, say you don’t know.”
- Ensure documents are indexed before going live
- Monitor queries that return empty context (visible in Supabase `request_metrics`)
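The first and last practices can be combined in a small guard: drop low-scoring chunks, tell the model to admit when the answer is missing, and flag empty-context queries for review. This is a minimal sketch assuming you have a relevance score per retrieved chunk; `build_prompt`, `MIN_SCORE`, and the template wording are hypothetical, not LangChat’s built-in prompt, and writing the flag to `request_metrics` is not shown.

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)

MIN_SCORE = 0.5  # hypothetical threshold; tune it against your own data and score scale


def build_prompt(question: str, hits: list[tuple[str, float]]) -> tuple[str, bool]:
    """Drop low-scoring chunks, render the prompt, and report whether context ended up empty."""
    kept = [text for text, score in hits if score >= MIN_SCORE]
    context_empty = not kept
    # An explicit marker makes "nothing found" unambiguous to the model.
    context = "\n".join(kept) if kept else "(no relevant documents found)"
    return PROMPT_TEMPLATE.format(context=context, question=question), context_empty


prompt, empty = build_prompt("What is your CEO's shoe size?", [("Refunds take 30 days.", 0.21)])
print(empty)  # True: worth logging so the query can be reviewed later
```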
