What is Vector Search?
Vector search enables your chatbot to find relevant documents from your knowledge base and use them to answer questions accurately. How it works:
- Query: User asks a question
- Embedding: Question converted to vector embedding
- Search: Similar vectors found in Pinecone
- Retrieval: Relevant documents retrieved
- Reranking: Results reranked for relevance
- Context: Documents used as context for LLM response
Vector Search Flow
Configuration
Configure vector search in LangChatConfig:
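A minimal sketch of the relevant options. Only retrieval_k and reranker_top_n appear on this page; the import path and the embedding_model field name are assumptions, so check your LangChat version:

```python
# Sketch only -- field names other than retrieval_k and reranker_top_n
# are assumptions about the LangChat API.
from langchat import LangChatConfig

config = LangChatConfig(
    embedding_model="text-embedding-3-large",  # OpenAI embedding model
    retrieval_k=10,     # documents fetched from Pinecone per query
    reranker_top_n=5,   # documents kept after reranking
)
```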
Retrieval Configuration
Retrieval K (Number of Documents)
Control how many documents to retrieve:
- Small index (< 1000 docs): retrieval_k=5
- Medium index (1000-10000 docs): retrieval_k=10
- Large index (> 10000 docs): retrieval_k=20
More documents = better coverage, but slower and potentially less relevant.
Reranking Top N
Control how many documents to keep after reranking:
- Precise queries: reranker_top_n=3
- Broad queries: reranker_top_n=5
- Complex topics: reranker_top_n=7
Reranking improves relevance by re-scoring retrieved documents. More documents = more context, but can confuse the LLM.
Embedding Models
LangChat supports OpenAI embedding models:
text-embedding-3-large (Recommended)
- Highest quality embeddings
- Best for semantic search
- Good for complex queries
text-embedding-3-small
- Faster and cheaper
- Good for simple queries
- Lower latency
text-embedding-ada-002 (Legacy)
- Previous-generation model; prefer the text-embedding-3 models for new projects
Reranking Models
Flashrank reranker improves search result relevance:
ms-marco-MiniLM-L-12-v2 (Default)
- ~50MB download size
- Fast and efficient
- Good accuracy
Reranker models are automatically downloaded on first use to the rerank_models/ directory.
How Vector Search Works
Step 1: Query Processing
The user's question is processed and normalized.
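A minimal sketch with a hypothetical helper; LangChat's actual preprocessing is not documented here:

```python
def process_query(raw_query: str) -> str:
    """Hypothetical preprocessing: trim and collapse whitespace."""
    return " ".join(raw_query.split())

query = process_query("  How do I configure vector search?  ")
```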
Step 2: Embedding Generation
The question is converted to a vector.
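Continuing from Step 1, a sketch using the official openai Python client:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Convert the question into a vector embedding.
response = client.embeddings.create(
    model="text-embedding-3-large",
    input=query,  # processed query from Step 1
)
query_vector = response.data[0].embedding  # list of floats
```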
Step 3: Vector Search
Similar vectors are found in Pinecone.
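Continuing from Step 2, a sketch using the pinecone client (the index name is an assumption):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("your-index-name")  # index name is an assumption

# Fetch the retrieval_k nearest neighbors of the query vector.
results = index.query(
    vector=query_vector,    # embedding from Step 2
    top_k=10,               # retrieval_k
    include_metadata=True,  # carry document text/metadata along
)
matches = results.matches
```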
Step 4: Reranking
Results are reranked for relevance.
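Continuing from Step 3, a sketch using Flashrank; the "text" metadata key is an assumption about how the documents were indexed:

```python
from flashrank import Ranker, RerankRequest

# Downloads ms-marco-MiniLM-L-12-v2 (~50MB) into rerank_models/ on first use.
ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2", cache_dir="rerank_models")

passages = [
    {"id": m.id, "text": m.metadata["text"]}  # "text" key is an assumption
    for m in matches                          # matches from Step 3
]
reranked = ranker.rerank(RerankRequest(query=query, passages=passages))
top_docs = reranked[:5]  # keep reranker_top_n documents
```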
Step 5: Context Injection
The top documents are injected into the LLM prompt as context.
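A hypothetical prompt assembly, continuing from Step 4; LangChat's actual prompt template may differ:

```python
# Join the reranked documents into a single context string.
context = "\n\n".join(doc["text"] for doc in top_docs)

prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {query}"
)
```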
Best Practices
1. Optimize Retrieval K
Balance coverage against relevance: start from the values in the Retrieval Configuration section above and tune from there.
2. Use Reranking
Always use reranking; it consistently improves result relevance.
3. Choose the Right Embedding Model
Match the model to your needs: text-embedding-3-large for quality, text-embedding-3-small for speed and cost.
4. Index Your Documents
Make sure your Pinecone index contains relevant documents, as in the sketch below.
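A sketch of embedding and upserting documents with the openai and pinecone clients (the index name and the "text" metadata key are assumptions):

```python
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("your-index-name")  # index name is an assumption

docs = [
    "Vector search finds semantically similar documents.",
    "Reranking re-scores retrieved documents for relevance.",
]
embeddings = client.embeddings.create(model="text-embedding-3-large", input=docs)

index.upsert(vectors=[
    (f"doc-{i}", e.embedding, {"text": docs[i]})  # (id, vector, metadata)
    for i, e in enumerate(embeddings.data)
])
```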
Troubleshooting
Issue: No relevant results
Solutions:
- Increase retrieval_k (e.g., retrieval_k=20)
- Check that your Pinecone index contains documents
- Verify that embeddings are generated correctly
- Try a different embedding model
Issue: Too many irrelevant results
Solutions:
- Use reranking with a small cutoff (e.g., reranker_top_n=3)
- Improve document quality
- Use a better embedding model
- Refine your queries
Issue: Slow search
Solutions:
- Reduce retrieval_k (e.g., retrieval_k=5)
- Use a smaller embedding model (text-embedding-3-small)
- Optimize your Pinecone index
- Use a smaller reranker model
Issue: Out of memory
Solutions:
- Reduce retrieval_k
- Reduce reranker_top_n
- Use a smaller embedding model
Advanced Usage
Custom Embeddings
You can supply your own embeddings instead of the built-in OpenAI models (advanced).
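A hypothetical sketch, assuming LangChatConfig accepts an embeddings object with the common embed_query/embed_documents pair; both the embeddings= parameter and the interface are assumptions, not a documented LangChat API:

```python
from langchat import LangChatConfig  # assumed import path

class MyEmbeddings:
    """Hypothetical custom embedder; this interface is an assumption."""

    def embed_query(self, text: str) -> list[float]:
        return self._embed([text])[0]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return self._embed(texts)

    def _embed(self, texts: list[str]) -> list[list[float]]:
        # Placeholder: call your own embedding model here.
        return [[0.0] * 1536 for _ in texts]

config = LangChatConfig(embeddings=MyEmbeddings())  # embeddings= is an assumption
```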
Custom Reranking
You can also call the Flashrank reranker directly.
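A sketch of standalone reranking, useful for inspecting relevance scores:

```python
from flashrank import Ranker, RerankRequest

ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2", cache_dir="rerank_models")

request = RerankRequest(
    query="How does vector search work?",
    passages=[
        {"id": 1, "text": "Vector search finds semantically similar documents."},
        {"id": 2, "text": "Bananas are rich in potassium."},
    ],
)
for result in ranker.rerank(request):
    print(result["id"], result["score"])
```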
Performance Tips
- Use an appropriate retrieval_k: don't retrieve more documents than you need
- Enable reranking: Always improves relevance
- Cache embeddings: Reuse embeddings when possible
- Optimize index: Use appropriate Pinecone index configuration
- Monitor performance: Track retrieval and reranking times
Next Steps
- Configuration Guide - Learn about all config options
- API Reference - Full API documentation
- Examples - See vector search in action
Questions? Check the API Reference for complete details!