Optimization Strategies
Reduce Retrieval Count
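A minimal sketch of a retrieval cap, assuming the vector store returns `(doc_id, score)` pairs. `retrieve` and `DEFAULT_TOP_K` are hypothetical names, not this project's API.

```python
import heapq
from typing import List, Tuple

# Hypothetical default; the project's real setting may have another name and value.
DEFAULT_TOP_K = 3


def retrieve(scored_docs: List[Tuple[str, float]], top_k: int = DEFAULT_TOP_K) -> List[str]:
    """Return the ids of the top_k highest-scoring documents, best first.

    scored_docs stands in for the (doc_id, similarity) pairs a vector store
    would return; every document dropped here is one less for the reranker
    to score and one less chunk in the prompt.
    """
    best = heapq.nlargest(top_k, scored_docs, key=lambda pair: pair[1])
    return [doc_id for doc_id, _ in best]


candidates = [("a", 0.91), ("b", 0.42), ("c", 0.77), ("d", 0.88), ("e", 0.10)]
print(retrieve(candidates))           # ['a', 'd', 'c']
print(retrieve(candidates, top_k=1))  # ['a']
```

In practice the limit is usually passed to the vector store's own search call (often as a `k` or `top_k` argument), so fewer results are fetched in the first place.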
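Retrieve fewer documents per query. Each retrieved document adds reranking work and prompt tokens, so a smaller retrieval count gives faster responses.
Use Faster Models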
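One way to make a faster model the default while keeping an opt-out, sketched with hypothetical names: `FAST_MODEL`, `QUALITY_MODEL`, the `CHAT_MODEL` environment variable, and `chat_model_name()` are placeholders, not settings this project is known to read.

```python
import os

# Hypothetical model identifiers: substitute the names your provider offers.
FAST_MODEL = "fast-chat-model"
QUALITY_MODEL = "large-chat-model"


def chat_model_name() -> str:
    """Return the chat model to use.

    Defaults to the faster tier; the hypothetical CHAT_MODEL environment
    variable lets a deployment opt back into the larger model.
    """
    return os.environ.get("CHAT_MODEL", FAST_MODEL)
```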
Use faster (and cheaper) models where answer quality allows.
Optimize Reranking
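Keeping the reranker behind a plain scoring function makes a smaller model a drop-in swap. In this sketch `Scorer`, `rerank`, and `overlap_scorer` are hypothetical; `overlap_scorer` is a word-overlap toy that only marks where a real (smaller) reranker model would be called.

```python
from typing import Callable, List

# A scorer maps (query, document) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def overlap_scorer(query: str, doc: str) -> float:
    """Toy stand-in for a small reranker model: counts shared lowercase words."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def rerank(query: str, docs: List[str], scorer: Scorer, top_n: int) -> List[str]:
    """Score every candidate with `scorer` and return the top_n, best first."""
    # sorted() stays stable with reverse=True, so ties keep their retrieval order.
    ranked = sorted(docs, key=lambda doc: scorer(query, doc), reverse=True)
    return ranked[:top_n]
```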
Use smaller reranker models. They usually score candidates faster, at some cost in ranking quality.
Reduce Chat History
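A minimal sketch of a bounded history, assuming messages are stored as role/content dictionaries. `ChatHistory` and `MAX_HISTORY_MESSAGES` are hypothetical names.

```python
from collections import deque
from typing import Deque, Dict, List

# Hypothetical limit: keep only the most recent messages.
MAX_HISTORY_MESSAGES = 6


class ChatHistory:
    """Bounded chat history; the oldest messages are dropped automatically."""

    def __init__(self, max_messages: int = MAX_HISTORY_MESSAGES) -> None:
        # A deque with maxlen discards from the left when appending to a full deque.
        self._messages: Deque[Dict[str, str]] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def as_list(self) -> List[Dict[str, str]]:
        """Messages in chronological order, ready to include in the prompt."""
        return list(self._messages)
```

A message count is a coarse limit, since message lengths vary; trimming to a token budget is tighter but needs a tokenizer.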
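Limit the chat history kept in memory. Every past message included in the prompt adds tokens, and therefore latency and cost, to each request.
Caching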
Cache Embeddings
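A minimal in-process cache using the standard library's `functools.lru_cache`. `_embed_uncached` is a hypothetical stand-in for the real embedding call, and its toy vectors are not real embeddings.

```python
from functools import lru_cache
from typing import Tuple

# Counts how often the (expensive) embedding backend is actually called.
BACKEND_CALLS = {"count": 0}


def _embed_uncached(text: str) -> Tuple[float, ...]:
    """Stand-in for a real embedding API call (hypothetical)."""
    BACKEND_CALLS["count"] += 1
    return tuple(float(ord(ch)) for ch in text[:4])


@lru_cache(maxsize=1024)
def embed(text: str) -> Tuple[float, ...]:
    """Embed `text`, reusing the cached vector for repeated inputs.

    Returns a tuple so cached vectors cannot be mutated by callers.
    """
    return _embed_uncached(text)
```

`lru_cache` lives in one process and is lost on restart; a shared store keyed by a hash of the text (for example Redis or a database table) would persist across workers.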
Cache the embeddings of frequently used texts so repeated inputs skip the embedding call.
Cache Reranked Results
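Reranked order depends on both the query and the candidate set, so this sketch keys the cache on both and expires entries after a time-to-live. `RerankCache` and its 300-second default are hypothetical; the injectable `clock` exists only to make expiry testable.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# Cache key: the query plus the candidate set (order-insensitive).
Key = Tuple[str, Tuple[str, ...]]


class RerankCache:
    """Hypothetical time-limited cache of reranked document-id lists."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[Key, Tuple[float, List[str]]] = {}

    @staticmethod
    def _key(query: str, doc_ids: List[str]) -> Key:
        return (query, tuple(sorted(doc_ids)))

    def get(self, query: str, doc_ids: List[str]) -> Optional[List[str]]:
        """Return the cached ranking, or None if missing or expired."""
        key = self._key(query, doc_ids)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, ranking = entry
        if self._clock() - stored_at > self.ttl_seconds:
            del self._entries[key]  # stale: drop it so the caller reranks
            return None
        return list(ranking)

    def put(self, query: str, doc_ids: List[str], ranking: List[str]) -> None:
        self._entries[self._key(query, doc_ids)] = (self._clock(), list(ranking))
```

This sketch never evicts entries that are not read again; a production cache would also cap its size.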
Cache reranked results so a repeated query over the same candidates skips the reranker.
Database Optimization
Use Indexes
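The database backend is not specified here, so this sketch assumes SQLite via Python's `sqlite3` module and a hypothetical `documents` table. It creates an index on the filtered column and uses `EXPLAIN QUERY PLAN` to confirm the query uses it.

```python
import sqlite3
from typing import List


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical table; adapt names to the project's actual schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "id INTEGER PRIMARY KEY, collection TEXT NOT NULL, title TEXT)"
    )
    # Index the column that queries filter on.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_documents_collection "
        "ON documents (collection)"
    )


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> List[str]:
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for `sql`."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO documents (collection, title) VALUES (?, ?)",
    [("faq", "Reset password"), ("guides", "Install"), ("faq", "Billing")],
)
for line in query_plan(conn, "SELECT id, title FROM documents WHERE collection = ?", ("faq",)):
    print(line)  # expected to mention idx_documents_collection
```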
Create indexes on frequently queried columns for faster lookups.
Batch Operations
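A sketch of batching, assuming the embedding backend accepts a list of texts per request. `batched`, `embed_all`, and `embed_batch` are hypothetical names.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def embed_all(texts: Iterable[str],
              embed_batch: Callable[[List[str]], Sequence[List[float]]],
              batch_size: int = 32) -> List[List[float]]:
    """Embed texts with one backend call per batch instead of one per text.

    `embed_batch` is a hypothetical stand-in for a batch embedding endpoint.
    """
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors
```

On Python 3.12 and later, `itertools.batched` provides the same grouping (yielding tuples).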
Process items in batches instead of one at a time to reduce per-call overhead.
Monitoring
Track Response Times
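A minimal timing helper built on `time.perf_counter`. The `timed` context manager, the `TIMINGS` dictionary, and the `rag.timing` logger name are hypothetical.

```python
import logging
import time
from contextlib import contextmanager
from typing import Dict, Iterator, List

# Hypothetical logger name; route it to whatever log or metrics sink you use.
logger = logging.getLogger("rag.timing")

# In-process record of elapsed milliseconds per pipeline stage.
TIMINGS: Dict[str, List[float]] = {}


@contextmanager
def timed(stage: str) -> Iterator[None]:
    """Measure the wall-clock time of the enclosed block and record it under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        TIMINGS.setdefault(stage, []).append(elapsed_ms)
        logger.info("%s took %.1f ms", stage, elapsed_ms)
```

Wrapping retrieval, reranking, and generation separately, e.g. `with timed("rerank"): ...`, shows which stage dominates response time.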
Monitor response times to locate bottlenecks.
Monitor API Usage
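A sketch of per-key usage tracking, assuming the caller knows which API key served each request and how many tokens it used. `UsageTracker` and `KeyUsage` are hypothetical, and a real deployment would persist these totals rather than keep them in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict


@dataclass
class KeyUsage:
    requests: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0


class UsageTracker:
    """Hypothetical in-memory tally of requests and tokens per API key."""

    def __init__(self) -> None:
        self._usage: DefaultDict[str, KeyUsage] = defaultdict(KeyUsage)

    def record(self, key_id: str, prompt_tokens: int, completion_tokens: int) -> None:
        # key_id should identify the key (e.g. a label), never the raw secret.
        usage = self._usage[key_id]
        usage.requests += 1
        usage.prompt_tokens += prompt_tokens
        usage.completion_tokens += completion_tokens

    def report(self, key_id: str) -> KeyUsage:
        """Return totals for key_id (all zeros if it has not been seen)."""
        return self._usage.get(key_id, KeyUsage())
```

Token counts can come from the provider's response where it reports them.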
Track API key usage.
Cost Optimization
Use Efficient Models
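A sketch of routing each task to the cheapest model judged adequate for it. The `Task` values, `MODEL_FOR_TASK`, and the model names are hypothetical placeholders.

```python
from enum import Enum


class Task(Enum):
    # Hypothetical task types in a retrieval-augmented chat pipeline.
    REWRITE_QUERY = "rewrite_query"  # short, low-stakes transformation
    ANSWER = "answer"                # final user-facing answer


# Hypothetical model identifiers: substitute the names your provider offers.
MODEL_FOR_TASK = {
    Task.REWRITE_QUERY: "small-model",
    Task.ANSWER: "large-model",
}


def pick_model(task: Task) -> str:
    """Return the model configured for `task`."""
    return MODEL_FOR_TASK[task]
```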
Choose models based on the needs of each task.
Reduce Token Usage
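A minimal sketch of a context budget: chunks are added in rank order until the next one would exceed `max_tokens`. `approx_tokens` and `build_context` are hypothetical; `approx_tokens` counts whitespace-separated words, a rough stand-in for the model's real tokenizer, which will report different counts.

```python
from typing import List


def approx_tokens(text: str) -> int:
    """Rough token estimate: whitespace-separated words. Real tokenizers differ."""
    return len(text.split())


def build_context(chunks: List[str], max_tokens: int) -> str:
    """Join ranked chunks until the next one would exceed max_tokens."""
    selected: List[str] = []
    used = 0
    for chunk in chunks:
        cost = approx_tokens(chunk)
        if used + cost > max_tokens:
            break  # stop at the first chunk that does not fit
        selected.append(chunk)
        used += cost
    return "\n\n".join(selected)
```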
Limit context length, since every prompt token adds cost and latency.
Related Documentation
- Configuration - Configuration options
- Troubleshooting - Common issues