Vector search lets your chatbot find relevant documents from your knowledge base to answer questions accurately. How it works:
  1. User asks a question
  2. Question converted to vector embedding
  3. Similar vectors found in Pinecone
  4. Relevant documents retrieved
  5. Results reranked for relevance
  6. Documents used as context for response

The Flow

User Question → Query Embedding → Vector Search (Pinecone) → Retrieve Documents → Rerank Results → Use as Context → Generate Response
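
For orientation, here is the same flow sketched end to end. Only get_retriever and Flashrank appear elsewhere on this page; ask_with_context, the .invoke calls, and the prompt format are illustrative assumptions, not confirmed langchat API:

# Illustrative sketch of the flow above, not confirmed langchat API
def ask_with_context(question, vector_db, reranker, llm):
    retriever = vector_db.get_retriever(k=10)        # steps 2-4: embed query, search, retrieve
    documents = retriever.invoke(question)           # assumes a LangChain-style retriever
    top_docs = reranker.rerank(question, documents)  # step 5 (hypothetical method)
    context = "\n\n".join(doc.page_content for doc in top_docs)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt)                        # step 6: generate the response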

Configuration

Basic Setup

from langchat.vector_db import Pinecone

vector_db = Pinecone(
    api_key="pcsk-...",
    index_name="your-index",
    embedding_model="text-embedding-3-large"  # Optional
)
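
To avoid committing keys, a common pattern is to read them from the environment (plain Python, nothing langchat-specific; PINECONE_API_KEY is an assumed variable name):

import os

from langchat.vector_db import Pinecone

vector_db = Pinecone(
    api_key=os.environ["PINECONE_API_KEY"],  # set in your shell or .env file
    index_name="your-index"
)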

Retrieval Settings

Control how many documents to retrieve:
# Access retriever directly
retriever = vector_db.get_retriever(k=10)  # Retrieve 10 documents
Recommendations:
  • Small index (< 1000 docs): k=5
  • Medium index (1000-10000 docs): k=10
  • Large index (> 10000 docs): k=20
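
If you'd rather encode those recommendations than remember them, a tiny helper (hypothetical, not part of langchat) can map index size to k:

def choose_k(num_docs: int) -> int:
    # Thresholds taken from the recommendations above
    if num_docs < 1_000:
        return 5
    if num_docs <= 10_000:
        return 10
    return 20

retriever = vector_db.get_retriever(k=choose_k(8_000))  # -> k=10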

Embedding Models

text-embedding-3-large

vector_db = Pinecone(
    api_key="...",
    index_name="...",
    embedding_model="text-embedding-3-large"
)
Advantages:
  • Highest quality
  • Best for semantic search
  • Good for complex queries

text-embedding-3-small

vector_db = Pinecone(
    api_key="...",
    index_name="...",
    embedding_model="text-embedding-3-small"
)
Advantages:
  • Faster and cheaper
  • Good for simple queries
  • Lower latency
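
One practical constraint, not stated above but worth knowing: your Pinecone index dimension must match the model's output size, and documents must be indexed with the same model used at query time. The dimensions below are OpenAI's published values:

# OpenAI embedding output dimensions; the Pinecone index must be
# created with the matching dimension, or upserts and queries will fail
EMBEDDING_DIMS = {
    "text-embedding-3-large": 3072,  # highest quality
    "text-embedding-3-small": 1536,  # faster and cheaper
}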

Reranking

Reranking improves search result relevance:
from langchat import LangChat  # top-level import assumed; adjust to your install
from langchat.reranker import Flashrank

reranker = Flashrank(
    model_name="ms-marco-MiniLM-L-12-v2",
    top_n=3  # Keep top 3 after reranking
)

# llm, vector_db, and db are configured as shown elsewhere in these docs
ai = LangChat(
    llm=llm,
    vector_db=vector_db,
    db=db,
    reranker=reranker
)
Reranking models are automatically downloaded on first use.
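
The Flashrank reranker presumably wraps the open-source flashrank package. If you want to see reranking in isolation, a sketch using that package's Ranker/RerankRequest API directly (query and documents are placeholders):

from flashrank import Ranker, RerankRequest

ranker = Ranker(model_name="ms-marco-MiniLM-L-12-v2")  # downloads on first use

documents = ["LangChat supports Pinecone.", "Bananas are yellow."]
passages = [{"id": i, "text": text} for i, text in enumerate(documents)]

request = RerankRequest(query="Which vector DBs are supported?", passages=passages)
results = ranker.rerank(request)  # passages sorted by relevance score, best first
top_3 = results[:3]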

Best Practices

1. Balance Retrieval Count

# Too small: Might miss relevant docs
k = 3

# Too large: Too many irrelevant docs
k = 50

# Optimal: Depends on your index
k = 10  # Good for most cases

2. Always Use Reranking

Retrieval alone can surface loosely related documents; reranking keeps only the strongest matches:
reranker = Flashrank(top_n=3)  # Rerank and keep top 3

3. Choose Right Embedding Model

# High quality (recommended)
embedding_model="text-embedding-3-large"

# Faster and cheaper
embedding_model="text-embedding-3-small"

Troubleshooting

No Relevant Results

Solutions:
  • Increase retrieval count: k=20
  • Check that your Pinecone index actually contains documents (see the check below)
  • Try a different embedding model
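
To confirm the index actually contains vectors, query its stats with the pinecone client directly:

from pinecone import Pinecone

pc = Pinecone(api_key="pcsk-...")
index = pc.Index("your-index")
print(index.describe_index_stats())  # a total_vector_count of 0 means nothing is indexed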

Too Many Irrelevant Results

Solutions:
  • Use reranking: top_n=3 (see the sketch below)
  • Improve document quality (cleaner, better-chunked source text)
  • Use a better embedding model
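
A common fix combines two settings already shown on this page: retrieve broadly, then let the reranker keep only the best few.

retriever = vector_db.get_retriever(k=20)  # cast a wide net
reranker = Flashrank(
    model_name="ms-marco-MiniLM-L-12-v2",
    top_n=3  # ...then keep only the 3 strongest matches
)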

Slow Retrieval

Solutions:
  • Reduce retrieval count: k=5
  • Use the smaller embedding model (text-embedding-3-small)
  • Optimize your Pinecone index
