Class: DocumentIndexer

Standalone document loader and indexer for Pinecone. Use it to index documents into a Pinecone index without a full LangChat setup.

Constructor

DocumentIndexer(
    pinecone_api_key: str,
    pinecone_index_name: str,
    openai_api_key: str,
    embedding_model: str = "text-embedding-3-large"
)
Parameters:
  • pinecone_api_key (str, required): Pinecone API key.
  • pinecone_index_name (str, required): Name of the Pinecone index (must already exist).
  • openai_api_key (str, required): OpenAI API key used for embeddings.
  • embedding_model (str, default "text-embedding-3-large"): Embedding model to use.
Example:
from langchat.core.utils.document_indexer import DocumentIndexer

indexer = DocumentIndexer(
    pinecone_api_key="pcsk-...",
    pinecone_index_name="my-index",
    openai_api_key="sk-...",
    embedding_model="text-embedding-3-large"
)

Methods

load_and_index_documents()

Load a single document, split it into chunks, and index the chunks into Pinecone.
def load_and_index_documents(
    self,
    file_path: str,
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
    namespace: Optional[str] = None,
    prevent_duplicates: bool = True
) -> dict
Returns: dict with:
  • chunks_indexed (int): Number of chunks indexed
  • chunks_skipped (int): Number of duplicates skipped
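
A short sketch of reading the return value, assuming indexer was constructed as in the example above; the file path is a placeholder. With prevent_duplicates=True (the default), re-indexing an already-indexed file should report the repeated chunks under chunks_skipped rather than re-indexing them:

result = indexer.load_and_index_documents(
    file_path="handbook.pdf",  # placeholder path
    chunk_size=1000,
    chunk_overlap=200,
)
print(f"Indexed {result['chunks_indexed']} chunks, skipped {result['chunks_skipped']} duplicates")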

load_and_index_multiple_documents()

Load and index multiple documents in one call; each file is chunked and indexed the same way as load_and_index_documents().
def load_and_index_multiple_documents(
    self,
    file_paths: List[str],
    chunk_size: int = 1000,
    chunk_overlap: int = 200,
    namespace: Optional[str] = None,
    prevent_duplicates: bool = True
) -> dict
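
A sketch of indexing several files at once, assuming the same indexer as above; the file paths are placeholders, and the returned dict is assumed to aggregate counts in the same shape as load_and_index_documents():

result = indexer.load_and_index_multiple_documents(
    file_paths=["guide.pdf", "faq.md", "notes.txt"],  # placeholder paths
    chunk_size=1000,
    chunk_overlap=200,
    prevent_duplicates=True,
)
print(result)  # inspect the returned dict; its exact keys are not documented above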

Example

from langchat.core.utils.document_indexer import DocumentIndexer

indexer = DocumentIndexer(
    pinecone_api_key="pcsk-...",
    pinecone_index_name="my-index",
    openai_api_key="sk-..."
)

# Index document
result = indexer.load_and_index_documents(
    file_path="document.pdf",
    chunk_size=1000,
    chunk_overlap=200
)

print(f"✅ Indexed {result['chunks_indexed']} chunks")
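
To keep separate document sets apart, you can pass the optional namespace argument. The value below is a placeholder, and this sketch assumes the argument maps to a Pinecone namespace used to partition the index:

result = indexer.load_and_index_documents(
    file_path="document.pdf",
    namespace="product-docs",  # placeholder namespace (assumption: maps to a Pinecone namespace)
)
print(f"✅ Indexed {result['chunks_indexed']} chunks into 'product-docs'")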
