Overview
DocumentIndexer is the underlying class that powers LangChat.index(). Use it directly only when you need to index documents outside a LangChat instance, for example in a standalone indexing script that doesn’t start the full chatbot.
For most use cases, use lc.index() instead.
Constructor
Pinecone API key.
Pinecone index name.
OpenAI API key for creating embeddings.
OpenAI embedding model.
Methods
load_and_index_documents()
Index a single file.
Path to the document file.
Characters per chunk.
Overlap between adjacent chunks.
Pinecone namespace.
Skip chunks already in Pinecone (checked by content hash).
Returns a dict with chunks_indexed, chunks_skipped, and metadata.
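The chunk size, overlap, and skip-by-hash options described above can be pictured with a small plain-Python sketch. This is not DocumentIndexer's actual implementation: the function names `split_into_chunks`, `content_hash`, and `filter_new_chunks`, the parameter names, the choice of SHA-256, and the in-memory `seen_hashes` set (standing in for a lookup against Pinecone) are all assumptions made for illustration.

```python
import hashlib


def split_into_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into character chunks; adjacent chunks share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


def content_hash(chunk: str) -> str:
    """Hash a chunk's content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(chunk.encode("utf-8")).hexdigest()


def filter_new_chunks(chunks: list[str], seen_hashes: set[str]) -> tuple[list[str], int]:
    """Return the chunks whose hash is not in seen_hashes, plus the number skipped.

    seen_hashes is updated in place, so a second run over the same chunks
    skips all of them.
    """
    new_chunks = []
    skipped = 0
    for chunk in chunks:
        digest = content_hash(chunk)
        if digest in seen_hashes:
            skipped += 1
            continue
        seen_hashes.add(digest)
        new_chunks.append(chunk)
    return new_chunks, skipped
```

Under these assumptions, re-running the same file with skipping enabled would report every chunk as skipped and none as indexed, which is the behavior the chunks_indexed and chunks_skipped fields are meant to surface.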
load_and_index_multiple_documents()
Index multiple files.
Same as load_and_index_documents(), but accepts a list of file paths.
Standalone indexing script
Use DocumentIndexer directly when you want to index documents independently of the chatbot.
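A standalone script has to supply the credentials itself. The sketch below shows one way to do that. Everything specific in it is an assumption rather than the documented API: the import path `langchat`, the constructor keyword names (`pinecone_api_key`, `pinecone_index_name`, `openai_api_key`, `embedding_model`), the environment variable names, and the fallback embedding model. Check the installed package for the real names before using it.

```python
import os
from typing import Mapping, Optional

# Assumed environment variable names mapped to assumed constructor keyword names.
REQUIRED_VARS = {
    "PINECONE_API_KEY": "pinecone_api_key",
    "PINECONE_INDEX_NAME": "pinecone_index_name",
    "OPENAI_API_KEY": "openai_api_key",
}
DEFAULT_EMBEDDING_MODEL = "text-embedding-3-small"  # assumed fallback, not a documented default


def indexer_settings_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the four constructor inputs, failing early if a credential is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {kwarg: env[name] for name, kwarg in REQUIRED_VARS.items()}
    settings["embedding_model"] = env.get("OPENAI_EMBEDDING_MODEL", DEFAULT_EMBEDDING_MODEL)
    return settings


def index_file(file_path: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Index one file without starting the chatbot and return the result dict."""
    # Assumed import path; imported lazily so the helper above works without the package.
    from langchat import DocumentIndexer

    indexer = DocumentIndexer(**indexer_settings_from_env(env))
    return indexer.load_and_index_documents(file_path)
```

Calling `index_file()` with a document path from a script's entry point would then return the dict described under load_and_index_documents(), with chunks_indexed, chunks_skipped, and metadata.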
LangChat.index() is a convenience wrapper around DocumentIndexer that reads credentials from environment variables automatically. Prefer it when you already have a LangChat instance.