Class: DocumentIndexer
Standalone document loader and indexer for Pinecone. Use this to index documents without a full LangChat setup.
Constructor
DocumentIndexer(
pinecone_api_key: str,
pinecone_index_name: str,
openai_api_key: str,
embedding_model: str = "text-embedding-3-large"
)
Parameters:
pinecone_api_key (str, required): Pinecone API key
pinecone_index_name (str, required): Pinecone index name (must already exist)
openai_api_key (str, required): OpenAI API key for embeddings
embedding_model (str, default "text-embedding-3-large"): Embedding model to use
Example:
from langchat.core.utils.document_indexer import DocumentIndexer
indexer = DocumentIndexer(
pinecone_api_key="pcsk-...",
pinecone_index_name="my-index",
openai_api_key="sk-...",
embedding_model="text-embedding-3-large"
)
Methods
load_and_index_documents()
Loads a single document, splits it into chunks, and indexes it in Pinecone.
def load_and_index_documents(
self,
file_path: str,
chunk_size: int = 1000,
chunk_overlap: int = 200,
namespace: Optional[str] = None,
prevent_duplicates: bool = True
) -> dict
Returns:
dict with:
chunks_indexed (int): Number of chunks indexed
chunks_skipped (int): Number of duplicate chunks skipped
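The two counts in the returned dict can be combined into a one-line summary; a minimal sketch (the summarize helper is illustrative, not part of DocumentIndexer):

```python
def summarize(result: dict) -> str:
    # Format the dict returned by load_and_index_documents
    indexed = result["chunks_indexed"]
    skipped = result["chunks_skipped"]
    return f"{indexed} chunks indexed, {skipped} duplicates skipped"
```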
load_and_index_multiple_documents()
Loads multiple documents and indexes them in Pinecone.
def load_and_index_multiple_documents(
self,
file_paths: List[str],
chunk_size: int = 1000,
chunk_overlap: int = 200,
namespace: Optional[str] = None,
prevent_duplicates: bool = True
) -> dict
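A typical batch-indexing flow gathers the file paths first and then passes them in one call; a minimal sketch, assuming valid Pinecone and OpenAI credentials (the collect_documents helper is illustrative, not part of DocumentIndexer):

```python
from pathlib import Path

def collect_documents(directory: str, pattern: str = "*.pdf") -> list:
    # Illustrative helper: gather matching files as sorted string paths
    return sorted(str(p) for p in Path(directory).glob(pattern))

def index_directory(directory: str) -> dict:
    # Sketch only: requires real API keys to run against Pinecone/OpenAI
    from langchat.core.utils.document_indexer import DocumentIndexer

    indexer = DocumentIndexer(
        pinecone_api_key="pcsk-...",      # replace with real credentials
        pinecone_index_name="my-index",
        openai_api_key="sk-...",
    )
    return indexer.load_and_index_multiple_documents(
        file_paths=collect_documents(directory),
        chunk_size=1000,
        chunk_overlap=200,
        prevent_duplicates=True,
    )
```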
Example
from langchat.core.utils.document_indexer import DocumentIndexer
indexer = DocumentIndexer(
pinecone_api_key="pcsk-...",
pinecone_index_name="my-index",
openai_api_key="sk-..."
)
# Index document
result = indexer.load_and_index_documents(
file_path="document.pdf",
chunk_size=1000,
chunk_overlap=200
)
print(f"✅ Indexed {result['chunks_indexed']} chunks")
Built with ❤️ by NeuroBrain