# Flashrank Reranker

> Cross-encoder reranker that improves search result quality.

## What it does

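After Pinecone returns the top-k chunks by cosine similarity, the Flashrank reranker re-scores them with a cross-encoder model. A cross-encoder reads the query and a candidate chunk together in a single pass, so it can weigh how the two relate to each other. Embedding similarity, by contrast, compares two vectors that were computed independently. The cross-encoder score is therefore typically a more accurate measure of relevance, at the cost of running the model once per candidate.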

Result: fewer but better chunks reach the LLM prompt.

```
Pinecone: 5 candidates (by cosine similarity)
           ↓
Flashrank: re-scores all 5
           ↓
Top 3 most relevant chunks → LLM prompt
```
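
The two-stage flow above can be sketched in plain Python. This illustrates the retrieve-then-rerank pattern only; it is not LangChat's or Flashrank's implementation. `toy_relevance` is a hypothetical word-overlap scorer standing in for the cross-encoder, and the candidate chunks are made-up data.

```python
from typing import Callable


def rerank(query: str, candidates: list[str],
           score: Callable[[str, str], float], top_n: int = 3) -> list[str]:
    """Re-score every candidate against the query and keep the top_n best."""
    scored = [(score(query, doc), doc) for doc in candidates]
    # Highest score first; the sort is stable, so ties keep retrieval order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]


def toy_relevance(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words found in doc."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words)


# Pretend these five chunks came back from the vector search.
candidates = [
    "Our office is closed on public holidays.",
    "To reset your password, open Settings and choose Reset password.",
    "Password rules: at least 12 characters.",
    "You can reset a forgotten password from the login page.",
    "Billing questions go to the finance team.",
]

top_chunks = rerank("how do I reset my password", candidates, toy_relevance, top_n=3)
for chunk in top_chunks:
    print(chunk)
```

Every candidate is scored, but only `top_n` survive. A real cross-encoder replaces the toy scorer with a model call per query and chunk pair, which is why reranking is applied to a small candidate set rather than the whole index.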

***

## Default configuration

LangChat uses Flashrank automatically — no setup required:

```python
# This is what LangChat uses by default (you don't need to write this):
from langchat.adapters.reranker import FlashrankRerankAdapter

reranker = FlashrankRerankAdapter(
    model_name="ms-marco-MiniLM-L-12-v2",
    cache_dir="rerank_models",
    top_n=3,
)
```

***

## Custom configuration

Pass a custom reranker to `LangChat` to change the model or `top_n`:

```python
from langchat import LangChat
from langchat.providers import OpenAI, Pinecone, Supabase
from langchat.adapters.reranker import FlashrankRerankAdapter

reranker = FlashrankRerankAdapter(
    model_name="ms-marco-MiniLM-L-12-v2",
    cache_dir="rerank_models",
    top_n=5,   # pass more chunks to the LLM
)

lc = LangChat(
    llm=OpenAI("gpt-4o-mini"),
    vector_db=Pinecone("my-index"),
    db=Supabase(),
    reranker=reranker,
)
```

***

## Parameters

<ParamField path="model_name" type="str" default="ms-marco-MiniLM-L-12-v2">
  Flashrank cross-encoder model. The default is a good balance of speed and accuracy.
</ParamField>

<ParamField path="cache_dir" type="str" default="rerank_models">
  Directory where the model is cached after first download.
</ParamField>

<ParamField path="top_n" type="int" default="3">
  Number of highest-scoring chunks kept after reranking and included in the LLM prompt. The reranker cannot return more chunks than were retrieved.
</ParamField>

***

## Model download

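The Flashrank model is downloaded automatically on first use. The download size depends on the model: the Flashrank model list puts `ms-marco-MiniLM-L-12-v2` at roughly 34 MB, while some of the larger models exceed 100 MB. The model is cached locally in `cache_dir` and reused on subsequent runs.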

After that one-time download, reranking runs entirely locally: no API key or external service is required.
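
To confirm that the model has been cached (for example, before deploying to an environment without network access), you could inspect the cache directory. The helper below is a hypothetical utility, not part of LangChat or Flashrank. It assumes `cache_dir` is the default `rerank_models` and that the path is resolved relative to the current working directory.

```python
from pathlib import Path


def cache_size_mb(cache_dir: str = "rerank_models") -> float:
    """Return the total size of all files under cache_dir in megabytes.

    Returns 0.0 when the directory does not exist, which suggests the
    model has not been downloaded yet.
    """
    root = Path(cache_dir)
    if not root.is_dir():
        return 0.0
    total_bytes = sum(p.stat().st_size for p in root.rglob("*") if p.is_file())
    return total_bytes / 1_000_000


print(f"Reranker cache: {cache_size_mb():.1f} MB")
```

A result of `0.0` before the first query is expected; a non-zero size afterwards indicates the download was cached.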

***

## Trade-offs

| `top_n` | LLM context quality          | Token cost | LLM latency |
| ------- | ---------------------------- | ---------- | ----------- |
| 1–2     | Focused but may miss context | Lowest     | Fastest     |
| 3       | Good balance (default)       | Moderate   | Fast        |
| 5–10    | Rich context                 | Higher     | Slower      |

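For long, complex documents where a question may need several chunks to answer, increase `top_n` to 5 or more. The reranker can only return chunks it was given, so raising `top_n` beyond the number of candidates retrieved from Pinecone (5 in the pipeline shown above) has no further effect.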
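
The token-cost column can be approximated with simple arithmetic. The sketch below is a rough estimate only: `avg_chunk_tokens` and `prompt_overhead_tokens` are assumed values chosen for illustration, not LangChat defaults, so measure your own chunks with your model's tokenizer before relying on the numbers.

```python
def estimate_context_tokens(top_n: int,
                            avg_chunk_tokens: int = 300,
                            prompt_overhead_tokens: int = 200) -> int:
    """Rough prompt size: fixed overhead plus top_n chunks of average size.

    Both defaults are illustrative assumptions, not measured values.
    """
    if top_n < 1:
        raise ValueError("top_n must be at least 1")
    return prompt_overhead_tokens + top_n * avg_chunk_tokens


for n in (1, 3, 5, 10):
    print(f"top_n={n:>2}: ~{estimate_context_tokens(n)} context tokens")
```

Under these assumptions the retrieved context grows linearly with `top_n`, which is what drives the higher token cost and slower responses in the last row of the table.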
