Your customer uploads their handbook and asks it questions, and every sentence of the answer comes back with the file and chunk it was taken from, so nobody has to trust the model's word for it. Each customer's documents live in their own store, so tenant A's contracts are never retrievable by tenant B.

One function, from customer id and question to answer plus citations. Drop it behind a chat endpoint.

What you need

An FC API key with scope api:write
An Assistant with file_search-capable instructions (mint one at ringside.fightclub.pro/app/assistants with body "Answer using the supplied files. Cite the file_id and chunk index for every claim.")
A PDF, DOCX, transcript, anything you want to query
pip install openai>=1.40

Full code

python
# rag_with_citations.py
import os, time
from typing import TypedDict
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fightclub.pro/v1",
    api_key=os.environ["FC_API_KEY"],
)
ASSISTANT_ID = os.environ["FC_ASSISTANT_ID"]  # asst_...


class Citation(TypedDict):
    file_id: str
    filename: str
    chunk_index: int


class Answer(TypedDict):
    text: str
    citations: list[Citation]


def ensure_customer_store(customer_id: str, embedding_model: str = "text-embedding-3-small") -> str:
    """Return the vector_store_id for this customer, creating it on first call."""
    # Idempotent lookup via store metadata. We tag every store with the customer id at create time.
    for s in client.vector_stores.list(limit=100).data:
        if s.metadata.get("customer_id") == customer_id:
            return s.id
    store = client.vector_stores.create(
        name=f"customer-{customer_id}",
        embedding_model=embedding_model,
        metadata={"customer_id": customer_id},
    )
    return store.id


def upload_and_attach(store_id: str, local_path: str) -> str:
    """Upload a file and attach it to the store. Returns the file id once attached.
    Polls for ingest completion synchronously; in production use the
    vector_store.file.completed webhook instead."""
    with open(local_path, "rb") as fp:
        file = client.files.create(file=fp, purpose="attachments")
    client.vector_stores.files.create(vector_store_id=store_id, file_id=file.id)
    while True:
        f = client.vector_stores.files.retrieve(vector_store_id=store_id, file_id=file.id)
        if f.status == "completed":
            return file.id
        if f.status in ("failed", "cancelled"):
            raise RuntimeError(f"ingest {f.status}: {f.last_error}")
        time.sleep(2)


def ask_with_citations(customer_id: str, question: str) -> Answer:
    """Ask a question of the customer's vector store. Returns answer text plus citations."""
    store_id = ensure_customer_store(customer_id)

    thread = client.beta.threads.create()
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=question,
    )
    client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id=ASSISTANT_ID,
        tools=[{
            "type": "file_search",
            "file_search": {"vector_store_ids": [store_id]},
        }],
        extra_headers={"FC-Customer": customer_id},
    )

    messages = client.beta.threads.messages.list(thread_id=thread.id, order="desc", limit=1)
    msg = messages.data[0]

    text_parts: list[str] = []
    citations: list[Citation] = []
    seen: set[str] = set()
    for block in msg.content:
        if block.type != "text":
            continue
        text_parts.append(block.text.value)
        for ann in block.text.annotations:
            if ann.type != "file_citation":
                continue
            fc = ann.file_citation
            key = f"{fc.file_id}#{fc.chunk_index}"
            if key in seen:
                continue
            seen.add(key)
            src = client.files.retrieve(fc.file_id)
            citations.append(Citation(
                file_id=fc.file_id,
                filename=src.filename,
                chunk_index=fc.chunk_index,
            ))

    return Answer(text="\n".join(text_parts), citations=citations)


if __name__ == "__main__":
    # Demo: ingest a PDF once, then ask
    store_id = ensure_customer_store("cus_42")
    upload_and_attach(store_id, "handbook.pdf")
    result = ask_with_citations("cus_42", "What's the expense reporting cut-off?")
    print(result["text"])
    print()
    for c in result["citations"]:
        print(f"  [{c['filename']} chunk {c['chunk_index']}]")

What this does, line by line

ensure_customer_store keys the vector store by the customer ID through the store's metadata field. First call creates, every subsequent call reuses. This is the v1 idiom for one-store-per-customer multi-tenancy. The metadata-keyed lookup costs one list call per first-time call per customer.

upload_and_attach uploads the source file, attaches it to the store, then polls for ingest completion. For production, attach the file and return immediately; subscribe to the vector_store.file.completed webhook so your worker fires when the file is searchable.

The store above is plaintext, which is what lets it take a PDF. A sealed store (encryption: "managed" or "byok") accepts text only today, so text, Markdown, JSON, XML and YAML up to 25 MB. Sealing a store and then pushing a PDF at it gives you a failed file.

ask_with_citations is the actual retrieval-and-answer call. The Assistants run with tools: [{type: "file_search", ...}] does the embed-query, top-K retrieval and prompt assembly internally; you give it the store id and the question. The FC-Customer header attributes the call's cost to that end-customer in your wallet (per-customer billing). The annotations array on each text block carries file_citation entries; we deduplicate by (file_id, chunk_index) so a chunk referenced twice in the same answer renders once in the UI.

Per-customer observability

Open ringside.fightclub.pro/app/vector_stores/<store_id>/queries after a few runs. You'll see every question the customer asked, the top-K scores, the latency in ms, the empty-result flag and the embedding tokens spent. When the customer says retrieval is bad, this is your debug surface.

Pricing footnote

Four things bill. Parse and embed tokens once per file, query tokens per call, storage per GB-day and the LLM completion that reads the retrieved chunks. Storage is metered daily on the vector index and on the raw file, with the graph index charged separately if you turn GraphRAG on; the first 1 GB-day per store per day is free on the vector index and on the graph. Current rates are on the pricing page, rendered live, so don't hardcode them.

For a 50 MB handbook and 100 queries a day, storage lands in pennies per customer per month. Query tokens dominate the bill.

Next steps

Managed RAG tutorial for the step-by-step version of this code
Vector stores docs section for the full endpoint list (list, get, delete, migrate, queries)
RAG product page for the broader pitch
Pricing breakdown before you scale