
file_search with citations (managed RAG)

What you'll build

A function that takes a customer ID and a user question, ensures the customer's vector store exists, runs the question through the OpenAI file_search tool inside an Assistants run and returns the answer text plus citations for the source files it drew on. It is designed to slot into the chat endpoint of a customer-facing SaaS product.

What you need

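  • An FC API key with scope api:write, exported as FC_API_KEY
  • An Assistant whose instructions tell it to answer from the supplied files (create one at ringside.fightclub.pro/app/assistants with the instructions "Answer using the supplied files. Cite the file_id and chunk index for every claim."), with its ID exported as FC_ASSISTANT_ID
  • A PDF, DOCX, transcript or any other document you want to query (the demo uses handbook.pdf)
  • pip install "openai>=1.66" (quote the spec so the shell does not treat > as a redirect; the top-level client.vector_stores namespace the code uses is not available in older releases)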
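If you would rather create the Assistant from code than from the dashboard, the sketch below calls the SDK's client.beta.assistants.create with the same instructions and turns on the file_search tool. The model name you pass and the display name are assumptions (use whichever chat model your FC account exposes). The client is passed in as a parameter, so the function can be exercised without network access.

```python
from typing import Any, Dict

INSTRUCTIONS = (
    "Answer using the supplied files. "
    "Cite the file_id and chunk index for every claim."
)


def assistant_params(model: str) -> Dict[str, Any]:
    """Keyword arguments for client.beta.assistants.create."""
    return {
        "name": "rag-with-citations",        # hypothetical display name
        "model": model,                       # assumption: any chat model FC exposes to you
        "instructions": INSTRUCTIONS,
        "tools": [{"type": "file_search"}],   # turns on the file_search tool for this Assistant
    }


def create_assistant(client: Any, model: str) -> str:
    """Create the Assistant and return its id (asst_...)."""
    assistant = client.beta.assistants.create(**assistant_params(model))
    return assistant.id
```

Call it once with an OpenAI client pointed at https://api.fightclub.pro/v1, then export the returned ID as FC_ASSISTANT_ID.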

Full code

```python
# rag_with_citations.py
import os
import time
from typing import TypedDict

from openai import OpenAI

client = OpenAI(
    base_url="https://api.fightclub.pro/v1",
    api_key=os.environ["FC_API_KEY"],
)
ASSISTANT_ID = os.environ["FC_ASSISTANT_ID"]  # asst_...


class Citation(TypedDict):
    file_id: str
    filename: str
    chunk_index: int


class Answer(TypedDict):
    text: str
    citations: list[Citation]


def ensure_customer_store(customer_id: str, embedding_model: str = "text-embedding-3-small") -> str:
    """Return the vector_store_id for this customer, creating it on first call."""
    # Idempotent lookup via store metadata. We tag every store with the customer id at create time.
    # Iterating the page object auto-paginates, so stores beyond the first 100 are checked too.
    for s in client.vector_stores.list(limit=100):
        if (s.metadata or {}).get("customer_id") == customer_id:
            return s.id
    store = client.vector_stores.create(
        name=f"customer-{customer_id}",
        metadata={"customer_id": customer_id},
        # embedding_model is not a typed SDK parameter, so send it in the request body.
        extra_body={"embedding_model": embedding_model},
    )
    return store.id


def upload_and_attach(store_id: str, local_path: str) -> str:
    """Upload a file and attach it to the store. Returns the file id once attached.

    Polls for ingest completion synchronously; in production use the
    vector_store.file.completed webhook instead."""
    with open(local_path, "rb") as fp:
        # Files meant for file_search are uploaded with purpose="assistants".
        file = client.files.create(file=fp, purpose="assistants")
    client.vector_stores.files.create(vector_store_id=store_id, file_id=file.id)
    while True:
        f = client.vector_stores.files.retrieve(vector_store_id=store_id, file_id=file.id)
        if f.status == "completed":
            return file.id
        if f.status in ("failed", "cancelled"):
            raise RuntimeError(f"ingest {f.status}: {f.last_error}")
        time.sleep(2)


def ask_with_citations(customer_id: str, question: str) -> Answer:
    """Ask a question of the customer's vector store. Returns answer text plus citations."""
    store_id = ensure_customer_store(customer_id)
    # Vector stores attach to the thread through tool_resources; the run-level
    # file_search tool definition does not take vector_store_ids.
    thread = client.beta.threads.create(
        tool_resources={"file_search": {"vector_store_ids": [store_id]}},
    )
    client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=question,
    )
    run = client.beta.threads.runs.create_and_poll(
        thread_id=thread.id,
        assistant_id=ASSISTANT_ID,
        tools=[{"type": "file_search"}],             # enable file_search for this run
        extra_headers={"FC-Customer": customer_id},  # per-customer billing attribution
    )
    if run.status != "completed":
        # Without this check we would read back the user's own question as the "answer".
        raise RuntimeError(f"run {run.status}: {run.last_error}")

    # Only the assistant message(s) produced by this run, in reading order.
    messages = client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id, order="asc")
    text_parts: list[str] = []
    citations: list[Citation] = []
    seen: set[str] = set()
    filenames: dict[str, str] = {}  # file_id -> filename, so each file is looked up once
    for msg in messages:
        for block in msg.content:
            if block.type != "text":
                continue
            text_parts.append(block.text.value)
            for ann in block.text.annotations:
                if ann.type != "file_citation":
                    continue
                fc = ann.file_citation
                key = f"{fc.file_id}#{fc.chunk_index}"
                if key in seen:
                    continue
                seen.add(key)
                if fc.file_id not in filenames:
                    filenames[fc.file_id] = client.files.retrieve(fc.file_id).filename
                citations.append(Citation(
                    file_id=fc.file_id,
                    filename=filenames[fc.file_id],
                    chunk_index=fc.chunk_index,
                ))
    return Answer(text="\n".join(text_parts), citations=citations)


if __name__ == "__main__":
    # Demo: ingest a PDF once, then ask
    store_id = ensure_customer_store("cus_42")
    upload_and_attach(store_id, "handbook.pdf")
    result = ask_with_citations("cus_42", "What's the expense reporting cut-off?")
    print(result["text"])
    print()
    for c in result["citations"]:
        print(f"  [{c['filename']} chunk {c['chunk_index']}]")
```

What this does, line by line

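ensure_customer_store keys the vector store by customer ID through the store's metadata field: the first call creates the store, and every later call reuses it. This is the v1 idiom for one-store-per-customer multi-tenancy. Note that the lookup lists your vector stores on every call, not only the first one, because ask_with_citations calls it for every question; at scale, cache the customer_id → store_id mapping in your own database. Two concurrent first calls for the same customer can also each create a store, so serialize first-time creation if that matters to you.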
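Because the lookup runs on every call, a process-local cache in front of ensure_customer_store removes the repeated list calls. The sketch below is hypothetical (StoreIdCache is not part of any SDK). It memoizes customer_id → store id and serializes cache misses with a lock, so two threads in the same process cannot both create a store for one customer. It does not protect against races across separate processes or hosts. For that, persist the mapping in a database with a unique constraint on the customer ID.

```python
import threading
from typing import Callable, Dict


class StoreIdCache:
    """Memoize customer_id -> vector_store_id so the list call runs once per customer per process."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self._resolve = resolve          # e.g. ensure_customer_store from the recipe
        self._ids: Dict[str, str] = {}
        self._lock = threading.Lock()

    def get(self, customer_id: str) -> str:
        cached = self._ids.get(customer_id)
        if cached is not None:
            return cached
        # Re-check under the lock so concurrent misses resolve (and create) only once.
        with self._lock:
            if customer_id not in self._ids:
                self._ids[customer_id] = self._resolve(customer_id)
            return self._ids[customer_id]
```

Usage: build stores = StoreIdCache(ensure_customer_store) once at startup and call stores.get(customer_id) where the recipe calls ensure_customer_store(customer_id).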

upload_and_attach uploads the source file, attaches it to the store, then polls every two seconds until ingestion finishes. In production, attach the file and return immediately, then subscribe to the vector_store.file.completed webhook so a worker runs when the file becomes searchable.
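The polling loop in upload_and_attach has no upper bound, so a stuck ingest blocks the caller forever. Below is a minimal sketch of a bounded alternative with exponential backoff. poll_until_done is a hypothetical helper, and the status strings mirror the ones the recipe checks. The status getter and the sleep function are injected, which keeps the helper testable without network calls.

```python
import time
from typing import Callable, Iterable


def poll_until_done(
    get_status: Callable[[], str],
    done: str = "completed",
    failed: Iterable[str] = ("failed", "cancelled"),
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns `done`; raise on a failed status or on timeout."""
    failed_set = frozenset(failed)
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = get_status()
        if status == done:
            return status
        if status in failed_set:
            raise RuntimeError(f"ingest ended with status {status!r}")
        # Give up rather than sleep past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
```

In the recipe, the getter would be something like lambda: client.vector_stores.files.retrieve(vector_store_id=store_id, file_id=file.id).status. Recent openai SDK releases also ship client.vector_stores.files.create_and_poll, which wraps attach-and-wait in a single call.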

ask_with_citations is the actual retrieval-and-answer call. An Assistants run with the file_search tool enabled embeds the query, retrieves the top-K chunks and assembles the prompt internally; you supply only the store id and the question. The FC-Customer header attributes the call's cost to that end customer in your wallet (per-customer billing). Each text block's annotations array carries file_citation entries; the function deduplicates them by (file_id, chunk_index) so a chunk cited twice in the same answer renders once in the UI.
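Deduplication gives one entry per chunk in the citation list. In the OpenAI Assistants format, however, the answer text still contains the raw citation markers that the annotations point at. Below is a minimal sketch for turning those markers into numbered footnotes. It assumes each file_citation annotation exposes start_index and end_index character offsets into block.text.value, as in the OpenAI Assistants schema, that the spans do not overlap, and that the chunk_index the recipe already reads is present. CitationSpan and number_citations are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CitationSpan:
    """Minimal stand-in for a file_citation annotation (hypothetical type)."""
    start_index: int
    end_index: int
    file_id: str
    chunk_index: int


def number_citations(text: str, spans: List[CitationSpan]) -> Tuple[str, List[Tuple[str, int]]]:
    """Replace each marker with [n]; repeats of the same (file_id, chunk_index) share n."""
    numbers: Dict[Tuple[str, int], int] = {}
    order: List[Tuple[str, int]] = []
    # Number sources in reading order so [1] is the first one cited.
    for span in sorted(spans, key=lambda s: s.start_index):
        key = (span.file_id, span.chunk_index)
        if key not in numbers:
            numbers[key] = len(order) + 1
            order.append(key)
    # Splice from the end of the string so earlier offsets stay valid.
    out = text
    for span in sorted(spans, key=lambda s: s.start_index, reverse=True):
        n = numbers[(span.file_id, span.chunk_index)]
        out = out[:span.start_index] + f"[{n}]" + out[span.end_index:]
    return out, order
```

The returned order list lines up footnote numbers with (file_id, chunk_index) pairs, which you can join against the recipe's citations to render filenames.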

Per-customer observability

Open ringside.fightclub.pro/app/vector_stores/<store_id>/queries after a few runs. You'll see every question the customer asked, the top-K scores, the latency in ms, the empty-result flag and the embedding tokens spent. When the customer says retrieval is bad, this is your debug surface.

Pricing footnote

  • Parse + embed tokens (one-off per file) at the embedding model's published rate
  • Query embedding tokens (per call) at the embedding model's published rate
  • Storage: $0.10/GB-day vector index + $0.02/GB-day file storage, first 1 GB-day free per store per month
  • The LLM completion that uses the retrieved chunks bills at the Assistant model's rate

For a typical 50 MB customer handbook queried 100 times a day, monthly storage comes to pennies per customer; per-query costs dominate the bill, chiefly the LLM completion that reads the retrieved chunks.
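As a rough check of the storage figure, the sketch below multiplies the listed storage rates by GB-days. It assumes the vector index is about the same size as the source file, although the real index size depends on chunking and embedding dimensions. It also ignores the free allowance, so it gives an upper-bound estimate rather than FC's billing formula. The function name is hypothetical.

```python
# Storage rates from the pricing list above (USD per GB-day).
VECTOR_INDEX_RATE = 0.10
FILE_STORAGE_RATE = 0.02


def monthly_storage_upper_bound(file_gb: float, index_gb: float, days: int = 30) -> float:
    """Storage cost for one store over `days`, ignoring the free GB-day allowance."""
    index_cost = index_gb * days * VECTOR_INDEX_RATE
    file_cost = file_gb * days * FILE_STORAGE_RATE
    return index_cost + file_cost


# 50 MB handbook; assume the index is roughly the file's size (an assumption, not a measurement).
cost = monthly_storage_upper_bound(file_gb=0.05, index_gb=0.05)
print(f"${cost:.2f} per month")  # → $0.18 per month
```

Even under these pessimistic assumptions the storage line stays well under a dollar per customer per month.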

Next steps