Managed RAG in 15 minutes
By the end you'll have created a vector store, uploaded a PDF, waited for the ingest pipeline to parse and embed it, asked a question via the OpenAI file_search tool and pulled the citations out of the response.
You'll need:
- A Ringside account (sign up at ringside.fightclub.pro/register)
- Python 3.9+ with openai >= 1.66 installed (the SDK line where vector stores live at client.vector_stores rather than client.beta)
- A PDF you don't mind uploading: a product handbook, a research paper, anything
- 15 minutes
Get an API key + an assistant
// one-time setup
Mint an API key at ringside.fightclub.pro/app/api-keys and export it as FC_API_KEY. While you're in the dashboard, create an Assistant under /app/assistants with the instructions 'Answer using the supplied files. Cite the file_id and chunk index for every claim.' Copy its asst_ ID; we'll use it in Step 5.
export FC_API_KEY=fc_sk_live_...
export FC_ASSISTANT_ID=asst_...
pip install --upgrade openai
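If a variable is missing or mistyped, you'll only find out from an authentication error partway through the tutorial. Here's a minimal sketch that checks both values up front instead. load_config and FightClubConfig are hypothetical names, and the fc_sk_ and asst_ prefix checks are assumptions based on the example values above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FightClubConfig:
    """Hypothetical container for the two values this tutorial exports."""
    api_key: str
    assistant_id: str


def load_config(env: Optional[Mapping[str, str]] = None) -> FightClubConfig:
    """Read FC_API_KEY and FC_ASSISTANT_ID and fail fast with a readable error.

    The prefix checks assume keys look like fc_sk_... and assistant IDs like
    asst_..., as in the setup step; adjust them if your values differ.
    """
    env = os.environ if env is None else env
    missing = [name for name in ("FC_API_KEY", "FC_ASSISTANT_ID") if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    api_key = env["FC_API_KEY"]
    assistant_id = env["FC_ASSISTANT_ID"]
    if not api_key.startswith("fc_sk_"):
        raise ValueError("FC_API_KEY should start with 'fc_sk_'")
    if not assistant_id.startswith("asst_"):
        raise ValueError("FC_ASSISTANT_ID should start with 'asst_'")
    return FightClubConfig(api_key=api_key, assistant_id=assistant_id)


# Demo with an explicit mapping; call load_config() with no argument to read os.environ.
cfg = load_config({"FC_API_KEY": "fc_sk_live_example", "FC_ASSISTANT_ID": "asst_example"})
print(cfg.assistant_id)  # asst_example
```

Passing the mapping explicitly keeps the check testable without touching your real environment.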
Create a vector store
// one tenant per customer
The standard pattern is one vector store per customer in your app. The embedding_model is fixed when you create the store, but you can switch it later through the dashboard's migrate flow: the re-embed runs in the background from cached parses, so you pay only for embedding tokens.
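In code, one store per customer usually means a lookup table from your customer ID to a vector store ID, with the store created the first time that customer needs one. Here's a minimal sketch of that pattern. It assumes an in-memory dict where a real app would use its database, and StoreRegistry and the fake creator are hypothetical.

```python
from typing import Callable, Dict, List


class StoreRegistry:
    """Hypothetical customer_id -> vector store ID map (one store per customer).

    create_store is whatever actually creates a store and returns its ID.
    The in-memory dict stands in for the table you'd persist in production.
    """

    def __init__(self, create_store: Callable[[str], str]) -> None:
        self._create_store = create_store
        self._store_ids: Dict[str, str] = {}

    def store_for(self, customer_id: str) -> str:
        # Create the customer's store on first use, then keep reusing its ID.
        if customer_id not in self._store_ids:
            self._store_ids[customer_id] = self._create_store(customer_id)
        return self._store_ids[customer_id]


# Offline demo: a fake creator that hands out sequential IDs.
created: List[str] = []


def fake_create(customer_id: str) -> str:
    created.append(customer_id)
    return f"vs_fake_{len(created)}"


registry = StoreRegistry(fake_create)
first = registry.store_for("cus_42")
again = registry.store_for("cus_42")
other = registry.store_for("cus_7")
print(first, again, other)  # vs_fake_1 vs_fake_1 vs_fake_2
```

To wire it to the real API, pass a creator along the lines of lambda cid: client.vector_stores.create(name=f"customer-{cid}").id, using the client from this step.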
import os
from openai import OpenAI
client = OpenAI(
    base_url="https://api.fightclub.pro/v1",
    api_key=os.environ["FC_API_KEY"],
)
store = client.vector_stores.create(
    name="acme-handbook",
    # The OpenAI SDK has no embedding_model keyword, so send it in the request body.
    extra_body={"embedding_model": "text-embedding-3-small"},
)
print("store id:", store.id)
# => store id: vs_a1b2c3d4...
Upload a file + attach it to the store
// async ingest starts here
Upload returns a file ID synchronously. Attaching the file to the vector store kicks off the async ingest pipeline (parse + chunk + embed + index). The attach call returns immediately with status='pending'.
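The citations in Step 6 point at a chunk_index, so it helps to picture what the chunk stage produces. This page doesn't describe Ringside's actual chunking strategy. This toy sketch assumes fixed-size, overlapping windows counted in words, just to show how each chunk of a file gets a sequential index.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    chunk_index: int  # position of the chunk within its source file
    text: str


def chunk_words(text: str, size: int = 200, overlap: int = 50) -> List[Chunk]:
    """Split text into overlapping word windows (an illustrative stand-in).

    Real pipelines usually count tokens rather than words and respect page or
    heading boundaries; this only shows where chunk indices come from.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append(Chunk(len(chunks), " ".join(window)))
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


doc = " ".join(f"w{i}" for i in range(10))
for c in chunk_words(doc, size=4, overlap=1):
    print(c.chunk_index, c.text)
# 0 w0 w1 w2 w3
# 1 w3 w4 w5 w6
# 2 w6 w7 w8 w9
```

The overlap means a sentence that straddles a window boundary still appears whole in at least one chunk. That's the usual reason pipelines overlap their chunks.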
with open("handbook.pdf", "rb") as fp:
    file = client.files.create(file=fp, purpose="attachments")
print("file id:", file.id)
# => file id: file_xyz789...
vsf = client.vector_stores.files.create(
    vector_store_id=store.id,
    file_id=file.id,
)
print("vsf status:", vsf.status)
# => vsf status: pending
Wait for ingest to finish
// poll, or subscribe to a webhook
This tutorial polls. In production, register a vector_store.file.completed webhook so your worker runs as soon as the file is searchable. A 30-page PDF finishes ingesting in seconds; a 300-page corpus takes a couple of minutes.
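The polling loop in this step has no upper bound and always waits 2 seconds between polls. If you do poll in production, a deadline plus exponential backoff stops a stuck ingest from hanging your worker. Here's a sketch under those assumptions. wait_for_ingest and its defaults are hypothetical, and fetch is any zero-argument callable that returns an object with a status attribute.

```python
import time
from types import SimpleNamespace
from typing import Any, Callable, List


def wait_for_ingest(
    fetch: Callable[[], Any],
    timeout_s: float = 600.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until the returned object's .status is terminal.

    The delay doubles after each poll, capped at max_delay_s. Raises
    TimeoutError once the next sleep would overshoot the deadline, and
    RuntimeError if ingest ends as failed or cancelled. sleep is a
    parameter so the loop can be exercised offline.
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        obj = fetch()
        if obj.status == "completed":
            return obj
        if obj.status in ("failed", "cancelled"):
            raise RuntimeError(f"ingest {obj.status}: {getattr(obj, 'last_error', None)}")
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"ingest still {obj.status!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)


# Offline demo: a fake fetch that walks through the expected status progression.
statuses = iter(["pending", "in_progress", "in_progress", "completed"])
slept: List[float] = []
result = wait_for_ingest(
    lambda: SimpleNamespace(status=next(statuses), last_error=None),
    sleep=slept.append,
)
print(result.status, slept)  # completed [1.0, 2.0, 4.0]
```

Against the real API, you'd pass fetch=lambda: client.vector_stores.files.retrieve(vector_store_id=store.id, file_id=file.id).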
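import time
# Poll every 2 seconds until ingest reaches a terminal status.
while True:
    f = client.vector_stores.files.retrieve(
        vector_store_id=store.id,
        file_id=file.id,
    )
    print(f"  {f.status}", f.last_error or "")
    if f.status in ("completed", "failed", "cancelled"):
        break
    time.sleep(2)
if f.status != "completed":
    raise RuntimeError(f"ingest ended as {f.status}: {f.last_error}")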
# Expected progression: pending -> in_progress -> completed
Ask a question via file_search
// Assistants run with the tool config
The retrieval call is an Assistants run with the file_search tool config pointing at your store. The assistant's instructions tell the model what to do with the retrieved chunks; the run does the embed-the-query, retrieve, stuff-into-context dance for you.
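To make the embed-the-query, retrieve, stuff-into-context dance concrete, here's a toy version of it. The 3-dimensional vectors stand in for real embeddings, and the [file_id#chunk_index] label format is made up. Ringside's actual ranking and prompt layout aren't documented here, so read this as an illustration of the idea, not of the service.

```python
import math
from typing import Dict, List, Sequence, Tuple

ChunkKey = Tuple[str, int]  # (file_id, chunk_index)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: Dict[ChunkKey, Sequence[float]], k: int = 2) -> List[ChunkKey]:
    """Rank stored chunks by similarity to the query vector and keep the best k."""
    ranked = sorted(chunk_vecs, key=lambda key: cosine(query_vec, chunk_vecs[key]), reverse=True)
    return ranked[:k]


def build_context(hits: List[ChunkKey], chunk_text: Dict[ChunkKey, str]) -> str:
    """Label each retrieved chunk so the model can cite it back."""
    return "\n".join(f"[{file_id}#{idx}] {chunk_text[(file_id, idx)]}" for file_id, idx in hits)


# Made-up 3-d vectors standing in for real embeddings of three chunks.
chunk_vecs = {
    ("file_demo", 0): [1.0, 0.0, 0.0],
    ("file_demo", 1): [0.0, 1.0, 0.0],
    ("file_demo", 2): [0.7, 0.7, 0.0],
}
chunk_text = {
    ("file_demo", 0): "(chunk about expense deadlines)",
    ("file_demo", 1): "(chunk about office hours)",
    ("file_demo", 2): "(chunk about expense approvals)",
}
query_vec = [0.9, 0.1, 0.0]  # pretend embedding of the user's question
hits = top_k(query_vec, chunk_vecs)
print(hits)  # [('file_demo', 0), ('file_demo', 2)]
print(build_context(hits, chunk_text))
```

Keeping the (file_id, chunk_index) pair attached to every retrieved chunk is what lets the model's answer carry citations back to you.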
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What's the company-wide expense reporting cut-off?",
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=os.environ["FC_ASSISTANT_ID"],
    tools=[{
        "type": "file_search",
        "file_search": {"vector_store_ids": [store.id]},
    }],
    # Optional but recommended: attribute the call to the end-customer who triggered it
    extra_headers={"FC-Customer": "cus_42"},
)
if run.status != "completed":
    raise RuntimeError(f"run ended as {run.status}: {run.last_error}")
messages = client.beta.threads.messages.list(thread_id=thread.id, order="desc", limit=1)
answer = messages.data[0]
print(answer.content[0].text.value)
Read the citations out of the response
// annotations carry file_id + chunk_index
The assistant response contains an annotations array on each text content block. Each file_citation annotation carries the file_id of the source file and the chunk_index of the chunk the citation came from. You can render these as inline citations in your UI or use them server-side for audit.
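For inline citations in a UI, one approach is to replace each annotation's marker text with a numbered reference and collect the sources as footnotes. Here's a minimal sketch. It assumes the annotations expose start_index and end_index offsets into the text (as OpenAI's file_citation annotations do) plus Ringside's chunk_index. render_citations is a hypothetical helper, and the demo annotations are hand-built.

```python
from types import SimpleNamespace
from typing import Any, Dict, List, Sequence, Tuple


def render_citations(value: str, annotations: Sequence[Any]) -> Tuple[str, List[str]]:
    """Replace citation markers with [n] and return (text, footnotes).

    Assumes OpenAI-style file_citation annotations (.type, .start_index,
    .end_index, .file_citation.file_id) plus Ringside's
    .file_citation.chunk_index, and that marker spans don't overlap.
    """
    cites = sorted(
        (a for a in annotations if a.type == "file_citation"),
        key=lambda a: a.start_index,
    )
    numbers: Dict[Tuple[str, int], int] = {}
    footnotes: List[str] = []
    spans: List[Tuple[int, int, str]] = []
    for ann in cites:
        key = (ann.file_citation.file_id, ann.file_citation.chunk_index)
        if key not in numbers:  # number each distinct source by first appearance
            numbers[key] = len(numbers) + 1
            footnotes.append(f"[{numbers[key]}] {key[0]}, chunk {key[1]}")
        spans.append((ann.start_index, ann.end_index, f"[{numbers[key]}]"))
    # Splice from the end of the string so earlier offsets stay valid.
    for start, end, label in reversed(spans):
        value = value[:start] + label + value[end:]
    return value, footnotes


# Offline demo with hand-built annotations and hypothetical IDs.
m1, m2 = "【4:0†source】", "【4:1†source】"
raw = f"Claim one.{m1} Claim two.{m2}"


def fake_citation(marker: str, file_id: str, chunk_index: int) -> SimpleNamespace:
    start = raw.index(marker)
    return SimpleNamespace(
        type="file_citation",
        text=marker,
        start_index=start,
        end_index=start + len(marker),
        file_citation=SimpleNamespace(file_id=file_id, chunk_index=chunk_index),
    )


text, notes = render_citations(
    raw, [fake_citation(m1, "file_demo", 3), fake_citation(m2, "file_demo", 7)]
)
print(text)   # Claim one.[1] Claim two.[2]
print(notes)  # ['[1] file_demo, chunk 3', '[2] file_demo, chunk 7']
```

In this step's loop, you'd call render_citations(block.text.value, block.text.annotations) for each text block. You could also swap the file_id in each footnote for the filename from client.files.retrieve.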
filenames = {}  # cache file_id -> filename so each source file is fetched once
for block in answer.content:
    if block.type != "text":
        continue
    for ann in block.text.annotations:
        if ann.type == "file_citation":
            fc = ann.file_citation
            print(f"  cited file_id={fc.file_id} chunk_index={fc.chunk_index}")
            # Pull the source file's filename for a human-readable label:
            if fc.file_id not in filenames:
                filenames[fc.file_id] = client.files.retrieve(fc.file_id).filename
            print(f"    -> {filenames[fc.file_id]}")
That's the whole loop: a customer uploads a file, your app attaches it to that customer's vector store, and your app answers questions about the file with citations. The retrieval log, per-customer cost attribution, embedding model migration and the rest of the RAG plumbing live on our side; your code is the six steps above.
Next steps
- Citation-parsing recipe for the production-quality version of Step 6.
- Vector stores API reference for the full endpoint list (list/get/patch/delete, file batches, cancel/retry, queries, stats, migrate, rollback).
- RAG product page for the broader pitch and pricing.
- RAG pricing if you want to model your monthly cost before you scale.