RAG products

Your prototype answered questions about a handbook beautifully. Then a customer uploaded what they actually have: a contract scanned at four degrees off straight, a slide deck where every word is baked into a JPEG, a spreadsheet with merged header cells and an hour of a recorded call. Retrieval quality fell off a cliff and nobody changed the embedding model.

This page is for the engineer shipping document Q&A, knowledge search or support-over-a-manual as a customer-facing feature, who has just discovered which part of RAG is the hard part.

Retrieval quality is mostly a parsing problem

Follow the chain backwards from a bad answer. The model was given chunks. The chunks came from text. The text came from a parser looking at a file it may not have understood, and if that parser produced a page of ligature soup or skipped the image entirely, then everything downstream was working perfectly on garbage. The embedding model never sees the document. It sees whatever the parser decided the document said.

Which is why RAG demos outperform RAG products so reliably. Demo corpora are clean, born-digital and hand-picked. Customer corpora are whatever was on the shared drive. Ringside puts the effort where the quality actually comes from: 20+ file types, PDFs, Word, slides, images with text in them, audio for transcription, CSV tables turned into row narratives. Parse cost is billed in tokens for the work the parser actually ran, so a one-page memo is not charged like a 400-page deposition.

The part you find out about in year two

A better embedding model lands. Yours is now second-rate, and switching means re-embedding every customer's corpus, which means re-parsing every file, which means either downtime or a migration script you get one shot at. Most teams simply do not switch, and their retrieval quietly stays at 2026 levels forever.

Ringside keeps the parses cached, so a model change re-embeds from those in the background and swaps the index atomically per tenant, with a 7-day rollback window if the new model turns out worse on your corpus. Nine embedding models to choose from across OpenAI, Cohere, Voyage, Mistral and Google. The wire format stays OpenAI's either way.

The rest of it

• One store per customer. OpenAI-compat /v1/vector_stores with real tenant isolation, and a cross-tenant read answers 404 rather than confirming the store exists.
• file_search where you already are. Drop the tool config into any Assistants run and citations come back in the response.
• A query log per store. Top-K scores, p95 latency, empty-result rate and daily embedding cost. When a customer tells you retrieval is bad, this is the difference between agreeing with them and showing them that the last 40 queries returned nothing because the corpus does not cover what they are asking.
• Attribution built in. Pass FC-Customer and parse, embedding and retrieval cost all land against that customer.

If you turn on encryption

Sealed stores are a different ingest path, and the tradeoff is real. A store created with managed or byok encryption keeps its key in the web tier so that chunk text never reaches the worker in the clear, which also means it does not reach the worker's parsers. Today a sealed store takes text-like content only, up to 25 MB per file. PDFs, Office documents and images are not ingested into a sealed store yet. Plan for plaintext stores where you need the full parser range, and sealed stores where the content is text and the sealing is the point.

Architecture

In code

# 1. One vector store per customer.
store = client.vector_stores.create(
    name=f"acme-{customer_id}",
    embedding_model="text-embedding-3-small",
)

# 2. Upload a file. 20+ types: pdf, docx, pptx, png, mp3, csv, ...
file = client.files.create(
    file=open("handbook.pdf","rb"),
    purpose="attachments",
)
client.vector_stores.files.create(
    vector_store_id=store.id,
    file_id=file.id,
)

# 3. Query it with file_search inside an Assistants run.
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread_id,
    assistant_id=assistant_id,
    tools=[{
        "type": "file_search",
        "file_search": {"vector_store_ids": [store.id]},
    }],
    extra_headers={"FC-Customer": customer_id},
)
# Parsing, chunking, embedding, retrieval, citation extraction and
# per-customer cost attribution all happened above this line.

Cross-links

Deeper

Endpoints

Read the RAG page →