Streaming chat completion

What you'll build

A Python streaming chat client. It opens /v1/chat/completions with stream: true, parses SSE frames, surfaces partial content to a callback, handles Ctrl-C cleanly by closing the underlying connection, and reads the final usage block that arrives with the last chunk when stream_options.include_usage is set.

What you need

  • An FC API key (from /app/api-keys) with scope api:chat
  • pip install "openai>=1.40" (quoted so the shell does not treat > as a redirect)
  • Python 3.10+

Full code

python
# stream_chat.py
import os, signal, sys
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["FC_API_KEY"],
    base_url="https://api.fightclub.pro/v1",
)

# Graceful cancel: hit Ctrl-C once to close the HTTP stream.
_current_stream = None


def _handle_sigint(*_):
    if _current_stream is not None:
        _current_stream.close()
    sys.exit(130)


signal.signal(signal.SIGINT, _handle_sigint)


def stream_chat(user_prompt: str, customer: str):
    """Stream one assistant turn. Returns (full_text, usage_dict)."""
    global _current_stream
    buf = []
    usage = None
    _current_stream = client.chat.completions.create(
        model="fc:openai/gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are helpful and terse."},
            {"role": "user", "content": user_prompt},
        ],
        stream=True,
        stream_options={"include_usage": True},
        extra_headers={
            "FC-Customer": customer,
            "FC-Tag": "cli.stream_chat",
        },
    )
    for chunk in _current_stream:
        # Content deltas arrive until the last chunk, which carries usage.
        if chunk.choices and chunk.choices[0].delta.content:
            delta = chunk.choices[0].delta.content
            buf.append(delta)
            sys.stdout.write(delta)
            sys.stdout.flush()
        if chunk.usage is not None:
            usage = {
                "prompt": chunk.usage.prompt_tokens,
                "completion": chunk.usage.completion_tokens,
                "total": chunk.usage.total_tokens,
            }
    _current_stream = None
    sys.stdout.write("\n")
    return "".join(buf), usage


if __name__ == "__main__":
    text, usage = stream_chat(
        user_prompt="Explain Ringside Client Tokens in 2 sentences.",
        customer="cus_demo",
    )
    print(f"\n[usage] {usage}")

Walkthrough

The trick for a clean Ctrl-C is to keep the stream in a module-level _current_stream handle and call .close() on it from the SIGINT handler, which then exits with status 130 (the conventional 128 + SIGINT). Without that, Python's default handler raises KeyboardInterrupt, the interpreter prints a traceback, and the HTTP connection lingers until TCP keepalive times out on Ringside's side.
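
The close-then-exit pattern can be sketched without a live API call. This is a minimal illustration, not the SDK's code: FakeStream and make_sigint_handler are hypothetical names introduced here, and the handler takes a getter so it always sees the current stream rather than a stale reference.

```python
import signal
import sys


class FakeStream:
    """Hypothetical stand-in for the SDK stream; only close() matters here."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def make_sigint_handler(get_stream):
    """Build a SIGINT handler that closes the active stream, then exits."""

    def handler(signum, frame):
        stream = get_stream()
        if stream is not None:
            stream.close()  # release the HTTP connection before exiting
        sys.exit(128 + signal.SIGINT)  # 130, the usual shell convention

    return handler
```

In the recipe's terms, registering it would look like signal.signal(signal.SIGINT, make_sigint_handler(lambda: _current_stream)).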

stream_options={"include_usage": True} tells Ringside to emit one extra SSE frame at the end containing usage.prompt_tokens, usage.completion_tokens, and usage.total_tokens. The loop checks chunk.choices before reading a delta because that final frame may carry no choices. The usage is computed server-side from canonical token counters, so trust it over any client-side tokenizer.
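
To make the wire format concrete, here is a rough sketch of the folding the SDK does over those frames. It assumes Ringside follows the OpenAI-compatible SSE shape (data: lines holding JSON chunks, a usage-only chunk with empty choices, and a final data: [DONE] sentinel). The sample frames and the collect_stream helper are illustrative, not captured output.

```python
import json


def collect_stream(lines):
    """Fold SSE 'data:' lines into (full_text, usage), assuming OpenAI-style chunks."""
    parts, usage = [], None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:
            content = choices[0].get("delta", {}).get("content")
            if content:
                parts.append(content)
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return "".join(parts), usage


# Made-up frames in the assumed shape; the last data chunk carries only usage.
SAMPLE_FRAMES = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}], "usage": null}',
    "",
    'data: {"choices": [], "usage": {"prompt_tokens": 22, "completion_tokens": 1, "total_tokens": 23}}',
    "",
    "data: [DONE]",
]

text, usage = collect_stream(SAMPLE_FRAMES)
print(text)                   # Hello
print(usage["total_tokens"])  # 23
```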

FC-Tag attaches a free-form label to the request. You'll see it in /app/logs and can group usage by it with GET /v1/usage?group_by=tag.
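
As a rough sketch of that usage query, the snippet below only builds the request and does not send it. It assumes the usage endpoint lives under the same base URL as the chat endpoint and accepts the same bearer-token auth; neither is confirmed by this recipe, and the response shape is left out because it is not documented here. build_usage_request is a hypothetical helper.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.fightclub.pro/v1"  # same base URL the chat client uses


def build_usage_request(api_key, group_by="tag"):
    """Build (but do not send) a GET /v1/usage request grouped by the given key."""
    query = urllib.parse.urlencode({"group_by": group_by})
    return urllib.request.Request(
        f"{BASE_URL}/usage?{query}",
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
        method="GET",
    )


req = build_usage_request("sk_live_xxx")
print(req.get_method(), req.full_url)
# GET https://api.fightclub.pro/v1/usage?group_by=tag
```

Sending it would be a call to urllib.request.urlopen(req); inspect the returned JSON before relying on any field names.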

Run it

bash
export FC_API_KEY=sk_live_xxx
python stream_chat.py

You should see tokens stream inline, then a final line such as [usage] {'prompt': 22, 'completion': 41, 'total': 63}. The exact counts will vary with the prompt and the response.

What's next