Streaming lets you display partial responses in real time instead of waiting for the full completion. Set stream: true in your request.

Example

from openai import OpenAI

client = OpenAI(
    base_url="https://api.inducta.ai/v1",
    api_key="your-api-key",
)

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    stream=True,
)

for chunk in stream:
    # delta.content is None for chunks that carry no text (for example
    # the initial role chunk), so guard before printing
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

How it works

When streaming is enabled, the API returns a series of server-sent events (SSE). Each event contains a JSON chunk with a delta object holding the next piece of content.
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"Quantum"},"index":0}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":" computing"},"index":0}]}

...

data: [DONE]
The final data: [DONE] message signals the end of the stream; it is a plain sentinel string, not JSON. The last chunk before it includes a usage field with token counts for the request.
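If you are consuming the raw SSE stream without the SDK, each data line can be parsed by hand: ignore blank lines and the [DONE] sentinel, decode the JSON, and read the delta. A minimal sketch of that per-line logic (parse_sse_line is a hypothetical helper, and it assumes the chunk shape shown above):

```python
import json

def parse_sse_line(line):
    """Extract the content delta from one SSE line, or return None.

    Returns None for blank lines, non-data lines, the [DONE] sentinel,
    and chunks whose delta carries no content.
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None  # end-of-stream sentinel, not JSON
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    return delta.get("content")
```

In practice you would feed this each line of the HTTP response body and print or accumulate the non-None results.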