Streaming lets you display partial responses in real time instead of waiting for the full completion. Set stream: true in your request.
Example
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inducta.ai/v1",
    api_key="your-api-key",
)

stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Explain quantum computing in simple terms"}
    ],
    stream=True,
)

# Each chunk carries the next piece of the response in choices[0].delta.
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
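If you also need the complete text once the stream finishes (for logging or storage, say), collect the deltas as they arrive. A minimal sketch, reusing the stream object from the example above:

parts = []
for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
        parts.append(content)  # keep each delta for reassembly

full_text = "".join(parts)  # the complete response text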
How it works
When streaming is enabled, the API returns a series of server-sent events (SSE). Each event contains a JSON chunk with a delta object holding the next piece of content.
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":"Quantum"},"index":0}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"delta":{"content":" computing"},"index":0}]}
...
data: [DONE]
The final data: [DONE] message signals the end of the stream. The last chunk before it includes a usage field with token counts.
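The SDK parses these events for you, but any HTTP client that can read the response body incrementally works too. A minimal sketch using the requests library; it assumes the OpenAI-compatible /chat/completions path implied by the base URL above, and omits error handling:

import json

import requests

response = requests.post(
    "https://api.inducta.ai/v1/chat/completions",
    headers={"Authorization": "Bearer your-api-key"},
    json={
        "model": "openai/gpt-oss-120b",
        "messages": [
            {"role": "user", "content": "Explain quantum computing in simple terms"}
        ],
        "stream": True,
    },
    stream=True,  # read the body incrementally instead of buffering it
)

for line in response.iter_lines():
    if not line:
        continue  # SSE events are separated by blank lines
    text = line.decode("utf-8")
    if not text.startswith("data: "):
        continue
    payload = text[len("data: "):]
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    if chunk.get("choices"):
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content:
            print(content, end="", flush=True)
    if "usage" in chunk:
        print("\nusage:", chunk["usage"])  # token counts on the last chunk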