How to stream model responses
LLMs can stream completions as they are generated, letting users see tokens before the full response is complete. This improves the user experience by reducing the idle time spent waiting for an answer.
The following LLMs support streaming out of the box:
OpenAI
Anthropic
Replicate
Inside Stack AI, you can enable streaming on your LLMs and receive a streamed response every time you fetch a response for your interface. To consume the stream, you can use a library such as fetch-event-source to read your flow's streaming endpoint.
This endpoint has the following properties:
Requests must be authorized with your public API key in the Authorization header.
Requests carry a JSON body containing a value for each of the flow's inputs. Example:
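The exact endpoint URL, authorization scheme, and input names depend on your project, so the TypeScript sketch below treats them as placeholders. It uses the @microsoft/fetch-event-source package to POST the JSON body and accumulate the streamed chunks as they arrive.

```typescript
import { fetchEventSource } from "@microsoft/fetch-event-source";

// Placeholder values: substitute your flow's streaming endpoint URL, your
// public API key, and the input names defined in your project.
const STREAM_URL = "https://<your-stack-ai-endpoint>/stream";
const PUBLIC_API_KEY = "<YOUR_PUBLIC_API_KEY>";

async function streamResponse(userMessage: string): Promise<string> {
  let answer = "";

  await fetchEventSource(STREAM_URL, {
    method: "POST",
    headers: {
      // Assumption: the public API key is sent as a bearer token.
      Authorization: `Bearer ${PUBLIC_API_KEY}`,
      "Content-Type": "application/json",
    },
    // JSON body with one value per flow input; "in-0" is a placeholder name.
    body: JSON.stringify({ "in-0": userMessage }),
    onmessage(event) {
      answer += event.data; // append each streamed chunk as it arrives
    },
    onerror(err) {
      throw err; // rethrow so fetch-event-source stops retrying
    },
  });

  return answer;
}

streamResponse("Hello! What can you do?").then((text) => console.log(text));
```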
The endpoint will return error messages if the flow fails to execute. See the example below:
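Failures can also be surfaced on the client. The hedged sketch below assumes a failed execution arrives as a non-2xx response whose body contains the error message; the actual payload shape depends on your flow, so adapt the check accordingly.

```typescript
import { fetchEventSource } from "@microsoft/fetch-event-source";

// Error-handling sketch; the endpoint URL, auth scheme, and error payload
// shape are placeholders/assumptions to adapt to your own flow.
async function streamWithErrorHandling(): Promise<void> {
  await fetchEventSource("https://<your-stack-ai-endpoint>/stream", {
    method: "POST",
    headers: {
      Authorization: "Bearer <YOUR_PUBLIC_API_KEY>",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ "in-0": "Hello!" }),
    async onopen(response) {
      // Assumption: when the flow fails to execute, the endpoint answers with
      // a non-2xx status and the error message in the response body.
      if (!response.ok) {
        throw new Error(`Flow failed (${response.status}): ${await response.text()}`);
      }
    },
    onmessage(event) {
      console.log("chunk:", event.data);
    },
    onerror(err) {
      throw err; // stop retrying and reject with the error message
    },
  });
}

streamWithErrorHandling().catch((err) => console.error(err));
```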