How to stream model responses
LLMs can stream completions as they are generated, letting users see tokens before the full response is complete. This improves the user experience by reducing the idle time spent waiting for an answer.
The following LLMs support streaming out of the box:
OpenAI
Anthropic
Replicate
Inside Stack AI, you can enable streaming on your LLMs and receive a streamed response every time you fetch a response for your interface. To consume the stream, you can use a library such as fetch-event-source to read your flow's streaming endpoint.
This endpoint has the following properties:
Requests must be authorized with your public API key in the Authorization header.
Requests carry a JSON body containing a value for each of the flow's inputs. Example:
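The exact endpoint URL, authorization scheme, and input names depend on your project, so the TypeScript sketch below treats them as placeholders. It uses the @microsoft/fetch-event-source package to POST the JSON body and accumulate the streamed chunks as they arrive.

```typescript
import { fetchEventSource } from "@microsoft/fetch-event-source";

// Placeholder values: substitute your flow's streaming endpoint URL, your
// public API key, and the input names defined in your project.
const STREAM_URL = "https://<your-stack-ai-endpoint>/stream";
const PUBLIC_API_KEY = "<YOUR_PUBLIC_API_KEY>";

async function streamResponse(userMessage: string): Promise<string> {
  let answer = "";

  await fetchEventSource(STREAM_URL, {
    method: "POST",
    headers: {
      // Assumption: the public API key is sent as a bearer token.
      Authorization: `Bearer ${PUBLIC_API_KEY}`,
      "Content-Type": "application/json",
    },
    // JSON body with one value per flow input; "in-0" is a placeholder name.
    body: JSON.stringify({ "in-0": userMessage }),
    onmessage(event) {
      answer += event.data; // append each streamed chunk as it arrives
    },
    onerror(err) {
      throw err; // rethrow so fetch-event-source stops retrying
    },
  });

  return answer;
}

streamResponse("Hello! What can you do?").then((text) => console.log(text));
```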
The endpoint will return error messages if the flow fails to execute. See the example below:
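Failures can also be surfaced on the client. The hedged sketch below assumes a failed execution arrives as a non-2xx response whose body contains the error message; the actual payload shape depends on your flow, so adapt the check accordingly.

```typescript
import { fetchEventSource } from "@microsoft/fetch-event-source";

// Error-handling sketch; the endpoint URL, auth scheme, and error payload
// shape are placeholders/assumptions to adapt to your own flow.
async function streamWithErrorHandling(): Promise<void> {
  await fetchEventSource("https://<your-stack-ai-endpoint>/stream", {
    method: "POST",
    headers: {
      Authorization: "Bearer <YOUR_PUBLIC_API_KEY>",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ "in-0": "Hello!" }),
    async onopen(response) {
      // Assumption: when the flow fails to execute, the endpoint answers with
      // a non-2xx status and the error message in the response body.
      if (!response.ok) {
        throw new Error(`Flow failed (${response.status}): ${await response.text()}`);
      }
    },
    onmessage(event) {
      console.log("chunk:", event.data);
    },
    onerror(err) {
      throw err; // stop retrying and reject with the error message
    },
  });
}

streamWithErrorHandling().catch((err) => console.error(err));
```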