Add Memory
Add memory to your LLMs in Stack AI. Improve user interaction by enabling models to remember previous conversations and provide more context-aware responses.
LLMs do not hold an internal state, yet many applications (e.g. chatbots) require tracking previous interactions with the LLM as part of the interface. To this end, you can add memory to an LLM node in Stack AI by clicking the gear icon on the LLM node.
Some quick facts:
All the LLM memory is encrypted end-to-end in the Stack AI database.
This data can be self-hosted under the Stack AI enterprise plan.
The LLM memory is user-dependent: each user gets their own instance of the memory.
Once the flow is deployed as an API, you can specify a user_id for the LLM memory of each user (see the Deployer Guide).
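As a rough sketch of what a per-user request might look like, the snippet below builds a request body that pins the memory to one user. The endpoint URL, the input field name (in-0), and the payload shape are assumptions for illustration; check your project's deployment page for the exact URL and field names.

```python
import json

# Placeholder endpoint -- replace with the URL shown in your deployment page.
API_URL = "https://example.invalid/run/<flow_id>"

def build_payload(user_message: str, user_id: str) -> dict:
    """Build a request body that scopes the LLM memory to one user.

    The user_id field selects which memory instance the flow reads from
    and writes to, so each user keeps a separate conversation history.
    (Field names here are illustrative, not Stack AI's exact schema.)
    """
    return {
        "in-0": user_message,  # assumed input node id
        "user_id": user_id,    # memory instance selector
    }

payload = build_payload("What did I ask you earlier?", "user-42")
print(json.dumps(payload))
```

Sending this payload with a different user_id would read and write a completely separate memory instance.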
By default, the “Sliding Window with Input” memory is selected when a new LLM node is added to the flow.
We offer three types of memory modalities:
Sliding Window
Stores all LLM prompts and completions.
This strategy may consume many tokens as the LLM prompts can often occupy thousands of tokens.
Loads a window of the previous prompts and completions as part of the LLM conversation memory, up to the number of messages in the window.
In non-chat models (e.g. Davinci), the memory is added as part of the prompt as a list of messages at the end of the prompt.
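The sliding-window behavior described above can be sketched in a few lines. This is a conceptual illustration, not Stack AI's implementation: a bounded deque holds the most recent messages, and older prompt/completion pairs fall out of context automatically.

```python
from collections import deque

class SlidingWindowMemory:
    """Keep only the most recent prompt/completion messages.

    `window` is the maximum number of messages retained; once it is
    full, each new message evicts the oldest one.
    """
    def __init__(self, window: int):
        self.messages = deque(maxlen=window)

    def record(self, prompt: str, completion: str) -> None:
        # Each turn stores the full prompt and the completion,
        # which is why this strategy can consume many tokens.
        self.messages.append({"role": "user", "content": prompt})
        self.messages.append({"role": "assistant", "content": completion})

    def context(self) -> list:
        """Messages to prepend to the next LLM call."""
        return list(self.messages)

mem = SlidingWindowMemory(window=4)
mem.record("Hi, I'm Ada.", "Hello Ada!")
mem.record("What's my name?", "Your name is Ada.")
mem.record("Thanks.", "You're welcome!")
print(len(mem.context()))  # → 4: only the most recent messages remain
```

With a window of 4 messages, the first turn has already been evicted after three exchanges, which is the trade-off between context length and token cost.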
Sliding Window with Input
Stores one LLM input parameter (e.g. in-0) and all LLM completions, without storing the entire prompt from each turn.
This strategy is more token-efficient and aligned with many applications (e.g. when only the user message from the input is relevant).
Loads a window of the previous inputs and completions as part of the LLM conversation memory, up to the number of messages in the window. In non-chat models (e.g. text-davinci-003), the memory is added as part of the prompt as a list of messages at the end.
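To make the difference from the plain sliding window concrete, here is a conceptual sketch (not Stack AI's implementation): only the selected input parameter is stored per turn, while the rest of the assembled prompt is discarded.

```python
from collections import deque

class SlidingWindowInputMemory:
    """Store one chosen input parameter (e.g. 'in-0') plus the
    completion, instead of the full assembled prompt of each turn."""
    def __init__(self, window: int, input_id: str = "in-0"):
        self.input_id = input_id
        self.messages = deque(maxlen=window)

    def record(self, inputs: dict, completion: str) -> None:
        # Only the selected input is kept; other inputs (e.g. a long
        # retrieved document in 'in-1') are not stored in memory.
        self.messages.append({"role": "user", "content": inputs[self.input_id]})
        self.messages.append({"role": "assistant", "content": completion})

    def context(self) -> list:
        return list(self.messages)

mem = SlidingWindowInputMemory(window=6)
mem.record(
    {"in-0": "Summarize the doc.", "in-1": "<long document text>"},
    "Here is a summary...",
)
print(mem.context()[0]["content"])  # → "Summarize the doc."
```

The token saving comes from dropping everything except the chosen input: the long document in `in-1` never re-enters the context on later turns.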
VectorDB
Stores all of the inputs and outputs to the LLM in a Vector Database and retrieves the most relevant messages to use as LLM memory.
This is especially useful if you expect some of the information to be needed at a later time but not in a sequential manner.
Allows the LLM to access older, contextually relevant interactions without the constraint of a fixed window size.
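The retrieval idea behind VectorDB memory can be sketched as follows. This toy version uses a bag-of-words similarity instead of a learned embedding model and an actual vector database, but it shows the key property: relevance, not recency, decides which messages re-enter the context.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real deployment would use a
    # learned embedding model and a vector database.
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorMemory:
    """Store every exchange and retrieve the most relevant ones,
    no matter how long ago they occurred."""
    def __init__(self):
        self.entries = []

    def record(self, text: str) -> None:
        self.entries.append((embed(text), text))

    def retrieve(self, query: str, k: int = 2) -> list:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(e[0], q), reverse=True)
        return [text for _, text in ranked[:k]]

mem = VectorMemory()
mem.record("My shipping address is 12 Elm Street.")
mem.record("I prefer dark mode in the app.")
mem.record("The weather is nice today.")
print(mem.retrieve("What is my shipping address?", k=1))
# → ['My shipping address is 12 Elm Street.']
```

The shipping address surfaces even though two unrelated exchanges happened after it, which is exactly what a fixed-size window cannot do.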
Sliding Window
The sliding window lets you set the number of conversation turns included in the context.
Input Id
If you chose 'Sliding Window with Input', you can also select the ID of the input you would like held in context. All other inputs will not be stored in context.
Citations
Turn on citations to allow the AI to provide citations (references) for the information it generates, especially when it uses external sources or uploaded documents.
Response Format
Text is the default response format. You can also choose to have the AI return a response formatted as a JSON object.
Text
The default option. Best for most conversational, summary, or narrative outputs.
JSON Object
Useful when you want structured data for further processing, such as extracting specific fields, integrating with APIs, or using the output in downstream nodes that expect JSON.
JSON Object with Schema
Choose this when you want to specify an exact JSON Schema for the output, so the AI returns data in a precise format for integration, automation, or validation.
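As an illustration of why a schema helps downstream processing, the snippet below shows a small JSON Schema and a model response that conforms to it. The schema and field names are hypothetical examples, not Stack AI's configuration format.

```python
import json

# Example JSON Schema you might attach in the node settings so the
# model must return exactly these fields (illustrative only).
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "priority": {"type": "integer"},
    },
    "required": ["name", "priority"],
}

# A response constrained to the schema parses cleanly downstream.
raw = '{"name": "Renew license", "priority": 2}'
ticket = json.loads(raw)

# Minimal conformance check: all required fields are present.
assert all(key in ticket for key in schema["required"])
print(ticket["priority"])  # → 2
```

Downstream nodes can then read `ticket["priority"]` directly instead of parsing free-form text.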