Handling Large Input Files
Sometimes the files you upload as input to a workflow are very large, e.g. an 800-page PDF. At a rough 300-500 tokens per page, a document of that size runs to several hundred thousand tokens, so a single input file can exceed the LLM's context window no matter which model you use.
To process such files down to context-window size, we often resort to a chain of nodes: Split Files tool → Python node → StackAI Project node.

Overview of nodes
Split Files tool splits text content from files into smaller pieces using different strategies: by character chunks, by pages, or by files.
Python node lets you write and execute custom Python code as part of your workflow. See Python Code.
StackAI Project node allows you to run (or "call") another Stack AI project from within your current workflow. See StackAI Project Node.
How this hack works
This chain of nodes accomplishes a few things:
Split Files tool splits the file into "digestible" chunks (by pages, by files, or by character chunks). It outputs a JSON object.
Python node converts the JSON object into a list format that an LLM can easily take as input.
StackAI Project node runs a subagent that takes the output from the Python node and runs it through a pre-selected LLM. You don't necessarily have to use the StackAI Project node here; if your workflow is straightforward, you can use an LLM node directly.
Output from Split Files tool
This node returns a JSON object with a chunks field.
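The exact shape depends on your split strategy, but a page-based split might look something like this (the field names inside the chunks are illustrative assumptions, not a guaranteed schema):

```json
{
  "chunks": [
    { "text": "Content of page 1...", "page": 1 },
    { "text": "Content of page 2...", "page": 2 }
  ]
}
```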
Code for Python node
The sample code below normalizes the JSON object into a single list.
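A minimal sketch of that normalization, assuming the upstream output is a JSON object (or JSON string) with a chunks field whose items are either plain strings or objects carrying a text field; the variable name split_files_output and the field names are assumptions, so adjust them to whatever your Split Files node actually emits:

```python
import json

def normalize_chunks(split_output):
    # The Split Files tool may hand over a JSON string or an already-parsed dict.
    data = json.loads(split_output) if isinstance(split_output, str) else split_output

    texts = []
    for chunk in data.get("chunks", []):
        # Chunks may be plain strings or objects with a "text" field (assumed name).
        if isinstance(chunk, dict):
            texts.append(chunk.get("text", ""))
        else:
            texts.append(str(chunk))

    # Return a JSON array string so downstream nodes receive one clean list.
    return json.dumps(texts, ensure_ascii=False)

# "split_files_output" is a placeholder for however your Python node
# receives the upstream Split Files result.
result = normalize_chunks(split_files_output)
```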
Output from Python node
The Python node returns a JSON array string that is easy to feed into an LLM node or subagent:
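For the illustrative Split Files output above, the Python node would emit a string along these lines:

```json
["Content of page 1...", "Content of page 2..."]
```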
Advanced technique
Set up this chain of nodes as a fallback path so that small files go straight to the LLM.
Recommended setup:
Start with your primary LLM node.
Turn on a fallback path using either:
the node-level “On Error” fallback branch (good when the failure mode is “context exceeded”), or
an explicit router like If/Else Node (good when you can predict size).
In the fallback path, run:
Split Files tool → Python node → StackAI Project node (or another LLM node).
If you’re using “On Error”, pair it with Fallback & Error Handling settings like Retry on Failure and LLM Fallback Mode.
Tips & best practices
1. Test early with small inputs and pin nodes. Before running large documents, validate the workflow using small files or by pinning nodes. This makes debugging easier and helps catch parsing or context-limit issues early.
2. Choose the right LLM for the job. LLMs differ in context window size and how well they reason over long lists of chunks. Select the model based on total token volume and whether cross-chunk synthesis is required.
3. Optimize paths by file type. Spreadsheets and documents behave differently: spreadsheets often benefit from row- or sheet-based processing, while Word/PDF files work best with page- or chunk-based splitting. In mixed workflows, consider branching early by file type; a row-based sketch for spreadsheets is shown below.
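For the spreadsheet case, a minimal row-based sketch follows. It assumes the file reaches the Python node as CSV text in a placeholder variable csv_text, and the batch size of 100 rows is an arbitrary starting point; tune it to your model's context window:

```python
import csv
import io
import json

def split_rows(csv_text, rows_per_batch=100):
    # Parse the CSV, keeping the header so every batch stays self-describing.
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)

    batches = []
    for start in range(0, len(rows), rows_per_batch):
        # Re-serialize each batch, header included, as a standalone CSV snippet.
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_batch])
        batches.append(out.getvalue())

    # Return a JSON array string, matching the format used earlier in the chain.
    return json.dumps(batches)

result = split_rows(csv_text)
```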