Handling Large Input Files
Sometimes the files you upload as input to a workflow are very large, e.g. an 800-page PDF. At a rough 300-500 tokens per page, a document of that size runs to several hundred thousand tokens, so a single input file can exceed the LLM's context window no matter which model you use.
To process such files down to context-window size, we often resort to a chain of nodes: Split Files tool → Python node → StackAI Project node.

Overview of nodes
Split Files tool splits text content from files into smaller pieces using different strategies: by character chunks, by pages, or by files.
Python node lets you write and execute custom Python code as part of your workflow. See Python Code.
StackAI Project node allows you to run (or "call") another Stack AI project from within your current workflow. See StackAI Project Node.
How this hack works
This chain of nodes accomplishes a few things:
Split Files tool splits the file into "digestible" chunks (by pages, by files, or by character chunks). It outputs a JSON object.
Python node converts the JSON object into a list format that an LLM can easily take as input.
StackAI Project node runs a subagent that takes the output from the Python node and runs it through a pre-selected LLM. You don't necessarily have to use the StackAI Project node here; if your workflow is straightforward, you can use an LLM node directly.
Output from Split Files tool
This node returns a JSON object with a chunks field.
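The exact shape depends on your split strategy, but a page-based split might look something like this (the field names inside the chunks are illustrative assumptions, not a guaranteed schema):

```json
{
  "chunks": [
    { "text": "Content of page 1...", "page": 1 },
    { "text": "Content of page 2...", "page": 2 }
  ]
}
```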
Code for Python node
The sample code below normalizes the JSON object into a single list.
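A minimal sketch of that normalization, assuming the upstream output is a JSON object (or JSON string) with a chunks field whose items are either plain strings or objects carrying a text field; the variable name split_files_output and the field names are assumptions, so adjust them to whatever your Split Files node actually emits:

```python
import json

def normalize_chunks(split_output):
    # The Split Files tool may hand over a JSON string or an already-parsed dict.
    data = json.loads(split_output) if isinstance(split_output, str) else split_output

    texts = []
    for chunk in data.get("chunks", []):
        # Chunks may be plain strings or objects with a "text" field (assumed name).
        if isinstance(chunk, dict):
            texts.append(chunk.get("text", ""))
        else:
            texts.append(str(chunk))

    # Return a JSON array string so downstream nodes receive one clean list.
    return json.dumps(texts, ensure_ascii=False)

# "split_files_output" is a placeholder for however your Python node
# receives the upstream Split Files result.
result = normalize_chunks(split_files_output)
```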
Output from Python node
The Python node returns a JSON array string that is easy to feed into an LLM node or subagent:
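For the illustrative Split Files output above, the Python node would emit a string along these lines:

```json
["Content of page 1...", "Content of page 2..."]
```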
Advanced technique
Set up this chain of nodes as a fallback path so that small files go straight to the LLM.
Recommended setup:
Start with your primary LLM node.
Turn on a fallback path using either:
the node-level “On Error” fallback branch (good when the failure mode is “context exceeded”), or
an explicit router like If/Else Node (good when you can predict size).
In the fallback path, run:
Split Files tool → Python node → StackAI Project node (or another LLM node).
If you’re using “On Error”, pair it with Fallback & Error Handling settings like Retry on Failure and LLM Fallback Mode.
Tips & best practices
1. Test early with small inputs and pin nodes. Before running large documents, validate the workflow using small files or by pinning nodes. This makes debugging easier and helps catch parsing or context-limit issues early.
2. Choose the right LLM for the job. LLMs differ in context window size and how well they reason over long lists of chunks. Select the model based on total token volume and whether cross-chunk synthesis is required.
3. Optimize paths by file type. Spreadsheets and documents behave differently: spreadsheets often benefit from row- or sheet-based processing, while Word/PDF files work best with page- or chunk-based splitting. In mixed workflows, consider branching early by file type; a row-based sketch for spreadsheets is shown below.
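For the spreadsheet case, a minimal row-based sketch follows. It assumes the file reaches the Python node as CSV text in a placeholder variable csv_text, and the batch size of 100 rows is an arbitrary starting point; tune it to your model's context window:

```python
import csv
import io
import json

def split_rows(csv_text, rows_per_batch=100):
    # Parse the CSV, keeping the header so every batch stays self-describing.
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)

    batches = []
    for start in range(0, len(rows), rows_per_batch):
        # Re-serialize each batch, header included, as a standalone CSV snippet.
        out = io.StringIO()
        writer = csv.writer(out)
        writer.writerow(header)
        writer.writerows(rows[start:start + rows_per_batch])
        batches.append(out.getvalue())

    # Return a JSON array string, matching the format used earlier in the chain.
    return json.dumps(batches)

result = split_rows(csv_text)
```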