Using Multiple LLMs

When using multiple LLMs in one project, keep the following points in mind to ensure they work well together.


Clear Input/Output Flow

  • Explicit Connections: Each LLM node should have clearly defined input and output connections. Use Input nodes (in-0, in-1, etc.) to gather user data, and connect them to the relevant LLM nodes.

  • Output Handling: Route the output of each LLM node to Output nodes or downstream processing nodes (such as Template or Python nodes) for further formatting or logic; a wiring sketch follows this list.
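As a rough illustration of this wiring, the fragment below represents an input → LLM → output flow as a simple graph. The node IDs (in-0, llm-0, out-0) follow the naming used on this page, but the dictionary structure itself is a hypothetical sketch, not the platform's actual configuration format.

```python
# Hypothetical representation of an input -> LLM -> output flow.
# Node IDs follow the in-0 / llm-0 / out-0 naming used above; the
# structure is illustrative only, not a real configuration format.
workflow = {
    "nodes": {
        "in-0": {"type": "input", "label": "user_question"},
        "llm-0": {"type": "llm", "prompt": "Answer the question: {{in-0}}"},
        "out-0": {"type": "output"},
    },
    "edges": [
        ("in-0", "llm-0"),   # user data feeds the LLM
        ("llm-0", "out-0"),  # the LLM result is routed to the Output node
    ],
}
```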

Sequential vs. Parallel LLMs

  • Sequential Orchestration: If the output of one LLM is needed as input for another, connect them in sequence (e.g., llm-0 → llm-1). This is useful for multi-step reasoning or refinement, and having earlier LLMs produce structured output for downstream LLMs is often helpful.

  • Parallel Orchestration: If you want to compare or aggregate results from multiple LLMs, connect the same input to several LLM nodes in parallel, then merge their outputs downstream using the Combine Node or a third LLM that summarizes and logically merges the two outputs (see the sketch below).
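The two styles can be sketched in plain Python. The call_llm(model, prompt) helper below is a placeholder for whatever client call your provider exposes; it is an assumption for illustration, not part of the platform.

```python
# Minimal sketch of sequential vs. parallel orchestration, assuming a
# hypothetical call_llm(model, prompt) helper that wraps your provider's API.

def call_llm(model: str, prompt: str) -> str:
    raise NotImplementedError("replace with your provider's client call")

question = "Summarize the attached quarterly report."

# Sequential: llm-0 produces structured output that llm-1 refines.
facts = call_llm("llm-0", f"Extract the key facts as bullet points:\n{question}")
summary = call_llm("llm-1", f"Write a polished summary from these facts:\n{facts}")

# Parallel: the same input goes to two models; a third merges the results.
answer_a = call_llm("model-a", question)
answer_b = call_llm("model-b", question)
merged = call_llm(
    "merge-llm",
    f"Combine these two answers into one coherent response:\n1) {answer_a}\n2) {answer_b}",
)
```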

Memory and State

  • Sliding Window Memory: Use the memory feature in LLM nodes to maintain context across turns or steps, especially in multi-turn workflows.

  • Stateful Processing: If you need to track or update state, consider using Python nodes between LLMs to manipulate or store intermediate results, as sketched below.
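A sliding window can be approximated with a bounded queue of recent turns, and a Python node can hold similar logic for other state. The snippet below is a minimal sketch, again assuming a hypothetical call_llm helper and a window of four exchanges.

```python
from collections import deque

def call_llm(model, prompt):  # placeholder for your provider's client call
    raise NotImplementedError

WINDOW_TURNS = 4                          # assumed window size; tune per use case
memory = deque(maxlen=2 * WINDOW_TURNS)   # two lines (user + assistant) per turn

def chat_turn(user_message: str) -> str:
    """Answer one turn while keeping only the most recent exchanges as context."""
    context = "\n".join(memory)
    reply = call_llm("llm-0", f"{context}\nUser: {user_message}\nAssistant:")
    memory.append(f"User: {user_message}")
    memory.append(f"Assistant: {reply}")
    return reply
```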

Error Handling and Fallbacks

  • On Failure Branches: Configure on_failure_branch and retry settings for each LLM node to handle errors gracefully.

  • Fallback LLMs: Use the fallback options to specify alternative models or providers if the primary LLM fails (a retry-and-fallback sketch follows this list).
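Conceptually, retries and fallbacks amount to trying the primary model a few times and then moving on to an alternative. The sketch below shows the idea in plain Python; the model names, retry counts, and call_llm helper are assumptions, and the platform's own on_failure_branch and fallback settings should be preferred where available.

```python
import time

def call_llm(model, prompt):  # placeholder for your provider's client call
    raise NotImplementedError

def call_with_fallback(prompt, models=("primary-model", "fallback-model"),
                       retries=2, delay=1.0):
    """Try each model in order, retrying transient failures before falling back."""
    last_error = None
    for model in models:
        for attempt in range(retries):
            try:
                return call_llm(model, prompt)
            except Exception as exc:   # in practice, catch provider-specific errors
                last_error = exc
                time.sleep(delay * (attempt + 1))  # simple linear backoff
    raise RuntimeError("all models failed") from last_error
```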

Data Formatting and Validation

  • Template Nodes: Use Template nodes to format or merge outputs from multiple LLMs before presenting to the user.

  • Output Validation: If LLMs are expected to return structured data (e.g., JSON), use the json_schema parameter to enforce the output format and validate results, as in the validation sketch below.
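When an LLM is asked for JSON, it also helps to parse and validate the text before any downstream node consumes it. The sketch below uses the third-party jsonschema package and a made-up schema; both are assumptions for illustration, separate from the json_schema parameter mentioned above.

```python
import json
from jsonschema import validate  # pip install jsonschema

# Hypothetical schema that downstream nodes expect from the LLM.
SCHEMA = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "sentiment": {"type": "string", "enum": ["positive", "neutral", "negative"]},
    },
    "required": ["summary", "sentiment"],
}

def parse_llm_output(raw_text: str) -> dict:
    """Parse and validate structured LLM output before passing it downstream."""
    data = json.loads(raw_text)             # raises ValueError on malformed JSON
    validate(instance=data, schema=SCHEMA)  # raises ValidationError on schema mismatch
    return data
```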

Chaining with Other Nodes

  • Integration with Actions: LLM outputs can be passed to Action nodes (e.g., sending emails, updating databases) for real-world effects.

  • Custom Logic: Insert Python nodes between LLMs for custom logic, filtering, or aggregation (example below).
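As a small example of the kind of logic a Python node might hold between two LLMs, the function below filters one model's candidate answers before a second model ranks them. The function name and input format are hypothetical.

```python
def filter_candidates(candidates: list[str]) -> str:
    """Drop empty or very short candidates before the next LLM ranks the rest."""
    filtered = [c.strip() for c in candidates if len(c.strip()) > 20]
    # The returned string becomes the next LLM node's input.
    return "\n".join(f"- {c}" for c in filtered)
```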

Citations and Traceability

  • Citations: Enable citations in LLM nodes if you want to track sources or provide references in the output.

  • Auditability: Use Output nodes and logs to trace the flow of data and decisions across multiple LLMs; a simple logging sketch follows this list.
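One lightweight way to get an audit trail is to log each node's input and output as structured records. The sketch below uses Python's standard logging module; the node IDs and record fields are illustrative assumptions.

```python
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm-trace")

def log_node_event(node_id: str, payload: dict) -> None:
    """Record what a node received and produced so a run can be audited later."""
    record = {"time": datetime.now(timezone.utc).isoformat(), "node": node_id, **payload}
    logger.info(json.dumps(record))

# Example: trace an LLM node's input and output.
log_node_event("llm-0", {"input": "user question...", "output": "model answer..."})
```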

Performance and Latency

  • Parallelization: Where possible, run LLMs in parallel to reduce overall latency (see the sketch after this list).

  • Token and Cost Management: Set appropriate max_tokens and temperature values to control cost and response quality.
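Outside the platform, fanning the same prompt out to several models at once is straightforward with a thread pool, as sketched below; the model names and call_llm helper are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(model, prompt):  # placeholder for your provider's client call
    raise NotImplementedError

def run_in_parallel(prompt: str, models: list[str]) -> dict[str, str]:
    """Send the same prompt to several models at once to cut wall-clock latency."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {m: pool.submit(call_llm, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}

results = run_in_parallel("Classify this support ticket.", ["model-a", "model-b"])
```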


Summary Table:

| Aspect | Best Practice |
| --- | --- |
| Input/Output Flow | Use explicit node connections and references |
| Orchestration Style | Choose sequential or parallel based on use case |
| Prompt Engineering | Customize prompts and use context passing |
| Memory/State | Use memory features and Python nodes for stateful logic |
| Error Handling | Configure retries, fallbacks, and failure branches |
| Data Formatting | Use Template nodes and output validation |
| Chaining/Integration | Connect to Action nodes and use Python for custom logic |
| Citations/Traceability | Enable citations and use Output nodes for auditability |
| Performance | Parallelize where possible, manage tokens and latency |

