Evaluator

Evaluate your agent's performance with an LLM-as-a-judge

The Evaluator View is similar to the batch interface in that it runs a CSV file of inputs through your agent all at once. It lets you test your agent before a project goes live, and uses an LLM to evaluate the agent's outputs. The view supports two key workflows: 1) grading outputs against criteria you define, and 2) comparing outputs to a gold-standard answer.
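The view handles all of this in the interface, but the underlying evaluation loop is conceptually like the sketch below. The CSV column names (input, agent_output, gold_answer), the judge prompts, and the use of the OpenAI Python client are illustrative assumptions, not the product's actual implementation.

```python
# A minimal sketch of what LLM-as-a-judge evaluation looks like, assuming:
# - a CSV with illustrative columns: input, agent_output, gold_answer
# - the OpenAI Python client as a stand-in judge-model provider
import csv

from openai import OpenAI

client = OpenAI()

GRADING_PROMPT = (
    "You are an evaluator. Score the agent's answer from 1 to 5 for "
    "helpfulness and factual accuracy. Reply with the score only."
)
COMPARISON_PROMPT = (
    "You are an evaluator. Compare the agent's answer to the gold-standard "
    "answer. Reply PASS if they agree in substance, otherwise FAIL."
)


def judge(system_prompt: str, user_message: str, model: str = "gpt-4o-mini") -> str:
    """Send one exchange to a judge model and return its reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content.strip()


with open("evaluation_inputs.csv", newline="") as f:
    for row in csv.DictReader(f):
        agent_output = row["agent_output"]

        # 1) Grade the output against criteria.
        score = judge(GRADING_PROMPT, agent_output)

        # 2) Compare the output to a gold-standard answer, if one is provided.
        verdict = ""
        gold_answer = row.get("gold_answer", "")
        if gold_answer:
            verdict = judge(
                COMPARISON_PROMPT,
                f"Agent answer:\n{agent_output}\n\nGold-standard answer:\n{gold_answer}",
            )

        print(row.get("input", ""), score, verdict)
```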

On the right-hand side, select one or more models; each can perform a different function. Give each model a system prompt and select which of your agent's outputs it should evaluate.
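For example, one judge model's system prompt might ask it to score a selected output from 1 to 5 for factual accuracy, while a second model's prompt asks it to compare that same output against the gold-standard answer and return pass or fail. These prompts are only illustrative; write criteria that fit your own project.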
