Evaluator

Evaluate your agent's performance using an LLM as a judge

The Evaluator View is similar to the batch interface in that it runs a CSV file of inputs through your agent all at once. It lets you test your agent before a project goes live, using an LLM to evaluate your agent's outputs.
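
Conceptually, an LLM-as-a-judge evaluator passes each output, together with your grading criteria, to a separate model and records its verdict. The sketch below is a minimal illustration of that loop, not the platform's implementation; it assumes the OpenAI Python SDK, and the criteria string and model name are placeholders.

```python
# Minimal LLM-as-a-judge sketch (illustration only, not the platform's code).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical grading criteria; in the Evaluator View this is the
# system prompt you give the evaluator.
CRITERIA = "The reply must be polite, factually accurate, and under 100 words."

def judge(agent_output: str) -> str:
    """Ask a judge model to grade one agent output against the criteria."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are an evaluator. Grade the answer PASS or FAIL "
                        "against these criteria, then explain why:\n" + CRITERIA},
            {"role": "user", "content": agent_output},
        ],
    )
    return response.choices[0].message.content

# Each CSV row is run through the agent, then its output through judge().
```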

There are two types of evaluation:

1. Grading outputs based on criteria

On the right-hand side, create an evaluator:

  • Select the output to evaluate

  • Add a system prompt containing the evaluation logic (see the example below)

  • Give it a name
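
For example, a grading-style system prompt might look like the following. This is a hypothetical prompt; adapt the criteria and scale to your own use case:

```
You are an evaluator. Score the agent's answer from 1 to 5 against
these criteria:
- The answer directly addresses the customer's question.
- The tone is polite and professional.
- No unsupported claims are made.
Return the score followed by a one-sentence justification.
```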

Once the evaluator is created, a new column will appear in the table showing the evaluation results for each row.

You can add as many evaluators as there are outputs in your workflow, with each evaluator grading a different output. Give each evaluator's model its own system prompt and select which of your agent's outputs it should evaluate.

You can add rows to evaluate manually, or upload a CSV containing all your scenarios (click the three dots, then the upload CSV option).
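
A scenarios CSV typically has one column per agent input and one row per test case. The column names below are hypothetical; match them to your agent's actual input names:

```csv
customer_message,product_name
"Where is my order? It was due yesterday.",Standard Shipping
"Can I return an opened item?",Wireless Headphones
"Do you ship to Canada?",Desk Lamp
```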

2. Comparing outputs to a gold-standard answer

Click 'Requires Expected Answer' to add a ground truth to your execution. This is the response you would expect from the AI model; the evaluator then takes it into account in its analysis.
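
With expected answers enabled, each scenario carries its own reference response for the judge to compare the agent's output against. Again, the column names here are hypothetical:

```csv
customer_message,expected_answer
"Do you ship to Canada?","Yes, we ship to Canada; delivery takes 5-7 business days."
"Can I return an opened item?","Opened items can be returned within 30 days for store credit."
```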
